Developing non-response weights to account for attrition-related bias in a longitudinal pregnancy cohort

Tona Pitt*, Kamala Adhikari, Shainur Premji, Sheila McDonald

*Corresponding author for this work

Research output: Contribution to conference › Abstract › peer-review

Abstract

Objective
The prospective cohort study design is ideal for examining diseases of public health importance. A main source of potential bias in longitudinal studies is attrition. In this study, we compared the performance of two models developed to predict attrition and used them to derive weights that adjust for the potential bias.

Approach
This study used the All Our Families longitudinal pregnancy cohort of 3351 maternal-infant pairs. Logistic regression models were developed to predict study continuation versus drop-out from baseline to the three-year data collection wave.

Variables were selected using two methods. One relied on previous knowledge and content expertise, while the other used the Least Absolute Shrinkage and Selection Operator (LASSO). Model performance for both methods was compared using the area under the receiver operating characteristic curve (AUROC) and calibration plots. Stabilized inverse probability weights were generated from the predicted probabilities. Weight performance was assessed by comparing standardized differences with weights against those without weights (unadjusted estimates).
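The weighting step can be illustrated with a minimal sketch. The abstract does not state the exact stabilization formula, so the code assumes one common form: the overall retention proportion divided by each retained participant's predicted probability of continuing. The function name, argument names, and toy values are hypothetical and are not taken from the study.

```python
from statistics import mean


def stabilized_weights(retained, p_retained):
    """Stabilized inverse probability weights for retained participants.

    retained   : list of 0/1 flags (1 = continued to the follow-up wave)
    p_retained : model-predicted probability of continuing, one per participant
    Returns one entry per participant: a weight if retained, None if dropped out.
    Assumption: the stabilizing numerator is the marginal retention proportion.
    """
    if len(retained) != len(p_retained):
        raise ValueError("inputs must have the same length")
    marginal = mean(retained)  # overall proportion retained
    weights = []
    for flag, p in zip(retained, p_retained):
        if flag == 1:
            if not 0.0 < p <= 1.0:
                raise ValueError("predicted probabilities must lie in (0, 1]")
            weights.append(marginal / p)
        else:
            weights.append(None)  # drop-outs contribute no follow-up data
    return weights


# Hypothetical toy data, not from the study
retained = [1, 1, 1, 0, 1, 0]
p_hat = [0.9, 0.8, 0.5, 0.4, 0.6, 0.3]
w = stabilized_weights(retained, p_hat)
```

Under this assumed form, retained participants whose predicted probability of continuing is low receive weights above 1, so they stand in for similar participants who dropped out.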

Results
The LASSO and investigator prediction models had good and fair discrimination, respectively, with AUROCs of 0.73 (95% confidence interval [CI]: 0.71 – 0.75) and 0.69 (95% CI: 0.67 – 0.71). Calibration plots and non-significant Hosmer-Lemeshow goodness-of-fit tests indicated that both the LASSO model (p = 0.10) and the investigator model (p = 0.50) were well calibrated.
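For readers less familiar with the discrimination measure, the AUROC can be read as the probability that a randomly chosen continuer receives a higher predicted probability than a randomly chosen drop-out. The sketch below computes it by direct pairwise comparison. It is illustrative only: the function name and toy values are hypothetical, and it does not reproduce the confidence intervals reported above.

```python
def auroc(labels, scores):
    """AUROC by pairwise comparison; tied scores count as half a win.

    labels : list of 0/1 outcomes (1 = continued in the study)
    scores : predicted probabilities of continuing, one per participant
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one observation in each class")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not from the study
labels = [1, 1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.6, 0.3]
area = auroc(labels, scores)  # 7 of 8 continuer/drop-out pairs are ordered correctly
```

The pairwise loop is quadratic in sample size, which is acceptable for a demonstration. A rank-based formula would be preferable for a cohort of several thousand.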

Unweighted results indicated large (>10%) standardized differences in 15 demographic variables (range: 11% - 29%) when comparing those who continued in the study with those who did not. Weights derived from the LASSO and investigator models reduced standardized differences relative to unadjusted estimates, with ranges of 0.1% - 5.3% and 0.3% - 12.7%, respectively.
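The balance check can be sketched for a single binary covariate. The abstract does not specify the formula or which groups are compared once weights are applied, so the code makes two assumptions: it uses the usual pooled-variance standardized difference for proportions, and it compares retained participants (with or without weights) against the full baseline cohort. All names and values are hypothetical.

```python
from math import sqrt


def weighted_proportion(values, weights):
    """Weighted proportion of a 0/1 covariate."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(v * w for v, w in zip(values, weights)) / total


def standardized_difference(x_a, x_b, w_a=None, w_b=None):
    """Absolute standardized difference (in %) for a binary covariate.

    x_a, x_b : 0/1 covariate values in groups a and b
    w_a, w_b : optional weights; unit weights are used when omitted
    """
    w_a = [1.0] * len(x_a) if w_a is None else w_a
    w_b = [1.0] * len(x_b) if w_b is None else w_b
    p_a = weighted_proportion(x_a, w_a)
    p_b = weighted_proportion(x_b, w_b)
    pooled_sd = sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    if pooled_sd == 0:
        return 0.0  # covariate is constant in both groups
    return 100 * abs(p_a - p_b) / pooled_sd


# Hypothetical toy cohort: everyone with x = 1 was retained,
# but only one of three participants with x = 0 was retained.
baseline_x = [1, 1, 1, 0, 0, 0]
retained_x = [1, 1, 1, 0]
marginal = 4 / 6  # overall retention proportion
weights = [marginal / 1.0, marginal / 1.0, marginal / 1.0, marginal / (1 / 3)]

unweighted = standardized_difference(retained_x, baseline_x)
weighted = standardized_difference(retained_x, baseline_x, w_a=weights)
```

In this toy cohort the unweighted difference is roughly 53%, and weighting removes it, because the retention probabilities used to build the weights are exactly right. With estimated probabilities the reduction would be partial, as in the ranges reported above.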

Conclusion
The data-driven approach produced robust weights that addressed non-response bias more effectively than the knowledge-driven approach. The data-driven approach did, however, still require content knowledge in how data were grouped, combined, or split. The weights can be applied to analyses across multiple waves of data collection to reduce bias.

Original language: English
Publication status: Published - 25 Aug 2022
Event: International Population Data Linkage Conference - Edinburgh, United Kingdom
Duration: 8 Sept 2022 to 10 Sept 2022

Conference

Conference: International Population Data Linkage Conference
Country/Territory: United Kingdom
City: Edinburgh
Period: 8/09/22 to 10/09/22
