A Principled Approach to Characterize and Analyze Partially Observed Confounder Data from Electronic Health Records.
Weberpals, Janick; Raman, Sudha R; Shaw, Pamela A; Lee, Hana; Russo, Massimiliano; Hammill, Bradley G; Toh, Sengwee; Connolly, John G; Dandreo, Kimberly J; Tian, Fang; Liu, Wei; Li, Jie; Hernández-Muñoz, José J; Glynn, Robert J; Desai, Rishi J.
Affiliation
  • Weberpals J; Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
  • Raman SR; Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, USA.
  • Shaw PA; Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA.
  • Lee H; Office of Biostatistics, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA.
  • Russo M; Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
  • Hammill BG; Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, USA.
  • Toh S; Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA.
  • Connolly JG; Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA.
  • Dandreo KJ; Department of Population Medicine, Harvard Pilgrim Health Care Institute, Boston, MA, USA.
  • Tian F; Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA.
  • Liu W; Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA.
  • Li J; Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA.
  • Hernández-Muñoz JJ; Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA.
  • Glynn RJ; Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
  • Desai RJ; Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Clin Epidemiol; 16: 329-343, 2024.
Article in English | MEDLINE | ID: mdl-38798915
ABSTRACT

Objective:

Partially observed confounder data pose challenges to the statistical analysis of electronic health records (EHR), and systematic assessments of the potentially underlying missingness mechanisms are lacking. We aimed to provide a principled approach to empirically characterize missing data processes and to investigate the performance of analytic methods.

Methods:

Three empirical sub-cohorts of diabetic SGLT2 or DPP4-inhibitor initiators with complete information on HbA1c, BMI and smoking as confounders of interest (COI) formed the basis of data simulation under a plasmode framework. We simulated a true null treatment effect, with the COI included in the outcome generation model, and four missingness mechanisms for the COI: missing completely at random (MCAR), missing at random (MAR), and two missing not at random (MNAR) mechanisms, in which missingness depended either on an unmeasured confounder or on the value of the COI itself. We evaluated the ability of three groups of diagnostics to differentiate between mechanisms: 1) differences in characteristics between patients with and without the observed COI (using averaged standardized mean differences [ASMD]), 2) predictive ability of the missingness indicator based on observed covariates, and 3) association of the missingness indicator with the outcome. We then compared analytic methods, including "complete case" analysis, inverse probability weighting, and single and multiple imputation, in their ability to recover true treatment effects.
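
The first two diagnostic groups can be illustrated with a minimal sketch on fully synthetic data. It is not the authors' plasmode simulation. The covariate `age`, the unmeasured variable `u`, all coefficients, the roughly 30% missingness level, and the function names are hypothetical. Two further assumptions: the standardized mean difference is computed with a pooled standard deviation, and "predictive ability" is summarized as the area under the ROC curve using the single covariate as the score (the abstract does not name the metric).

```python
import math
import random
import statistics


def expit(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(mechanism, n=20000, seed=1):
    """Return (age, coi_missing) pairs for one hypothetical cohort."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.gauss(0, 1)            # observed covariate (standardized)
        u = rng.gauss(0, 1)              # unmeasured confounder
        coi = 0.5 * u + rng.gauss(0, 1)  # confounder of interest, e.g. HbA1c
        if mechanism == "MCAR":
            p = 0.3                      # constant missingness probability
        elif mechanism == "MAR":
            p = expit(-1.0 + age)        # depends on an observed covariate
        elif mechanism == "MNAR_unmeasured":
            p = expit(-1.0 + u)          # depends on an unmeasured confounder
        elif mechanism == "MNAR_value":
            p = expit(-1.0 + coi)        # depends on the COI value itself
        else:
            raise ValueError(mechanism)
        rows.append((age, rng.random() < p))
    return rows


def abs_smd(rows):
    """Absolute standardized mean difference of age, COI observed vs missing."""
    observed = [x for x, miss in rows if not miss]
    missing = [x for x, miss in rows if miss]
    pooled_sd = math.sqrt(
        (statistics.variance(observed) + statistics.variance(missing)) / 2
    )
    return abs(statistics.mean(observed) - statistics.mean(missing)) / pooled_sd


def auc(rows):
    """Rank-based (Mann-Whitney) AUC of age as a predictor of missingness."""
    ranked = sorted(rows, key=lambda r: r[0])
    rank_sum = sum(i + 1 for i, (_, miss) in enumerate(ranked) if miss)
    n1 = sum(1 for _, miss in rows if miss)
    n0 = len(rows) - n1
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


for mech in ("MCAR", "MAR"):
    data = simulate(mech)
    print(mech, round(abs_smd(data), 2), round(auc(data), 2))
```

In this toy setup the MCAR cohort should show a standardized difference near 0 and an AUC near 0.5, while the MAR cohort should show clearly larger values for both. With several covariates, the per-covariate absolute differences would be averaged to obtain a single summary.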

Results:

The diagnostics successfully identified characteristic patterns of simulated missingness mechanisms. For MAR, but not MCAR, the patient characteristics showed substantial differences (median ASMD 0.20 vs 0.05) and consequently, discrimination of the prediction models for missingness was also higher (0.59 vs 0.50). For MNAR, but not MAR or MCAR, missingness was significantly associated with the outcome even in models adjusting for other observed covariates. Comparing analytic methods, multiple imputation using a random forest algorithm resulted in the lowest root-mean-squared-error.
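
As a minimal sketch of how root-mean-squared-error summarizes performance under a true null effect, the snippet below compares estimates with a truth of 0. The scale of the estimates (for example a log hazard ratio) and the three numbers used are illustrative assumptions, not values from the study.

```python
import math
import statistics


def rmse(estimates, truth=0.0):
    """Root-mean-squared-error of simulation estimates around the true effect."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def bias(estimates, truth=0.0):
    """Mean deviation of the estimates from the true effect."""
    return statistics.mean(estimates) - truth


# Made-up estimates from three hypothetical simulation runs (truth = 0).
example = [0.1, -0.1, 0.2]

# RMSE combines systematic error and variability:
# rmse**2 == bias**2 + population variance of the estimates.
decomposed = bias(example) ** 2 + statistics.pvariance(example)
print(rmse(example) ** 2, decomposed)
```

A method can therefore rank well on RMSE either by reducing bias or by producing more stable estimates, which is why the two components are often reported alongside it.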

Conclusion:

Principled diagnostics provided reliable insights into missingness mechanisms. When assumptions allow, multiple imputation with nonparametric models could help reduce bias.
Full text: 1 Databases: MEDLINE Language: En Journal: Clin Epidemiol Year: 2024 Document type: Article Country of affiliation: United States
