An evaluation of synthetic data augmentation for mitigating covariate bias in health data.
Juwara, Lamin; El-Hussuna, Alaa; El Emam, Khaled.
Affiliations
  • Juwara L; School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada.
  • El-Hussuna A; Research Institute, Children's Hospital of Eastern Ontario, Ottawa, ON, Canada.
  • El Emam K; Open Source Research Collaboration, Aalborg, Denmark.
Patterns (N Y) ; 5(4): 100946, 2024 Apr 12.
Article in En | MEDLINE | ID: mdl-38645766
ABSTRACT
Data bias is a major concern in biomedical research, especially when evaluating large-scale observational datasets. It leads to imprecise predictions and inconsistent estimates in standard regression models. We compare the performance of commonly used bias-mitigating approaches (resampling, algorithmic, and post hoc approaches) against a synthetic data-augmentation method that utilizes sequential boosted decision trees to synthesize under-represented groups. The approach is called synthetic minority augmentation (SMA). Through simulations and analysis of real health datasets on a logistic regression workload, the approaches are evaluated across various bias scenarios (types and severity levels). Performance was assessed based on area under the curve, calibration (Brier score), precision of parameter estimates, confidence interval overlap, and fairness. Overall, SMA produces the closest results to the ground truth in low to medium bias (50% or less missing proportion). In high bias (80% or more missing proportion), the advantage of SMA is not obvious, with no specific method consistently outperforming others.
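The evaluation setup described in the abstract can be illustrated with a minimal sketch, assuming a simple simulated dataset. It removes a chosen proportion of one subgroup from the training data, fits a logistic regression, and reports two of the metrics named above (area under the curve and Brier score) on held-out data. It does not implement SMA or any of the compared mitigation methods; all function names, the data-generating model, and its coefficients are hypothetical and are not taken from the article.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def drop_group(X, y, group, missing_prop, rng):
    """Simulate covariate bias: remove a fraction of the rows with group == 1."""
    idx = np.flatnonzero(group == 1)
    n_drop = int(round(missing_prop * idx.size))
    dropped = rng.choice(idx, size=n_drop, replace=False)
    keep = np.ones(group.size, dtype=bool)
    keep[dropped] = False
    return X[keep], y[keep]


def fit_logistic(X, y, n_iter=25):
    """Logistic regression with an intercept, fitted by Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xd @ beta)
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible.
        hess = Xd.T @ (Xd * w[:, None]) + 1e-8 * np.eye(Xd.shape[1])
        beta += np.linalg.solve(hess, Xd.T @ (y - p))
    return beta


def predict_proba(beta, X):
    return sigmoid(np.column_stack([np.ones(len(X)), X]) @ beta)


def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return float(np.mean((p - y) ** 2))


def auc(y, p):
    """Probability that a random positive is ranked above a random negative."""
    diff = p[y == 1][:, None] - p[y == 0][None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def run_demo(missing_prop=0.5, n=4000, seed=0):
    # Hypothetical data-generating model (an assumption for illustration only).
    rng = np.random.default_rng(seed)
    group = rng.binomial(1, 0.3, size=n)
    x = rng.normal(size=n)
    X = np.column_stack([group, x]).astype(float)
    y = rng.binomial(1, sigmoid(-0.5 + 1.0 * group + 0.8 * x)).astype(float)

    half = n // 2
    X_tr, y_tr, X_te, y_te = X[:half], y[:half], X[half:], y[half:]
    X_b, y_b = drop_group(X_tr, y_tr, X_tr[:, 0], missing_prop, rng)

    results = {}
    for name, (Xf, yf) in {"unbiased": (X_tr, y_tr), "biased": (X_b, y_b)}.items():
        p = predict_proba(fit_logistic(Xf, yf), X_te)
        results[name] = {"auc": auc(y_te, p), "brier": brier_score(y_te, p)}
    return results


if __name__ == "__main__":
    for name, metrics in run_demo().items():
        print(name, metrics)
```

Changing `missing_prop` (for example 0.5 versus 0.8) mirrors the low-to-medium and high bias levels mentioned in the abstract. A mitigation method would be applied to the biased training set before fitting.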
Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Patterns (N Y) Year: 2024 Document type: Article Affiliation country: Canada