How do deep-learning models generalize across populations? Cross-ethnicity generalization of COPD detection.

D Almeida, Silvia; Norajitra, Tobias; Lüth, Carsten T; Wald, Tassilo; Weru, Vivienn; Nolden, Marco; Jäger, Paul F; von Stackelberg, Oyunbileg; Heußel, Claus Peter; Weinheimer, Oliver; Biederer, Jürgen; Kauczor, Hans-Ulrich; Maier-Hein, Klaus

Affiliation

D Almeida S; Division of Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany. silvia.diasalmeida@dkfz-heidelberg.de.
Norajitra T; Translational Lung Research Center Heidelberg (TLRC), Member of the German Center for Lung Research (DZL), Heidelberg, Germany. silvia.diasalmeida@dkfz-heidelberg.de.
Lüth CT; Medical Faculty, Heidelberg University, Heidelberg, Germany. silvia.diasalmeida@dkfz-heidelberg.de.
Wald T; National Center for Tumor Diseases (NCT), NCT Heidelberg, a partnership between DKFZ and Heidelberg University Medical Center, Heidelberg, Germany. silvia.diasalmeida@dkfz-heidelberg.de.
Weru V; Division of Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Nolden M; Translational Lung Research Center Heidelberg (TLRC), Member of the German Center for Lung Research (DZL), Heidelberg, Germany.
Jäger PF; Interactive Machine Learning Group (IML), German Cancer Research Center (DKFZ), Heidelberg, Germany.
von Stackelberg O; Helmholtz Imaging, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Heußel CP; Division of Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Weinheimer O; Helmholtz Imaging, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Biederer J; Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Kauczor HU; Division of Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Maier-Hein K; Pattern Analysis and Learning Group, Radiation Oncology, Heidelberg University Hospital, Heidelberg, Germany.

Insights Imaging ; 15(1): 198, 2024 Aug 07.

Article in En | MEDLINE | ID: mdl-39112910

ABSTRACT

OBJECTIVES:

To evaluate the performance and potential biases of deep-learning models in detecting chronic obstructive pulmonary disease (COPD) on chest CT scans across different ethnic groups, specifically non-Hispanic White (NHW) and African American (AA) populations.

MATERIALS AND METHODS:

Inspiratory chest CT and clinical data from 7549 individuals in the Genetic Epidemiology of COPD (COPDGene) study (mean age 62 years, interquartile range 56-69), including 5240 NHW and 2309 AA individuals, were retrospectively analyzed. Several factors influencing binary COPD classification performance in different ethnic populations were examined: (1) the training population (NHW-only, AA-only, a balanced set of half NHW and half AA, or the entire NHW + AA set); and (2) the learning strategy (three supervised learning (SL) methods vs. three self-supervised learning (SSL) methods). Distribution shifts across ethnicity were further assessed for the top-performing methods.
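
The four training populations can be illustrated with a minimal Python sketch. The abstract does not say how the balanced set was sized or how subjects were split into training and test data, so the function name `build_training_sets`, the `(subject_id, ethnicity)` layout, and the choice to downsample the larger group to the size of the smaller one are all assumptions made for illustration, not the authors' procedure.

```python
import random


def build_training_sets(subjects, seed=0):
    """Return the four training populations named in the abstract.

    `subjects` is a list of (subject_id, ethnicity) pairs, with ethnicity
    either "NHW" or "AA". This layout is a hypothetical one for illustration.
    """
    rng = random.Random(seed)
    nhw = [s for s in subjects if s[1] == "NHW"]
    aa = [s for s in subjects if s[1] == "AA"]

    # Assumed balancing rule: half NHW, half AA, capped by the smaller group.
    n = min(len(nhw), len(aa))
    balanced = rng.sample(nhw, n) + rng.sample(aa, n)

    return {
        "NHW-only": nhw,
        "AA-only": aa,
        "balanced": balanced,
        "all": nhw + aa,
    }


# Toy cohort using only the group sizes reported in the abstract.
cohort = [(i, "NHW") for i in range(5240)] + [(i, "AA") for i in range(5240, 7549)]
sets = build_training_sets(cohort)
print({name: len(members) for name, members in sets.items()})
```

Under this assumed rule the balanced set can be no larger than twice the smaller group, which is one reason a balanced configuration and an "all" configuration are worth comparing separately.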

RESULTS:

The learning strategy significantly influenced model performance, with SSL methods outperforming SL methods (p < 0.001) across all training configurations. Training on balanced datasets containing both NHW and AA individuals improved model performance compared with training on population-specific datasets. Distribution shifts were found between ethnicities for the same health status, particularly for models trained with nearest-neighbor contrastive SSL. Training on a balanced dataset resulted in fewer distribution shifts across ethnicity and health status, highlighting its efficacy in reducing biases.
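
Comparing performance across populations comes down to computing the same metric separately for each ethnic group. The sketch below does this for AUC in plain Python, using the rank-based definition (the probability that a COPD case receives a higher score than a non-COPD case, with ties counted as one half). The function names and the toy records are invented for illustration and are not data or code from the study.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """Compute AUC separately for each group in (group, label, score) records."""
    grouped = {}
    for group, label, score in records:
        labels, scores = grouped.setdefault(group, ([], []))
        labels.append(label)
        scores.append(score)
    return {g: auc(labels, scores) for g, (labels, scores) in grouped.items()}


# Invented toy predictions: (ethnicity, COPD label, model score).
records = [
    ("NHW", 1, 0.9), ("NHW", 0, 0.2), ("NHW", 1, 0.6), ("NHW", 0, 0.7),
    ("AA", 1, 0.8), ("AA", 0, 0.1), ("AA", 1, 0.4), ("AA", 0, 0.3),
]
result = auc_by_group(records)
print(result)
```

A gap between the per-group values for a single model is one simple indicator of the kind of cross-ethnicity performance difference the study examines.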

CONCLUSION:

Our findings demonstrate that utilizing SSL methods and training on large and balanced datasets can enhance COPD detection model performance and reduce biases across diverse ethnic populations. These findings emphasize the importance of equitable AI-driven healthcare solutions for COPD diagnosis.

CRITICAL RELEVANCE STATEMENT:

Self-supervised learning coupled with balanced datasets significantly improves COPD detection model performance, addressing biases across diverse ethnic populations and emphasizing the crucial role of equitable AI-driven healthcare solutions.

KEY POINTS:

Self-supervised learning methods outperform supervised learning methods, showing higher AUC values (p < 0.001). Balanced datasets with non-Hispanic White and African American individuals improve model performance. Training on diverse datasets enhances COPD detection accuracy. Ethnically diverse datasets reduce bias in COPD detection models. SimCLR models mitigate biases in COPD detection across ethnicities.

Key words

Artificial intelligence; Chronic obstructive pulmonary disease; Computed tomography; Deep learning; Ethnicity

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Insights Imaging Year: 2024 Document type: Article