The limits of fair medical imaging AI in real-world generalization.

Yang, Yuzhe; Zhang, Haoran; Gichoya, Judy W; Katabi, Dina; Ghassemi, Marzyeh

Affiliation

Yang Y; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA. yuzhe@mit.edu.
Zhang H; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
Gichoya JW; Department of Radiology, Emory University School of Medicine, Atlanta, GA, USA.
Katabi D; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
Ghassemi M; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.

Nat Med; 30(10): 2838-2848, 2024 Oct.

Article in En | MEDLINE | ID: mdl-38942996

ABSTRACT

As artificial intelligence (AI) rapidly approaches human-level performance in medical imaging, it is crucial that it does not exacerbate or propagate healthcare disparities. Previous research established AI's capacity to infer demographic data from chest X-rays, raising a key concern: do models that use demographic shortcuts make unfair predictions across subpopulations? In this study, we conducted a thorough investigation into the extent to which medical AI uses demographic encodings, focusing on potential fairness discrepancies within both in-distribution training sets and external test sets. Our analysis covers three key medical imaging disciplines (radiology, dermatology and ophthalmology) and incorporates data from six global chest X-ray datasets. We confirm that medical imaging AI leverages demographic shortcuts in disease classification. Although correcting shortcuts algorithmically effectively addresses fairness gaps to create 'locally optimal' models within the original data distribution, this optimality does not hold in new test settings. Surprisingly, we found that models with less encoding of demographic attributes are often the most 'globally optimal', exhibiting better fairness when evaluated in new test environments. Our work establishes best practices for medical imaging models that maintain their performance and fairness in deployments beyond their initial training contexts, underscoring critical considerations for AI clinical deployments across populations and sites.

Subjects

Artificial Intelligence; Diagnostic Imaging; Humans; Diagnostic Imaging/methods; Algorithms; Male; Radiography, Thoracic; Female; Dermatology

Full text: 1; Database: MEDLINE; Main subject: Artificial Intelligence / Diagnostic Imaging; Limits: Female / Humans / Male; Language: En; Journal: Nat Med; Journal subject: MOLECULAR BIOLOGY / MEDICINE; Year of publication: 2024; Document type: Article; Country of affiliation: United States
