Age, sex and race bias in automated arrhythmia detectors.
Perez Alday, Erick A; Rad, Ali B; Reyna, Matthew A; Sadr, Nadi; Gu, Annie; Li, Qiao; Dumitru, Mircea; Xue, Joel; Albert, Dave; Sameni, Reza; Clifford, Gari D.
Affiliation
  • Perez Alday EA; Department of Biomedical Informatics, School of Medicine, Emory University, United States of America. Electronic address: erick@dbmi.emory.edu.
  • Rad AB; Department of Biomedical Informatics, School of Medicine, Emory University, United States of America.
  • Reyna MA; Department of Biomedical Informatics, School of Medicine, Emory University, United States of America.
  • Sadr N; Department of Biomedical Informatics, School of Medicine, Emory University, United States of America.
  • Gu A; Department of Biomedical Informatics, School of Medicine, Emory University, United States of America.
  • Li Q; Department of Biomedical Informatics, School of Medicine, Emory University, United States of America.
  • Dumitru M; Department of Biomedical Informatics, School of Medicine, Emory University, United States of America.
  • Xue J; Department of Biomedical Informatics, School of Medicine, Emory University, United States of America; AliveCor Inc., United States of America.
  • Albert D; AliveCor Inc., United States of America.
  • Sameni R; Department of Biomedical Informatics, School of Medicine, Emory University, United States of America.
  • Clifford GD; Department of Biomedical Informatics, School of Medicine, Emory University, United States of America; Department of Biomedical Engineering, Georgia Institute of Technology, United States of America.
J Electrocardiol ; 74: 5-9, 2022.
Article en En | MEDLINE | ID: mdl-35878534
ABSTRACT
Despite the recent explosion of machine learning applied to medical data, very few studies have examined algorithmic bias in any meaningful manner, comparing across algorithms, databases, and assessment metrics. In this study, we compared the biases in sex, age, and race of 56 algorithms on over 130,000 electrocardiograms (ECGs) using several metrics and propose a machine learning model design to reduce bias. Participants of the 2021 PhysioNet Challenge designed and implemented working, open-source algorithms to identify clinical diagnoses from 2-lead ECG recordings. We grouped the data from the training, validation, and test datasets by sex (male vs female), age (binned by decade), and race (Asian, Black, White, and Other) whenever possible. We computed recording-wise accuracy, area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F-measure, and the Challenge Score for each of the 56 algorithms. The Mann-Whitney U and the Kruskal-Wallis tests assessed the performance differences of algorithms across these demographic groups. Group trends revealed similar values for the AUROC, AUPRC, and F-measure for both male and female groups across the training, validation, and test sets. However, recording-wise accuracies were 20% higher (p < 0.01) and the Challenge Score 12% lower (p = 0.02) for female subjects on the test set. AUPRC, F-measure, and the Challenge Score increased with age, while recording-wise accuracy and AUROC decreased with age. The results were similar for the training and test sets, but only recording-wise accuracy (12% decrease per decade, p < 0.01), Challenge Score (1% increase per decade, p < 0.01), and AUROC (1% decrease per decade, p < 0.01) were statistically different on the test set. We observed similar AUROC, AUPRC, Challenge Score, and F-measure values across the different race categories.
However, recording-wise accuracies were significantly lower for Black subjects and higher for Asian subjects on the training (31% difference, p < 0.01) and test (39% difference, p < 0.01) sets. A top-performing model was then retrained with an additional constraint that simultaneously minimized differences in performance across sex, race, and age. This yielded a modest reduction in performance but a significant reduction in bias. This work demonstrates that biases manifest as a function of model architecture, population, cost function, and optimization metric, all of which should be closely examined in any model.
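The group comparisons described above can be sketched with SciPy: the Mann-Whitney U test for two groups (e.g. sex) and the Kruskal-Wallis H test for three or more (e.g. age bins). The data below are synthetic and the variable names illustrative; they are not from the study's code or datasets.

```python
# Illustrative sketch of the study's non-parametric group comparisons.
# All values here are synthetic stand-ins, not the Challenge data.
import numpy as np
from scipy.stats import mannwhitneyu, kruskal

rng = np.random.default_rng(0)

# Synthetic per-recording accuracies for one algorithm, keyed by group.
acc_by_sex = {
    "male": rng.uniform(0.5, 0.9, 200),
    "female": rng.uniform(0.6, 1.0, 200),
}
acc_by_age = {  # binned by decade, as in the study
    "20-29": rng.uniform(0.7, 1.0, 100),
    "40-49": rng.uniform(0.6, 0.95, 100),
    "60-69": rng.uniform(0.5, 0.9, 100),
}

# Two groups -> Mann-Whitney U test (non-parametric, unpaired).
u_stat, p_sex = mannwhitneyu(acc_by_sex["male"], acc_by_sex["female"])

# Three or more groups -> Kruskal-Wallis H test.
h_stat, p_age = kruskal(*acc_by_age.values())

print(f"sex difference:  p = {p_sex:.3g}")
print(f"age difference:  p = {p_age:.3g}")
```

Both tests are rank-based, so they make no normality assumption about the per-recording metric distributions, which is why they suit skewed scores such as recording-wise accuracy.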
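The retraining constraint that "simultaneously minimized differences in performance across sex, race and age" can be approximated by penalizing the spread of per-group losses. The sketch below is a generic group-variance penalty under that assumption, not the paper's exact constraint; the function name `group_balanced_loss` and the parameter `lam` are hypothetical.

```python
# Hedged sketch of a bias-mitigation objective: overall loss plus a penalty
# on the variance of per-group mean losses. A generic illustration only.
import numpy as np

def group_balanced_loss(per_sample_loss, group_ids, lam=1.0):
    """Mean loss plus lam * variance of the per-group mean losses."""
    groups = np.unique(group_ids)
    group_means = np.array(
        [per_sample_loss[group_ids == g].mean() for g in groups]
    )
    return per_sample_loss.mean() + lam * group_means.var()

losses = np.array([0.2, 0.4, 0.9, 0.7])
groups = np.array([0, 0, 1, 1])
# Group means are 0.3 and 0.8 (variance 0.0625); overall mean is 0.55.
print(group_balanced_loss(losses, groups, lam=1.0))  # 0.6125
```

Increasing `lam` trades average performance for smaller gaps between groups, mirroring the "modest reduction in performance, with a significant reduction in bias" reported above.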

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Cardiac Arrhythmias / Electrocardiography Study type: Prognostic_studies Limits: Female / Humans / Male Language: En Journal: J Electrocardiol Year: 2022 Document type: Article