Search | Portal Regional da BVS

Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study.

Schaekermann, Mike; Spitz, Terry; Pyles, Malcolm; Cole-Lewis, Heather; Wulczyn, Ellery; Pfohl, Stephen R; Martin, Donald; Jaroensri, Ronnachai; Keeling, Geoff; Liu, Yuan; Farquhar, Stephanie; Xue, Qinghan; Lester, Jenna; Hughes, Cían; Strachan, Patricia; Tan, Fraser; Bui, Peggy; Mermel, Craig H; Peng, Lily H; Matias, Yossi; Corrado, Greg S; Webster, Dale R; Virmani, Sunny; Semturs, Christopher; Liu, Yun; Horn, Ivor; Cameron Chen, Po-Hsuan.

EClinicalMedicine ; 70: 102479, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38685924

ABSTRACT

Background: Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study.

Methods: Here, we propose a methodology, complementary to existing fairness metrics, to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework, which quantitatively assesses the performance equity of health AI technologies via a four-step interdisciplinary process to understand and quantify domain-specific criteria, together with the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients aged 20 years or older, submitted from primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes as compared to others. The likelihood that AI performance was anticorrelated to pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (i.e., "R") was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p(R > 0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes as compared to others (presented as a percentage below). Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case.

Findings: Across all dermatologic conditions, the HEAL metric was 80.5% for prioritising AI performance of racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0%, respectively, for prioritising AI performance of sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared to a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritising AI performance of age subpopulations based on DALYs.

Interpretation: Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes.

Funding: Google LLC.
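
The bootstrap estimate of p(R > 0) described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `heal_metric`, `outcome_score` and `hits_by_group` are hypothetical, the sketch assumes that a higher outcome score means better health (so a burden measure such as DALYs would be negated before use), and it assumes that cases are resampled with replacement within each subpopulation.

```python
import random
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks; None if undefined."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None
    return cov / (vx * vy) ** 0.5


def heal_metric(outcome_score, hits_by_group, n_boot=2000, seed=0):
    """Estimate p(R > 0), where R is the negated Spearman correlation.

    outcome_score: group -> health outcome score (ASSUMPTION: higher = better).
    hits_by_group: group -> list of 0/1 flags (1 = top-3 agreement on a case).
    """
    rng = random.Random(seed)
    groups = sorted(outcome_score)
    health = [outcome_score[g] for g in groups]
    positive = 0
    for _ in range(n_boot):
        # Resample cases within each group and recompute its agreement rate.
        perf = []
        for g in groups:
            cases = hits_by_group[g]
            perf.append(mean(rng.choices(cases, k=len(cases))))
        rho = spearman(health, perf)
        # Replicates with an undefined correlation are counted as not positive.
        if rho is not None and -rho > 0:
            positive += 1
    return positive / n_boot
```

With this convention, a value near 1 means the model tends to perform better for subpopulations with worse health outcomes, and a value near 0 means the opposite.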

Enhancing the reliability and accuracy of AI-enabled diagnosis via complementarity-driven deferral to clinicians.

Dvijotham, Krishnamurthy Dj; Winkens, Jim; Barsbey, Melih; Ghaisas, Sumedh; Stanforth, Robert; Pawlowski, Nick; Strachan, Patricia; Ahmed, Zahra; Azizi, Shekoofeh; Bachrach, Yoram; Culp, Laura; Daswani, Mayank; Freyberg, Jan; Kelly, Christopher; Kiraly, Atilla; Kohlberger, Timo; McKinney, Scott; Mustafa, Basil; Natarajan, Vivek; Geras, Krzysztof; Witowski, Jan; Qin, Zhi Zhen; Creswell, Jacob; Shetty, Shravya; Sieniek, Marcin; Spitz, Terry; Corrado, Greg; Kohli, Pushmeet; Cemgil, Taylan; Karthikesalingam, Alan.

Nat Med ; 29(7): 1814-1820, 2023 07.

Article in English | MEDLINE | ID: mdl-37460754

ABSTRACT

Predictive artificial intelligence (AI) systems based on deep learning have been shown to achieve expert-level identification of diseases in multiple medical imaging settings, but can make errors in cases accurately diagnosed by clinicians and vice versa. We developed Complementarity-Driven Deferral to Clinical Workflow (CoDoC), a system that can learn to decide between the opinion of a predictive AI model and a clinical workflow. CoDoC enhances accuracy relative to clinician-only or AI-only baselines in clinical workflows that screen for breast cancer or tuberculosis (TB). For breast cancer screening, compared to double reading with arbitration in a screening program in the UK, CoDoC reduced false positives by 25% at the same false-negative rate, while achieving a 66% reduction in clinician workload. For TB triaging, compared to standalone AI and clinical workflows, CoDoC achieved a 5-15% reduction in false positives at the same false-negative rate for three of five commercially available predictive AI systems. To facilitate the deployment of CoDoC in novel futuristic clinical settings, we present results showing that CoDoC's performance gains are sustained across several axes of variation (imaging modality, clinical setting and predictive AI system) and discuss the limitations of our evaluation and where further validation would be needed. We provide an open-source implementation to encourage further research and application.
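
To make the idea of deferral concrete, the following is a deliberately simplified sketch and not the CoDoC algorithm, which learns its deferral behaviour from data. It assumes a fixed confidence band: the AI decides when its score falls outside the band, and the case is deferred to the clinical workflow when the score falls inside it. The function names, the band limits and the toy cases are all hypothetical.

```python
def combined_decision(ai_score, clinician_positive, lower, upper):
    """Return (decision, deferred) for one case.

    The AI decides when its score is outside [lower, upper);
    inside the band the clinician's opinion is used instead.
    """
    if ai_score >= upper:
        return True, False
    if ai_score < lower:
        return False, False
    return clinician_positive, True


def evaluate(cases, lower, upper):
    """Count errors and the deferred fraction over (score, clinician, truth) tuples."""
    false_pos = false_neg = deferred = 0
    for ai_score, clinician_positive, truth in cases:
        decision, was_deferred = combined_decision(
            ai_score, clinician_positive, lower, upper
        )
        deferred += was_deferred
        false_pos += decision and not truth
        false_neg += truth and not decision
    return {
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "deferred_fraction": deferred / len(cases),
    }


# Hypothetical toy data: (AI score, clinician says positive, ground truth).
toy_cases = [
    (0.95, False, True),   # confident AI positive; clinician alone would miss it
    (0.02, True, False),   # confident AI negative; clinician alone would over-call
    (0.50, True, True),    # uncertain AI score, deferred to clinician
    (0.40, False, False),  # uncertain AI score, deferred to clinician
]
summary = evaluate(toy_cases, lower=0.2, upper=0.8)
```

In this toy setting the deferred fraction stands in for clinician workload: cases decided by the AI alone never reach a reader.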

Subjects

Artificial Intelligence; Triage; Reproducibility of Results; Workflow; Humans

A mobile-optimized artificial intelligence system for gestational age and fetal malpresentation assessment.

Gomes, Ryan G; Vwalika, Bellington; Lee, Chace; Willis, Angelica; Sieniek, Marcin; Price, Joan T; Chen, Christina; Kasaro, Margaret P; Taylor, James A; Stringer, Elizabeth M; McKinney, Scott Mayer; Sindano, Ntazana; Dahl, George E; Goodnight, William; Gilmer, Justin; Chi, Benjamin H; Lau, Charles; Spitz, Terry; Saensuksopa, T; Liu, Kris; Tiyasirichokchai, Tiya; Wong, Jonny; Pilgrim, Rory; Uddin, Akib; Corrado, Greg; Peng, Lily; Chou, Katherine; Tse, Daniel; Stringer, Jeffrey S A; Shetty, Shravya.

Commun Med (Lond) ; 2: 128, 2022.

Article in English | MEDLINE | ID: mdl-36249461

ABSTRACT

Background: Fetal ultrasound is an important component of antenatal care, but a shortage of adequately trained healthcare workers has limited its adoption in low-to-middle-income countries. This study investigated the use of artificial intelligence for fetal ultrasound in under-resourced settings.

Methods: Blind sweep ultrasounds, consisting of six freehand ultrasound sweeps, were collected by sonographers in the USA and Zambia, and novice operators in Zambia. We developed artificial intelligence (AI) models that used blind sweeps to predict gestational age (GA) and fetal malpresentation. AI GA estimates and standard fetal biometry estimates were compared to a previously established ground truth, and evaluated for difference in absolute error. Fetal malpresentation (non-cephalic vs cephalic) was compared to sonographer assessment. On-device AI model run-times were benchmarked on Android mobile phones.

Results: Here we show that GA estimation accuracy of the AI model is non-inferior to standard fetal biometry estimates (error difference -1.4 ± 4.5 days; 95% CI -1.8, -0.9; n = 406). Non-inferiority is maintained when blind sweeps are acquired by novice operators performing only two of six sweep motion types. Fetal malpresentation AUC-ROC is 0.977 (95% CI 0.949, 1.00; n = 613); sonographers and novices have similar AUC-ROCs. Software run-times on mobile phones for both diagnostic models are less than 3 s after completion of a sweep.

Conclusions: The gestational age model is non-inferior to the clinical standard and the fetal malpresentation model has high AUC-ROCs across operators and devices. Our AI models are able to run on-device, without internet connectivity, and provide feedback scores to help improve the capabilities of lightly trained ultrasound operators in low-resource settings.
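
As a worked check of the reported interval, the sketch below recomputes a normal-approximation 95% confidence interval from the summary statistics in this abstract (mean difference -1.4 days, SD 4.5 days, n = 406). It assumes that the difference is AI absolute error minus biometry absolute error, so negative values favour the AI model. The abstract does not state the non-inferiority margin, so `margin_days` is a purely hypothetical parameter, and the study's own interval may have been computed by a different method.

```python
from math import sqrt


def mean_ci(mean_diff, sd, n, z=1.96):
    """Normal-approximation confidence interval for a mean difference."""
    half_width = z * sd / sqrt(n)
    return mean_diff - half_width, mean_diff + half_width


def is_non_inferior(ci_upper, margin_days):
    """Non-inferiority holds when the upper CI bound lies below the margin."""
    return ci_upper < margin_days


# Summary statistics taken from the abstract.
low, high = mean_ci(mean_diff=-1.4, sd=4.5, n=406)

# HYPOTHETICAL margin, used only to illustrate the comparison.
verdict = is_non_inferior(high, margin_days=1.0)
```

The recomputed interval is roughly (-1.84, -0.96) days, close to the reported (-1.8, -0.9). The small gap in the upper bound is plausibly due to rounding of the published mean and SD.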

Quantitative Analysis of OCT for Neovascular Age-Related Macular Degeneration Using Deep Learning.

Moraes, Gabriella; Fu, Dun Jack; Wilson, Marc; Khalid, Hagar; Wagner, Siegfried K; Korot, Edward; Ferraz, Daniel; Faes, Livia; Kelly, Christopher J; Spitz, Terry; Patel, Praveen J; Balaskas, Konstantinos; Keenan, Tiarnan D L; Keane, Pearse A; Chopra, Reena.

Ophthalmology ; 128(5): 693-705, 2021 05.

Article in English | MEDLINE | ID: mdl-32980396

ABSTRACT

PURPOSE: To apply a deep learning algorithm for automated, objective, and comprehensive quantification of OCT scans to a large real-world dataset of eyes with neovascular age-related macular degeneration (AMD) and make the raw segmentation output data openly available for further research.

DESIGN: Retrospective analysis of OCT images from the Moorfields Eye Hospital AMD Database.

PARTICIPANTS: A total of 2473 first-treated eyes and 493 second-treated eyes that commenced therapy for neovascular AMD between June 2012 and June 2017.

METHODS: A deep learning algorithm was used to segment all baseline OCT scans. Volumes were calculated for segmented features such as neurosensory retina (NSR), drusen, intraretinal fluid (IRF), subretinal fluid (SRF), subretinal hyperreflective material (SHRM), retinal pigment epithelium (RPE), hyperreflective foci (HRF), fibrovascular pigment epithelium detachment (fvPED), and serous PED (sPED). Analyses included comparisons between first- and second-treated eyes by visual acuity (VA) and race/ethnicity and correlations between volumes.

MAIN OUTCOME MEASURES: Volumes of segmented features (mm³) and central subfield thickness (CST) (µm).

RESULTS: In first-treated eyes, the majority had both IRF and SRF (54.7%). First-treated eyes had greater volumes for all segmented tissues, with the exception of drusen, which was greater in second-treated eyes. In first-treated eyes, older age was associated with lower volumes for RPE, SRF, NSR, and sPED; in second-treated eyes, older age was associated with lower volumes of NSR, RPE, sPED, fvPED, and SRF. Eyes from Black individuals had higher SRF, RPE, and sPED volumes compared with other ethnic groups. Greater volumes of the majority of features were associated with worse VA.

CONCLUSIONS: We report the results of large-scale automated quantification of a novel range of baseline features in neovascular AMD. Major differences between first- and second-treated eyes, with increasing age, and between ethnicities are highlighted. In the coming years, enhanced, automated OCT segmentation may assist personalization of real-world care and the detection of novel structure-function correlations. These data will be made publicly available for replication and future investigation by the AMD research community.
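
The step from a segmentation map to a feature volume in mm³ can be sketched as counting voxels per label and multiplying by the physical voxel volume. This is an assumption about the general approach rather than the study's pipeline; the label codes, voxel spacing and function name below are hypothetical.

```python
from collections import Counter


def feature_volumes_mm3(label_volume, voxel_size_mm, label_names):
    """Return feature name -> volume in mm^3.

    label_volume: nested lists indexed [slice][row][column] of integer labels.
    voxel_size_mm: (depth, height, width) of one voxel in millimetres.
    label_names: integer label code -> feature name.
    """
    depth, height, width = voxel_size_mm
    voxel_mm3 = depth * height * width
    counts = Counter(
        label for slice_ in label_volume for row in slice_ for label in row
    )
    return {name: counts.get(code, 0) * voxel_mm3
            for code, name in label_names.items()}


# HYPOTHETICAL label codes and a tiny 2 x 2 x 2 segmentation map.
names = {1: "IRF", 2: "SRF", 3: "sPED"}
toy_map = [
    [[0, 1], [1, 2]],
    [[0, 1], [2, 0]],
]
volumes = feature_volumes_mm3(toy_map, voxel_size_mm=(1.0, 0.5, 0.5), label_names=names)
```

Features that do not appear in a scan receive a volume of zero, which keeps per-eye records comparable across the cohort.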

Subjects

Choroidal Neovascularization/diagnostic imaging; Wet Macular Degeneration/diagnostic imaging; Aged; Aged, 80 and over; Choroidal Neovascularization/physiopathology; Deep Learning; Female; Humans; Male; Middle Aged; Retina/diagnostic imaging; Retinal Detachment/diagnosis; Retinal Pigment Epithelium/diagnostic imaging; Retrospective Studies; Subretinal Fluid/diagnostic imaging; Tomography, Optical Coherence; Visual Acuity/physiology; Wet Macular Degeneration/physiopathology

Predicting conversion to wet age-related macular degeneration using deep learning.

Yim, Jason; Chopra, Reena; Spitz, Terry; Winkens, Jim; Obika, Annette; Kelly, Christopher; Askham, Harry; Lukic, Marko; Huemer, Josef; Fasler, Katrin; Moraes, Gabriella; Meyer, Clemens; Wilson, Marc; Dixon, Jonathan; Hughes, Cian; Rees, Geraint; Khaw, Peng T; Karthikesalingam, Alan; King, Dominic; Hassabis, Demis; Suleyman, Mustafa; Back, Trevor; Ledsam, Joseph R; Keane, Pearse A; De Fauw, Jeffrey.

Nat Med ; 26(6): 892-899, 2020 06.

Article in English | MEDLINE | ID: mdl-32424211

ABSTRACT

Progression to exudative 'wet' age-related macular degeneration (exAMD) is a major cause of visual deterioration. In patients diagnosed with exAMD in one eye, we introduce an artificial intelligence (AI) system to predict progression to exAMD in the second eye. By combining models based on three-dimensional (3D) optical coherence tomography images and corresponding automatic tissue maps, our system predicts conversion to exAMD within a clinically actionable 6-month time window, achieving a per-volumetric-scan sensitivity of 80% at 55% specificity, and 34% sensitivity at 90% specificity. This level of performance corresponds to true positives in 78% and 41% of individual eyes, and false positives in 56% and 17% of individual eyes at the high sensitivity and high specificity points, respectively. Moreover, we show that automatic tissue segmentation can identify anatomical changes before conversion and high-risk subgroups. This AI system overcomes substantial interobserver variability in expert predictions, performing better than five out of six experts, and demonstrates the potential of using AI to predict disease progression.
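
The two operating points quoted in this abstract (high sensitivity and high specificity) correspond to different thresholds on the same risk score. The sketch below shows how sensitivity and specificity follow from a threshold and how a threshold can be chosen to meet a target specificity. The function names and the toy scores are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive; return (sensitivity, specificity).

    Requires at least one positive and one negative label.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= threshold)
    fn = sum(1 for s, y in pairs if y and s < threshold)
    tn = sum(1 for s, y in pairs if not y and s < threshold)
    fp = sum(1 for s, y in pairs if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_specificity(scores, labels, target):
    """Lowest observed score whose specificity meets the target, else None.

    Specificity never decreases as the threshold rises, so the lowest
    qualifying threshold keeps sensitivity as high as possible.
    """
    for candidate in sorted(set(scores)):
        _, spec = sensitivity_specificity(scores, labels, candidate)
        if spec >= target:
            return candidate
    return None


# HYPOTHETICAL risk scores and conversion labels (1 = converted).
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]

t_sensitive = threshold_for_specificity(toy_scores, toy_labels, target=0.75)
t_specific = threshold_for_specificity(toy_scores, toy_labels, target=1.0)
```

Raising the target specificity moves the threshold up and trades away sensitivity, which is the same trade-off the abstract reports between its two operating points.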

Subjects

Deep Learning; Geographic Atrophy/diagnostic imaging; Tomography, Optical Coherence; Wet Macular Degeneration/diagnosis; Aged; Aged, 80 and over; Disease Progression; Early Diagnosis; Early Medical Intervention; Female; Humans; Imaging, Three-Dimensional; Macular Degeneration/diagnostic imaging; Male; Prognosis; Wet Macular Degeneration/diagnostic imaging; Wet Macular Degeneration/therapy
