Search | BVS Bolivia

Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study.

Schaekermann, Mike; Spitz, Terry; Pyles, Malcolm; Cole-Lewis, Heather; Wulczyn, Ellery; Pfohl, Stephen R; Martin, Donald; Jaroensri, Ronnachai; Keeling, Geoff; Liu, Yuan; Farquhar, Stephanie; Xue, Qinghan; Lester, Jenna; Hughes, Cían; Strachan, Patricia; Tan, Fraser; Bui, Peggy; Mermel, Craig H; Peng, Lily H; Matias, Yossi; Corrado, Greg S; Webster, Dale R; Virmani, Sunny; Semturs, Christopher; Liu, Yun; Horn, Ivor; Cameron Chen, Po-Hsuan.

EClinicalMedicine ; 70: 102479, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38685924

ABSTRACT

Background: Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study.

Methods: Here, we propose a methodology to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes, that is complementary to existing fairness metrics. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework designed to quantitatively assess the performance equity of health AI technologies via a four-step interdisciplinary process to understand and quantify domain-specific criteria, and the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients of 20 years or older, submitted from primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes as compared to others. The likelihood that AI performance was anticorrelated to pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (i.e., "R") was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p(R > 0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes as compared to others (presented as a percentage below). Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case.

Findings: Across all dermatologic conditions, the HEAL metric was 80.5% for prioritizing AI performance of racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0% respectively for prioritizing AI performance of sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared to a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritizing AI performance of age subpopulations based on DALYs.

Interpretation: Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes.

Funding: Google LLC.
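The HEAL metric described in this abstract, p(R > 0) with R the negated Spearman rank correlation between health outcomes and AI performance, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names (`average_ranks`, `spearman`, `heal_metric`), the input layout, and the choice to resample cases within each subpopulation are assumptions, since the abstract only states that the probability was estimated with bootstrap methods. Health burden is assumed to be a number such as DALYs or YLLs where a higher value means a worse outcome.

```python
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks.

    Returns 0.0 when either side is constant (correlation undefined).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def heal_metric(burden, hits_by_group, n_boot=2000, seed=0):
    """Estimate p(R > 0) by resampling cases within each subpopulation.

    burden[g]: health burden of group g (assumed: higher = worse outcomes).
    hits_by_group[g]: 0/1 flags, 1 if the model's top-3 matched the reference.
    """
    rng = random.Random(seed)
    groups = list(burden)
    outcome = [-burden[g] for g in groups]  # higher = better health outcome
    positive = 0
    for _ in range(n_boot):
        perf = []
        for g in groups:
            hits = hits_by_group[g]
            sample = rng.choices(hits, k=len(hits))  # resample with replacement
            perf.append(sum(sample) / len(sample))
        r = -spearman(outcome, perf)  # negated correlation, as in the abstract
        if r > 0:
            positive += 1
    return positive / n_boot
```

With hypothetical inputs such as `heal_metric({"18-39": 1.0, "40-64": 2.0, "65+": 3.0}, hits)`, where `hits` maps each age group to its list of per-case 0/1 agreement flags, the returned fraction multiplied by 100 corresponds to the percentage form used in the Findings. Because only a handful of subpopulations enter each correlation, the estimate can swing sharply with small changes in per-group performance.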

An End-to-End Platform for Digital Pathology Using Hyperspectral Autofluorescence Microscopy and Deep Learning-Based Virtual Histology.

McNeil, Carson; Wong, Pok Fai; Sridhar, Niranjan; Wang, Yang; Santori, Charles; Wu, Cheng-Hsun; Homyk, Andrew; Gutierrez, Michael; Behrooz, Ali; Tiniakos, Dina; Burt, Alastair D; Pai, Rish K; Tekiela, Kamilla; Patel, Hardik; Cameron Chen, Po-Hsuan; Fischer, Laurent; Martins, Eduardo Bruno; Seyedkazemi, Star; Freedman, Daniel; Kim, Charles C; Cimermancic, Peter.

Mod Pathol ; 37(2): 100377, 2024 Feb.

Article in English | MEDLINE | ID: mdl-37926422

ABSTRACT

Conventional histopathology involves expensive and labor-intensive processes that often consume tissue samples, rendering them unavailable for other analyses. We present a novel end-to-end workflow for pathology powered by hyperspectral microscopy and deep learning. First, we developed a custom hyperspectral microscope to nondestructively image the autofluorescence of unstained tissue sections. We then trained a deep learning model to use autofluorescence to generate virtual histologic stains, which avoids the cost and variability of chemical staining procedures and conserves tissue samples. We showed that the virtual images reproduce the histologic features present in the real-stained images using a randomized nonalcoholic steatohepatitis (NASH) scoring comparison study, where both real and virtual stains are scored by pathologists (D.T., A.D.B., R.K.P.). The test showed moderate-to-good concordance between pathologists' scoring on corresponding real and virtual stains. Finally, we developed deep learning-based models for automated NASH Clinical Research Network score prediction. We showed that the end-to-end automated pathology platform is comparable with an independent panel of pathologists for NASH Clinical Research Network scoring when evaluated against the expert pathologist consensus scores. This study provides proof of concept for this virtual staining strategy, which could improve cost, efficiency, and reliability in pathology and enable novel approaches to spatial biology research.

Subject(s)

Deep Learning; Non-alcoholic Fatty Liver Disease; Humans; Microscopy; Reproducibility of Results; Pathologists

Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation.

Majkowska, Anna; Mittal, Sid; Steiner, David F; Reicher, Joshua J; McKinney, Scott Mayer; Duggan, Gavin E; Eswaran, Krish; Cameron Chen, Po-Hsuan; Liu, Yun; Kalidindi, Sreenivasa Raju; Ding, Alexander; Corrado, Greg S; Tse, Daniel; Shetty, Shravya.

Radiology ; 294(2): 421-431, 2020 Feb.

Article in English | MEDLINE | ID: mdl-31793848

ABSTRACT

Background: Deep learning has the potential to augment the use of chest radiography in clinical radiology, but challenges include poor generalizability, spectrum bias, and difficulty comparing across studies.

Purpose: To develop and evaluate deep learning models for chest radiograph interpretation by using radiologist-adjudicated reference standards.

Materials and Methods: Deep learning models were developed to detect four findings (pneumothorax, opacity, nodule or mass, and fracture) on frontal chest radiographs. This retrospective study used two data sets. Data set 1 (DS1) consisted of 759 611 images from a multicity hospital network; ChestX-ray14 is a publicly available data set with 112 120 images. Natural language processing and expert review of a subset of images provided labels for 657 954 training images. Test sets consisted of 1818 and 1962 images from DS1 and ChestX-ray14, respectively. Reference standards were defined by radiologist-adjudicated image review. Performance was evaluated by area under the receiver operating characteristic curve analysis, sensitivity, specificity, and positive predictive value. Four radiologists reviewed test set images for performance comparison. Inverse probability weighting was applied to DS1 to account for positive radiograph enrichment and estimate population-level performance.

Results: In DS1, population-adjusted areas under the receiver operating characteristic curve for pneumothorax, nodule or mass, airspace opacity, and fracture were, respectively, 0.95 (95% confidence interval [CI]: 0.91, 0.99), 0.72 (95% CI: 0.66, 0.77), 0.91 (95% CI: 0.88, 0.93), and 0.86 (95% CI: 0.79, 0.92). With ChestX-ray14, areas under the receiver operating characteristic curve were 0.94 (95% CI: 0.93, 0.96), 0.91 (95% CI: 0.89, 0.93), 0.94 (95% CI: 0.93, 0.95), and 0.81 (95% CI: 0.75, 0.86), respectively.

Conclusion: Expert-level models for detecting clinically relevant chest radiograph findings were developed for this study by using adjudicated reference standards and with population-level performance estimation. Radiologist-adjudicated labels for 2412 ChestX-ray14 validation set images and 1962 test set images are provided.

© RSNA, 2019. Online supplemental material is available for this article. See also the editorial by Chang in this issue.
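The population-adjusted evaluation mentioned in the Materials and Methods can be illustrated with an inverse-probability-weighted AUC. The sketch below is an assumption-laden example, not the study's code: the function name `weighted_auc` and the `sampling_probs` input (each case's probability of having been selected into the enriched test set) are hypothetical, and the abstract does not say how the study derived its weights. Each case is weighted by the inverse of its sampling probability, and the AUC is the weighted share of positive/negative pairs that the model ranks correctly, with ties counted as half.

```python
def weighted_auc(scores, labels, sampling_probs):
    """Inverse-probability-weighted AUC over all positive/negative pairs.

    scores: model outputs, higher = more likely positive.
    labels: 1 for positive cases, 0 for negative cases.
    sampling_probs: assumed probability that each case entered the test set.
    """
    weights = [1.0 / p for p in sampling_probs]
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    correct = 0.0  # weighted count of correctly ordered pairs
    total = 0.0    # weighted count of all pairs
    for s_pos, w_pos in pos:
        for s_neg, w_neg in neg:
            pair_weight = w_pos * w_neg
            total += pair_weight
            if s_pos > s_neg:
                correct += pair_weight
            elif s_pos == s_neg:
                correct += 0.5 * pair_weight
    return correct / total
```

When every case has the same sampling probability, this reduces to the ordinary AUC. The pairwise loop is quadratic in the number of cases, which is acceptable for a test set of a few thousand images but would call for a rank-based formulation on larger data.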

Subject(s)

Radiographic Image Interpretation, Computer-Assisted/methods; Radiography, Thoracic/methods; Respiratory Tract Diseases/diagnostic imaging; Thoracic Injuries/diagnostic imaging; Adolescent; Adult; Aged; Aged, 80 and over; Child; Child, Preschool; Deep Learning; Female; Humans; Infant; Male; Middle Aged; Pneumothorax; Radiologists; Reference Standards; Reproducibility of Results; Retrospective Studies; Sensitivity and Specificity; Young Adult
