Search | BVS MTCI Américas

Using generative AI to investigate medical imagery models and datasets.

Lang, Oran; Yaya-Stupp, Doron; Traynis, Ilana; Cole-Lewis, Heather; Bennett, Chloe R; Lyles, Courtney R; Lau, Charles; Irani, Michal; Semturs, Christopher; Webster, Dale R; Corrado, Greg S; Hassidim, Avinatan; Matias, Yossi; Liu, Yun; Hammel, Naama; Babenko, Boris.

EBioMedicine ; 102: 105075, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38565004

ABSTRACT

BACKGROUND: AI models have shown promise in performing many medical imaging tasks. However, our ability to explain what signals these models have learned is severely lacking. Explanations are needed to increase doctors' trust in AI-based models, especially in domains where AI prediction capabilities surpass those of humans. Moreover, such explanations could enable novel scientific discovery by uncovering signals in the data that are not yet known to experts.

METHODS: In this paper, we present a workflow for generating hypotheses to understand which visual signals in images are correlated with a classification model's predictions for a given task. This approach leverages an automatic visual explanation algorithm followed by interdisciplinary expert review. We propose the following 4 steps: (i) Train a classifier to perform a given task to assess whether the imagery indeed contains signals relevant to the task; (ii) Train a StyleGAN-based image generator with an architecture that enables guidance by the classifier ("StylEx"); (iii) Automatically detect, extract, and visualize the top visual attributes to which the classifier is sensitive. For visualization, we independently modify each of these attributes to generate counterfactual visualizations for a set of images (i.e., what the image would look like with the attribute increased or decreased); (iv) Formulate hypotheses for the underlying mechanisms, to stimulate future research. Specifically, present the discovered attributes and corresponding counterfactual visualizations to an interdisciplinary panel of experts so that hypotheses can account for social and structural determinants of health (e.g., whether the attributes correspond to known pathophysiological or socio-cultural phenomena, or could be novel discoveries).
FINDINGS: To demonstrate the broad applicability of our approach, we present results on eight prediction tasks across three medical imaging modalities: retinal fundus photographs, external eye photographs, and chest radiographs. We showcase examples where many of the automatically learned attributes clearly capture clinically known features (e.g., types of cataract, enlarged heart), and demonstrate automatically learned confounders that arise from factors beyond physiological mechanisms (e.g., chest X-ray underexposure is correlated with the classifier predicting abnormality, and eye makeup is correlated with the classifier predicting low hemoglobin levels). We further show that our method reveals a number of physiologically plausible attributes that, based on the literature, were previously unknown (e.g., differences in the fundus associated with self-reported sex).

INTERPRETATION: Our approach enables hypothesis generation via attribute visualizations and has the potential to help researchers better understand AI-based models, improve their assessment of them, and extract new knowledge from them, as well as debug and design better datasets. Although the framework is not designed to infer causality, we highlight that the attributes it generates can capture phenomena beyond physiology or pathophysiology, reflecting the real-world nature of healthcare delivery and socio-cultural factors; interdisciplinary perspectives are therefore critical in these investigations. Finally, we will release code to help researchers train their own StylEx models, analyze their predictive tasks of interest, and use the methodology presented in this paper for responsible interpretation of the revealed attributes.

FUNDING: Google.
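Step (iii) of the workflow above, ranking attributes by how strongly the classifier responds when each one is increased or decreased in isolation, can be illustrated with a toy sketch. This is not the authors' StylEx implementation: there is no image generator here, the "attributes" are plain numbers, and `toy_classifier`, `TOY_WEIGHTS`, `attribute_effects`, and `rank_attributes` are hypothetical names introduced only for illustration.

```python
import math
import random

# Hypothetical toy classifier: logistic regression over four latent attributes.
# A real pipeline would apply a trained classifier to generated images instead.
TOY_WEIGHTS = [0.1, 2.0, -0.5, 0.0]


def toy_classifier(attributes):
    score = sum(w * a for w, a in zip(TOY_WEIGHTS, attributes))
    return 1.0 / (1.0 + math.exp(-score))


def attribute_effects(classify, samples, delta=1.0):
    """Mean absolute change in classifier output when each attribute is
    increased versus decreased by delta, one attribute at a time."""
    effects = []
    for k in range(len(samples[0])):
        total = 0.0
        for z in samples:
            increased = list(z)
            decreased = list(z)
            increased[k] += delta  # counterfactual with attribute k increased
            decreased[k] -= delta  # counterfactual with attribute k decreased
            total += abs(classify(increased) - classify(decreased))
        effects.append(total / len(samples))
    return effects


def rank_attributes(effects):
    """Attribute indices ordered from most to least influential."""
    return sorted(range(len(effects)), key=lambda k: effects[k], reverse=True)


rng = random.Random(0)  # fixed seed so the toy data are reproducible
samples = [[rng.gauss(0.0, 1.0) for _ in TOY_WEIGHTS] for _ in range(200)]
effects = attribute_effects(toy_classifier, samples)
ranking = rank_attributes(effects)
print(ranking)  # [1, 2, 0, 3]
```

In this toy setting the ranking simply follows the absolute size of each weight, and the attribute with zero weight has no effect at all. In the workflow described in the abstract, the top-ranked attributes and their counterfactual images would then go to the expert panel in step (iv).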

Subjects

Algorithms, Cataract, Humans, Cardiomegaly, Fundus Oculi, Artificial Intelligence

Longitudinal Screening for Diabetic Retinopathy in a Nationwide Screening Program: Comparing Deep Learning and Human Graders.

Limwattanayingyong, Jirawut; Nganthavee, Variya; Seresirikachorn, Kasem; Singalavanija, Tassapol; Soonthornworasiri, Ngamphol; Ruamviboonsuk, Varis; Rao, Chetan; Raman, Rajiv; Grzybowski, Andrzej; Schaekermann, Mike; Peng, Lily H; Webster, Dale R; Semturs, Christopher; Krause, Jonathan; Sayres, Rory; Hersch, Fred; Tiwari, Richa; Liu, Yun; Ruamviboonsuk, Paisan.

J Diabetes Res ; 2020: 8839376, 2020.

Article in English | MEDLINE | ID: mdl-33381600

ABSTRACT

OBJECTIVE: To evaluate diabetic retinopathy (DR) screening via deep learning (DL) and trained human graders (HG) in a longitudinal cohort, as case spectrum shifts based on treatment referral and new-onset DR.

METHODS: We randomly selected patients with diabetes who were screened twice, two years apart, within a nationwide screening program. The reference standard was established via adjudication by retina specialists. Each patient's color fundus photographs were graded, and a patient was considered to have sight-threatening DR (STDR) if the worse eye had severe nonproliferative DR, proliferative DR, or diabetic macular edema. We compared DR screening via two modalities: DL and HG. For each modality, we simulated treatment referral by excluding patients with detected STDR from the second screening using that modality.

RESULTS: There were 5,738 patients (12.3% STDR) in the first screening. DL and HG captured different numbers of STDR cases, and after simulated referral and exclusion of ungradable cases, 4,148 and 4,263 patients remained in the second screening, respectively. The STDR prevalence at the second screening was 5.1% and 6.8% for DL- and HG-based screening, respectively. Along with the prevalence decrease, the sensitivity for both modalities decreased from the first to the second screening (DL: from 95% to 90%, p = 0.008; HG: from 74% to 57%, p < 0.001). At both the first and second screenings, the rate of false negatives for DL was about a fifth that of HG (0.5-0.6% vs. 2.9-3.2%).

CONCLUSION: On 2-year longitudinal follow-up of a DR screening cohort, STDR prevalence decreased for both DL- and HG-based screening. Follow-up screenings in longitudinal DR screening can be more difficult and yield lower sensitivity for both DL and HG, though the false negative rate was substantially lower for DL. Our data may be useful for health-economics analyses of longitudinal screening settings.
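The reported false-negative ranges can be cross-checked against the prevalence and sensitivity figures. The sketch below assumes that the "rate of false negatives" is expressed as a fraction of all screened patients, that is, prevalence × (1 − sensitivity). The abstract does not state the denominator, so this is an interpretation, and it works only from the rounded figures quoted above. The function name `false_negative_rate` is introduced here for illustration.

```python
def false_negative_rate(prevalence, sensitivity):
    """Missed STDR cases as a fraction of all screened patients.

    Assumption: the abstract's "rate of false negatives" uses this denominator.
    """
    return prevalence * (1.0 - sensitivity)


# (STDR prevalence, sensitivity) pairs as quoted in the abstract.
reported = {
    ("DL", "first"): (0.123, 0.95),
    ("HG", "first"): (0.123, 0.74),
    ("DL", "second"): (0.051, 0.90),
    ("HG", "second"): (0.068, 0.57),
}

# False-negative rate in percent, rounded to one decimal place.
fn_percent = {
    key: round(100 * false_negative_rate(prevalence, sensitivity), 1)
    for key, (prevalence, sensitivity) in reported.items()
}

for (modality, screening), value in fn_percent.items():
    print(f"{modality}, {screening} screening: {value}%")
```

Under this assumption the values come out at 0.6% and 0.5% for DL and 3.2% and 2.9% for HG, which matches the ranges quoted in the results.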

Subjects

Deep Learning, Diabetic Retinopathy/diagnostic imaging, Fundus Oculi, Computer-Assisted Image Interpretation, Macular Edema/diagnostic imaging, Mass Screening, Photography, Aged, Cell Proliferation, Diabetic Retinopathy/epidemiology, Female, Humans, Incidence, Longitudinal Studies, Macular Edema/epidemiology, Male, Middle Aged, National Health Programs, Predictive Value of Tests, Prevalence, Reproducibility of Results, Severity of Illness Index, Thailand/epidemiology
