Detecting Spurious Correlations With Sanity Tests for Artificial Intelligence Guided Radiology Systems.

Mahmood, Usman; Shrestha, Robik; Bates, David D B; Mannelli, Lorenzo; Corrias, Giuseppe; Erdi, Yusuf Emre; Kanan, Christopher

Affiliation

Mahmood U; Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, United States.
Shrestha R; Chester F. Carlson Center for Imaging Science, Rochester Institute of Technology, Rochester, NY, United States.
Bates DDB; Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, United States.
Mannelli L; Institute of Research and Medical Care (IRCCS) SDN, Institute of Diagnostic and Nuclear Research, Naples, Italy.
Corrias G; Department of Radiology, University of Cagliari, Cagliari, Italy.
Erdi YE; Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, United States.
Kanan C; Chester F. Carlson Center for Imaging Science, Rochester Institute of Technology, Rochester, NY, United States.

Front Digit Health; 3: 671015, 2021.

Article in English | MEDLINE | ID: mdl-34713144

ABSTRACT

Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing, localizing disease on medical images, and improving radiologists' efficiency. A critical component to deploying AI in radiology is to gain confidence in a developed system's efficacy and safety. The current gold standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time if a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen in computed tomography scans.

Keywords

artificial intelligence; bias; computed tomography; deep learning; spurious correlations; validation

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Study type: Guideline / Prognostic_studies | Language: English | Journal: Front Digit Health | Year: 2021 | Document type: Article | Country of affiliation: United States
