Deep learning predicts hip fracture using confounding patient and healthcare variables.
Badgeley, Marcus A; Zech, John R; Oakden-Rayner, Luke; Glicksberg, Benjamin S; Liu, Manway; Gale, William; McConnell, Michael V; Percha, Bethany; Snyder, Thomas M; Dudley, Joel T.
Affiliations
  • Badgeley MA; Verily Life Sciences LLC, South San Francisco, CA, USA.
  • Zech JR; Institute for Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  • Oakden-Rayner L; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  • Glicksberg BS; Department of Medicine, California Pacific Medical Center, San Francisco, CA, USA.
  • Liu M; School of Public Health, The University of Adelaide, Adelaide, South Australia, Australia.
  • Gale W; Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA.
  • McConnell MV; Verily Life Sciences LLC, South San Francisco, CA, USA.
  • Percha B; School of Computer Sciences, The University of Adelaide, Adelaide, South Australia, Australia.
  • Snyder TM; Verily Life Sciences LLC, South San Francisco, CA, USA.
  • Dudley JT; Division of Cardiovascular Medicine, Stanford School of Medicine, Stanford, CA, USA.
NPJ Digit Med ; 2: 31, 2019.
Article in English | MEDLINE | ID: mdl-31304378
Hip fractures are a leading cause of death and disability among older adults. Hip fractures are also the most commonly missed diagnosis on pelvic radiographs, and delayed diagnosis leads to higher cost and worse outcomes. Computer-aided diagnosis (CAD) algorithms have shown promise for helping radiologists detect fractures, but the image features underpinning their predictions are notoriously difficult to understand. In this study, we trained deep-learning models on 17,587 radiographs to classify fracture, 5 patient traits, and 14 hospital process variables. All 20 variables could be individually predicted from a radiograph, with the best performances on scanner model (AUC = 1.00), scanner brand (AUC = 0.98), and whether the order was marked "priority" (AUC = 0.79). Fracture was predicted moderately well from the image (AUC = 0.78) and better when combining image features with patient data (AUC = 0.86, DeLong paired AUC comparison, p = 2e-9) or patient data plus hospital process features (AUC = 0.91, p = 1e-21). Fracture prediction on a test set that balanced fracture risk across patient variables was significantly lower than on a random test set (AUC = 0.67, DeLong unpaired AUC comparison, p = 0.003); and on a test set with fracture risk balanced across both patient and hospital process variables, the model performed at chance (AUC = 0.52, 95% CI 0.46-0.58), indicating that these variables were the main source of the model's fracture predictions. A single model that directly combines image features, patient data, and hospital process data outperforms a Naive Bayes ensemble of an image-only model prediction, patient data, and hospital process data. If CAD algorithms are inexplicably leveraging patient and process variables in their predictions, it is unclear how radiologists should interpret their predictions in the context of other known patient data. Further research is needed to illuminate deep-learning decision processes so that computers and clinicians can effectively cooperate.
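The abstract's headline numbers rest on pairwise AUC comparisons between models scored on the same test set (the authors use DeLong's test). As an illustrative stand-in, the comparison can be sketched with a paired bootstrap on synthetic scores; all data and names here are hypothetical, and the bootstrap is a simpler substitute for DeLong's analytic variance estimate, not the paper's actual procedure.

```python
# Hypothetical sketch: paired comparison of two models' AUCs on one test set.
# A paired bootstrap stands in for DeLong's test; all data are synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)                      # true fracture labels (0/1)
scores_img = y * 0.6 + rng.normal(0, 0.5, n)   # weaker, image-only-like scores
scores_all = y * 1.0 + rng.normal(0, 0.5, n)   # stronger, combined-model-like scores

def paired_bootstrap_auc_diff(y, s1, s2, n_boot=2000, seed=0):
    """Two-sided paired bootstrap p-value for AUC(s2) - AUC(s1)."""
    rng = np.random.default_rng(seed)
    observed = roc_auc_score(y, s2) - roc_auc_score(y, s1)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))   # resample cases, keep pairing
        if len(np.unique(y[idx])) < 2:          # AUC needs both classes present
            continue
        diffs.append(roc_auc_score(y[idx], s2[idx]) - roc_auc_score(y[idx], s1[idx]))
    diffs = np.asarray(diffs)
    # proportion of bootstrap differences on the opposite side of zero
    p = 2 * min((diffs <= 0).mean(), (diffs >= 0).mean())
    return observed, p
```

Resampling case indices once and applying them to both score vectors preserves the pairing that makes the comparison more powerful than treating the two AUCs as independent.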
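The key diagnostic in the study is evaluating on a test set where fracture prevalence is equalized across levels of a suspected confounder, so the confounder can no longer serve as a shortcut signal. A minimal sketch of that idea, assuming a tabular test set with hypothetical `fracture` and `scanner` columns (the paper's own balancing procedure may differ):

```python
# Hypothetical sketch: build a confound-balanced test set by downsampling
# each confounder level to a common fracture prevalence. Column names
# ("fracture", "scanner") are illustrative, not the paper's.
import numpy as np
import pandas as pd

def balance_prevalence(df, label="fracture", confound="scanner", seed=0):
    """Downsample within each confound level to the overall fracture rate."""
    target = df[label].mean()          # prevalence to enforce in every level
    assert 0 < target < 1, "need both classes overall"
    parts = []
    for _, grp in df.groupby(confound):
        pos, neg = grp[grp[label] == 1], grp[grp[label] == 0]
        # largest subsample of this level that attains the target rate
        n_pos = min(len(pos), int(round(len(neg) * target / (1 - target))))
        n_neg = min(len(neg), int(round(n_pos * (1 - target) / target)))
        parts.append(pos.sample(n=n_pos, random_state=seed))
        parts.append(neg.sample(n=n_neg, random_state=seed))
    return pd.concat(parts, ignore_index=True)
```

After balancing, a model whose AUC collapses toward 0.5 on this set (as the paper's did, AUC = 0.52) is evidently relying on the confounder rather than on fracture-specific image features.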
Full text: 1 Database: MEDLINE Study type: Prognostic_studies / Risk_factors_studies Language: English Journal: NPJ Digit Med Publication year: 2019 Document type: Article
