Effect of a comprehensive deep-learning model on the accuracy of chest x-ray interpretation by radiologists: a retrospective, multireader multicase study.

Seah, Jarrel C Y; Tang, Cyril H M; Buchlak, Quinlan D; Holt, Xavier G; Wardman, Jeffrey B; Aimoldin, Anuar; Esmaili, Nazanin; Ahmad, Hassan; Pham, Hung; Lambert, John F; Hachey, Ben; Hogg, Stephen J F; Johnston, Benjamin P; Bennett, Christine; Oakden-Rayner, Luke; Brotchie, Peter; Jones, Catherine M

Affiliation

Seah JCY; Annalise.ai, Sydney, NSW, Australia; Department of Radiology, Alfred Health, Melbourne, VIC, Australia.
Tang CHM; Annalise.ai, Sydney, NSW, Australia.
Buchlak QD; Annalise.ai, Sydney, NSW, Australia. Electronic address: quinlan.buchlak1@my.nd.edu.au.
Holt XG; Annalise.ai, Sydney, NSW, Australia.
Wardman JB; Annalise.ai, Sydney, NSW, Australia.
Aimoldin A; Annalise.ai, Sydney, NSW, Australia.
Esmaili N; School of Medicine, University of Notre Dame Australia, Sydney, NSW, Australia; Faculty of Engineering and IT, University of Technology Sydney, Sydney, NSW, Australia.
Ahmad H; Annalise.ai, Sydney, NSW, Australia.
Pham H; Annalise.ai, Sydney, NSW, Australia.
Lambert JF; Annalise.ai, Sydney, NSW, Australia.
Hachey B; Annalise.ai, Sydney, NSW, Australia.
Hogg SJF; Annalise.ai, Sydney, NSW, Australia.
Johnston BP; Annalise.ai, Sydney, NSW, Australia.
Bennett C; School of Medicine, University of Notre Dame Australia, Sydney, NSW, Australia.
Oakden-Rayner L; Australian Institute for Machine Learning, The University of Adelaide, Adelaide, SA, Australia.
Brotchie P; Annalise.ai, Sydney, NSW, Australia; Department of Radiology, St Vincent's Health Australia, Melbourne, VIC, Australia.
Jones CM; I-MED Radiology Network, Brisbane, QLD, Australia.

Lancet Digit Health; 3(8): e496-e506, 2021 08.

Article in En | MEDLINE | ID: mdl-34219054

ABSTRACT

BACKGROUND:

Chest x-rays are widely used in clinical practice; however, interpretation can be hindered by human error and a lack of experienced thoracic radiologists. Deep learning has the potential to improve the accuracy of chest x-ray interpretation. We therefore aimed to assess the accuracy of radiologists with and without the assistance of a deep-learning model.

METHODS:

In this retrospective study, a deep-learning model was trained on 821 681 images (284 649 patients) from five datasets from Australia, Europe, and the USA. 2568 enriched chest x-ray cases from adult patients (≥16 years) who had at least one frontal chest x-ray were included in the test dataset; cases were representative of inpatient, outpatient, and emergency settings. 20 radiologists reviewed cases with and without the assistance of the deep-learning model, with a 3-month washout period between the two reads. We assessed the change in accuracy of chest x-ray interpretation across 127 clinical findings when the deep-learning model was used as decision support by calculating the area under the receiver operating characteristic curve (AUC) for each radiologist with and without the deep-learning model. We also compared AUCs for the model alone with those of unassisted radiologists. If the lower bound of the adjusted 95% CI of the difference in AUC between the model and the unassisted radiologists was more than -0·05, the model was considered to be non-inferior for that finding. If the lower bound exceeded 0, the model was considered to be superior.
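
The decision rule in the last two sentences can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the study's analysis code: the function name, the `margin` parameter, and the label returned when neither criterion is met are assumptions, and the adjusted 95% CI itself is taken as given.

```python
def classify_finding(ci_lower: float, margin: float = 0.05) -> str:
    """Label one finding from the lower bound of the adjusted 95% CI of the
    AUC difference (model minus unassisted radiologists).

    Hypothetical helper: the thresholds follow the abstract, the labels
    and the fall-through case are illustrative assumptions.
    """
    if ci_lower > 0:
        # Lower bound exceeds 0: model considered superior for this finding.
        return "superior"
    if ci_lower > -margin:
        # Lower bound is more than -0.05: model considered non-inferior.
        return "non-inferior"
    # The abstract does not name this case; the label is an assumption.
    return "not shown non-inferior"
```

For example, a lower bound of -0.03 would be labelled "non-inferior" under this rule, and a lower bound of exactly -0.05 would not, because the criterion is a strict inequality.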

FINDINGS:

Unassisted radiologists had a macroaveraged AUC of 0·713 (95% CI 0·645-0·785) across the 127 clinical findings, compared with 0·808 (0·763-0·839) when assisted by the model. The deep-learning model statistically significantly improved the classification accuracy of radiologists for 102 (80%) of 127 clinical findings and was statistically non-inferior for 19 (15%) findings; no findings showed a decrease in accuracy when radiologists used the deep-learning model. Unassisted radiologists had a macroaveraged mean AUC of 0·713 (0·645-0·785) across all findings, compared with 0·957 (0·954-0·959) for the model alone. Model classification alone was significantly more accurate than unassisted radiologists for 117 (94%) of 124 clinical findings predicted by the model and was non-inferior to unassisted radiologists for all other clinical findings.
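
As a rough illustration of what a macroaveraged AUC means, the sketch below computes one AUC per finding and then takes the unweighted mean, so that rare and common findings count equally. It assumes binary ground-truth labels and numeric confidence scores per case; the function names and the toy inputs are hypothetical and are not data from the study, and the study's exact averaging over radiologists may differ.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as the probability that a positive case is scored above a
    negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(per_finding):
    """Unweighted mean of the per-finding AUCs."""
    return mean(auc(labels, scores) for labels, scores in per_finding.values())


# Toy inputs for illustration only (not study data): finding -> (labels, scores).
toy_scores = {
    "finding_a": ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    "finding_b": ([1, 0, 0, 1], [0.8, 0.2, 0.3, 0.7]),
}
print(macro_auc(toy_scores))
```

The pairwise formulation is quadratic in the number of cases, which is fine for a sketch; a rank-based computation would be the usual choice for larger datasets.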

INTERPRETATION:

This study shows the potential of a comprehensive deep-learning model to improve chest x-ray interpretation across a large breadth of clinical practice.

FUNDING:

Annalise.ai.

Subject(s)

Deep Learning; Mass Screening/methods; Models, Biological; Radiographic Image Interpretation, Computer-Assisted; Radiography, Thoracic; X-Rays; Adolescent; Adult; Aged; Aged, 80 and over; Area Under Curve; Artificial Intelligence; Female; Humans; Infections/diagnosis; Infections/diagnostic imaging; Male; Middle Aged; ROC Curve; Radiologists; Retrospective Studies; Thoracic Injuries/diagnosis; Thoracic Injuries/diagnostic imaging; Thoracic Neoplasms/diagnosis; Thoracic Neoplasms/diagnostic imaging; Young Adult

Full text: 1
Database: MEDLINE
Main subject: X-Rays / Radiography, Thoracic / Radiographic Image Interpretation, Computer-Assisted / Mass Screening / Deep Learning / Models, Biological
Type of study: Diagnostic_studies / Evaluation_studies / Observational_studies / Prognostic_studies / Risk_factors_studies / Screening_studies
Limits: Adolescent / Adult / Aged / Aged80 / Female / Humans / Male / Middle aged
Language: En
Journal: Lancet Digit Health
Year: 2021
Type: Article
Affiliation country: Australia
