Evaluating ChatGPT-4V in chest CT diagnostics: a critical image interpretation assessment.

Dehdab, Reza; Brendlin, Andreas; Werner, Sebastian; Almansour, Haidara; Gassenmaier, Sebastian; Brendel, Jan Michael; Nikolaou, Konstantin; Afat, Saif

Affiliation

Dehdab R; Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany. reza.dehdab@med.uni-tuebingen.de.
Brendlin A; Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany.
Werner S; Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany.
Almansour H; Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany.
Gassenmaier S; Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany.
Brendel JM; Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany.
Nikolaou K; Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany.
Afat S; Department of Diagnostic and Interventional Radiology, Tuebingen University Hospital, Hoppe-Seyler-Straße 3, 72076, Tuebingen, Germany.

Jpn J Radiol; 2024 Jun 13.

Article in En | MEDLINE | ID: mdl-38867035

ABSTRACT

PURPOSE:

To assess the diagnostic accuracy of ChatGPT-4V in interpreting sets of four chest CT slices from COVID-19, non-small cell lung cancer (NSCLC), and control cases, thereby evaluating its potential as an AI tool in radiological diagnostics.

MATERIALS AND METHODS:

In this retrospective study, 60 CT scans from The Cancer Imaging Archive, covering COVID-19, NSCLC, and control cases, were analyzed using ChatGPT-4V. A radiologist selected four CT slices from each scan for evaluation. ChatGPT-4V's interpretations were compared against the gold standard diagnoses and assessed by two radiologists. Statistical analyses focused on accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), along with an examination of the impact of pathology location and lobe involvement.
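As an illustration of how the listed measures relate to one another, the following is a minimal sketch that computes them for one diagnostic category treated as "positive" against all others. It is not the authors' analysis code; the names BinaryCounts and diagnostic_metrics are hypothetical, and the counts are invented for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class BinaryCounts:
    """One-vs-rest confusion counts for a single diagnostic category."""
    tp: int  # category present, model named it
    fp: int  # category absent, model named it anyway
    fn: int  # category present, model missed it
    tn: int  # category absent, model did not name it


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: BinaryCounts) -> dict:
    """Compute the five standard measures from one set of confusion counts."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": ratio(c.tp + c.tn, total),
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "ppv": ratio(c.tp, c.tp + c.fp),
        "npv": ratio(c.tn, c.tn + c.fn),
    }


# Hypothetical counts, NOT data from the study.
example = BinaryCounts(tp=8, fp=2, fn=12, tn=38)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

In a three-class setting such as this one, the same function would be applied once per category, each time counting the other two categories as negatives.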

RESULTS:

ChatGPT-4V showed an overall diagnostic accuracy of 56.76%. For NSCLC, sensitivity was 27.27% and specificity was 60.47%. For COVID-19, sensitivity was 13.64% and specificity was 64.29%. For control cases, sensitivity was 31.82% and specificity was 95.24%. The highest sensitivity (83.33%) was observed in cases involving all lung lobes. Chi-squared analysis indicated significant differences in sensitivity across categories and in relation to the location and lobar involvement of pathologies.
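For readers unfamiliar with the test, a chi-squared comparison of sensitivity across categories can be sketched as a test of independence on a table of detected versus missed cases per category. The sketch below is an assumption about the general form of such a test, not the authors' procedure; the function name chi_squared_independence is hypothetical and the table contains invented counts, not study data.

```python
def chi_squared_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows; each row holds the observed counts for one
    category, for example [detected, missed].
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if category and outcome were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical [detected, missed] counts per category, NOT data from the study.
observed = [
    [10, 10],
    [5, 15],
    [15, 5],
]
stat, dof = chi_squared_independence(observed)
print(f"chi-squared = {stat:.2f}, degrees of freedom = {dof}")
```

The p-value is then read from the chi-squared distribution with the computed degrees of freedom; a statistics library such as SciPy (scipy.stats.chi2_contingency) performs both steps in one call.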

CONCLUSION:

ChatGPT-4V demonstrated variable diagnostic performance in chest CT interpretation, with notable proficiency in specific scenarios. This underscores the challenges of cross-modal AI models like ChatGPT-4V in radiology, pointing toward significant areas for improvement to ensure dependability. The study emphasizes the importance of enhancing these models for broader, more reliable medical use.

Key words

AI (artificial intelligence); ChatGPT-4V; Computed tomography; Computer-aided diagnosis (CAD)

Full text: 1 Database: MEDLINE Language: En Year: 2024 Type: Article
