Toward Improved Radiologic Diagnostics: Investigating the Utility and Limitations of GPT-3.5 Turbo and GPT-4 with Quiz Cases.
Kikuchi, Tomohiro; Nakao, Takahiro; Nakamura, Yuta; Hanaoka, Shouhei; Mori, Harushi; Yoshikawa, Takeharu.
Affiliation
  • Kikuchi T; From the Departments of Computational Diagnostic Radiology and Preventive Medicine (T.K., T.N., Y.N., T.Y.), The University of Tokyo Hospital, Tokyo, Japan r1419kt@jichi.ac.jp.
  • Nakao T; Department of Radiology (T.K., H.M.), School of Medicine, Jichi Medical University, Shimotsuke, Tochigi, Japan.
  • Nakamura Y; From the Departments of Computational Diagnostic Radiology and Preventive Medicine (T.K., T.N., Y.N., T.Y.), The University of Tokyo Hospital, Tokyo, Japan.
  • Hanaoka S; From the Departments of Computational Diagnostic Radiology and Preventive Medicine (T.K., T.N., Y.N., T.Y.), The University of Tokyo Hospital, Tokyo, Japan.
  • Mori H; Departments of Radiology (S.H.), The University of Tokyo Hospital, Tokyo, Japan.
  • Yoshikawa T; Department of Radiology (T.K., H.M.), School of Medicine, Jichi Medical University, Shimotsuke, Tochigi, Japan.
AJNR Am J Neuroradiol; 45(10): 1506-1511, 2024 Oct 03.
Article in En | MEDLINE | ID: mdl-38719605
ABSTRACT
BACKGROUND AND PURPOSE:

The rise of large language models such as generative pretrained transformers (GPTs) has sparked considerable interest in radiology, especially in interpreting radiologic reports and image findings. While existing research has focused on GPTs estimating diagnoses from radiologic descriptions, exploring alternative diagnostic information sources is also crucial. This study introduces the use of GPTs (GPT-3.5 Turbo and GPT-4) for information retrieval and summarization, searching relevant case reports via PubMed, and investigates their potential to aid diagnosis.

MATERIALS AND METHODS:

From October 2021 to December 2023, we selected 115 cases from the "Case of the Week" series on the American Journal of Neuroradiology website. Their Description and Legend sections were presented to the GPTs for the 2 tasks. For the Direct Diagnosis task, the models provided 3 differential diagnoses that were considered correct if they matched the diagnosis in the diagnosis section. For the Case Report Search task, the models generated 2 keywords per case, creating PubMed search queries to extract up to 3 relevant reports. A response was considered correct if reports containing the disease name stated in the diagnosis section were extracted. The McNemar test was used to evaluate whether adding a Case Report Search to Direct Diagnosis improved overall accuracy.
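
The Case Report Search step described above can be sketched as follows. This is only an illustration: the abstract does not say how the 2 keywords were combined or how the search was issued, so joining them with AND, restricting results to the Case Reports publication type, and calling the NCBI E-utilities ESearch endpoint are all assumptions, and the function name and example keywords are hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (public PubMed search API).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_case_report_query(keywords, max_reports=3):
    """Build an ESearch URL that ANDs the keywords and keeps only case reports.

    Assumption: the study's actual query format is not given in the abstract;
    AND plus the Case Reports publication type is one plausible choice.
    """
    terms = " AND ".join(f'"{kw}"' for kw in keywords)
    term = f"({terms}) AND case reports[Publication Type]"
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": max_reports,  # the study extracted up to 3 reports per case
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical keywords, not taken from the study.
url = build_case_report_query(["ring-enhancing lesion", "cerebellum"])
print(url)
```

The sketch only builds the request URL; fetching it and checking whether a returned report names the reference diagnosis would be separate steps.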

RESULTS:

In the Direct Diagnosis task, GPT-3.5 Turbo achieved a correct response rate of 26% (30/115 cases), whereas GPT-4 achieved 41% (47/115). For the Case Report Search task, GPT-3.5 Turbo scored 10% (11/115), and GPT-4 scored 7% (8/115). Correct responses totaled 32% (37/115) with 3 overlapping cases for GPT-3.5 Turbo, whereas GPT-4 had 43% (50/115) of correct responses with 5 overlapping cases. Adding Case Report Search improved GPT-3.5 Turbo's performance (P = .023) but not that of GPT-4 (P = .248).
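
The McNemar comparison can be illustrated from the reported totals. The sketch below assumes the continuity-corrected chi-square form of the test (the abstract does not state which variant was used) and assumes that no case correct under Direct Diagnosis alone becomes incorrect once Case Report Search is added, so the only discordant pairs are the cases gained by the combined approach.

```python
import math


def mcnemar_corrected(b, c):
    """Continuity-corrected McNemar test; returns (chi2, p) with 1 degree of freedom."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# b: cases correct only with the added search (combined total minus direct total).
# c: cases correct only without it; ASSUMED to be 0, since adding a search
#    cannot remove a correct direct diagnosis under the scoring described.
for model, direct, combined in [("GPT-3.5 Turbo", 30, 37), ("GPT-4", 47, 50)]:
    b, c = combined - direct, 0
    chi2, p = mcnemar_corrected(b, c)
    print(f"{model}: b={b}, c={c}, chi2={chi2:.2f}, p={p:.3f}")
```

Under these assumptions the sketch yields P values of about .023 for GPT-3.5 Turbo and .248 for GPT-4, in line with the reported values.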

CONCLUSIONS:

The effectiveness of adding Case Report Search to GPT-3.5 Turbo was particularly pronounced, suggesting its potential as an alternative diagnostic approach with GPTs, especially in scenarios where direct diagnoses from GPTs are not obtainable. Nevertheless, the overall performance of GPT models in both direct diagnosis and case report retrieval tasks remains suboptimal, and users should be aware of their limitations.
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Information Storage and Retrieval Limit: Humans Language: En Publication year: 2024 Document type: Article
