ChatGPT With GPT-4 Outperforms Emergency Department Physicians in Diagnostic Accuracy: Retrospective Analysis.
Hoppe, John Michael; Auer, Matthias K; Strüven, Anna; Massberg, Steffen; Stremmel, Christopher.
Affiliation
  • Hoppe JM; Department of Medicine IV, LMU University Hospital, Munich, Germany.
  • Auer MK; Department of Medicine IV, LMU University Hospital, Munich, Germany.
  • Strüven A; Department of Medicine I, LMU University Hospital, Munich, Germany.
  • Massberg S; Munich Heart Alliance Partner Site, Deutsches Zentrum für Herz-Kreislaufforschung (German Centre for Cardiovascular Research), LMU University Hospital, Munich, Germany.
  • Stremmel C; Department of Medicine I, LMU University Hospital, Munich, Germany.
J Med Internet Res ; 26: e56110, 2024 Jul 08.
Article in En | MEDLINE | ID: mdl-38976865
ABSTRACT

BACKGROUND:

OpenAI's ChatGPT is a pioneering artificial intelligence (AI) in the field of natural language processing, and it holds significant potential in medicine for providing treatment advice. Additionally, recent studies have demonstrated promising results using ChatGPT for emergency medicine triage. However, its diagnostic accuracy in the emergency department (ED) has not yet been evaluated.

OBJECTIVE:

This study compares the diagnostic accuracy of ChatGPT with GPT-3.5 and GPT-4 and primary treating resident physicians in an ED setting.

METHODS:

We enrolled 100 adults admitted to our ED in January 2023 with internal medicine issues. Diagnostic accuracy was assessed by comparing the diagnoses made by ED resident physicians and those made by ChatGPT with GPT-3.5 or GPT-4 against the final hospital discharge diagnosis, using a point system to grade accuracy.

RESULTS:

The study enrolled 100 patients with a median age of 72 (IQR 58.5-82.0) years who were admitted to our internal medicine ED primarily for cardiovascular, endocrine, gastrointestinal, or infectious diseases. GPT-4 outperformed both GPT-3.5 (P<.001) and ED resident physicians (P=.01) in diagnostic accuracy for internal medicine emergencies. Furthermore, across the disease subgroups, GPT-4 consistently outperformed GPT-3.5 and resident physicians, demonstrating significant superiority in cardiovascular diseases (GPT-4 vs ED physicians, P=.03) and in endocrine or gastrointestinal diseases (GPT-4 vs GPT-3.5, P=.01). In the other categories, the differences were not statistically significant.

CONCLUSIONS:

In this study, which compared the diagnostic accuracy of GPT-3.5, GPT-4, and ED resident physicians against a discharge diagnosis gold standard, GPT-4 outperformed both the resident physicians and its predecessor, GPT-3.5. Despite the retrospective design of the study and its limited sample size, the results underscore the potential of AI as a supportive diagnostic tool in ED settings.

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Emergency Service, Hospital Limit: Aged / Aged80 / Female / Humans / Male / Middle aged Language: En Journal: J Med Internet Res Publication year: 2024 Document type: Article