Performance of a Large Language Model on Japanese Emergency Medicine Board Certification Examinations.

Igarashi, Yutaka; Nakahara, Kyoichi; Norii, Tatsuya; Miyake, Nodoka; Tagami, Takashi; Yokobori, Shoji

Affiliation

Igarashi Y; Department of Emergency and Critical Care Medicine, Nippon Medical School.
Nakahara K; Department of Emergency and Critical Care Medicine, Nippon Medical School.
Norii T; Department of Emergency Medicine, University of New Mexico, NM, United States of America.
Miyake N; Department of Emergency and Critical Care Medicine, Nippon Medical School.
Tagami T; Department of Emergency and Critical Care Medicine, Nippon Medical School Musashi Kosugi Hospital.
Yokobori S; Department of Emergency and Critical Care Medicine, Nippon Medical School.

J Nippon Med Sch ; 91(2): 155-161, 2024 May 21.

Article in English | MEDLINE | ID: mdl-38432929

ABSTRACT

BACKGROUND:

Emergency physicians need a broad range of knowledge and skills to address critical medical, traumatic, and environmental conditions. Artificial intelligence (AI), including large language models (LLMs), has potential applications in healthcare settings; however, the performance of LLMs in emergency medicine remains unclear.

METHODS:

To evaluate the reliability of information provided by an LLM, ChatGPT was given the questions set by the Japanese Association of Acute Medicine in its board certification examinations over a 5-year period (2018-2022) and instructed to answer each question twice. Agreement between the two responses was assessed statistically.

RESULTS:

The LLM successfully answered 465 of the 475 text-based questions, achieving an overall correct response rate of 62.3%. For questions without images, the rate of correct answers was 65.9%. For questions with images that were not explained to the LLM, the rate of correct answers was only 52.0%. The annual rates of correct answers to questions without images ranged from 56.3% to 78.8%. Accuracy was better for scenario-based questions (69.1%) than for stand-alone questions (62.1%). Agreement between the two responses was substantial (kappa = 0.70). Factual error accounted for 82% of the incorrectly answered questions.
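The agreement figure reported above can be illustrated with a minimal sketch of a kappa calculation. This assumes the statistic is Cohen's kappa computed over the answer chosen in each of the two runs, which the abstract does not specify. The function name `cohens_kappa` and the toy answer lists are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(first)
    # Observed agreement: fraction of items given the same label in both runs.
    p_o = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement, from each run's marginal label frequencies.
    c1, c2 = Counter(first), Counter(second)
    p_e = sum(c1[label] * c2[label] for label in c1.keys() | c2.keys()) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both runs use a single identical label")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data (NOT from the study): option chosen on 10 questions.
run_1 = list("abcdeabcde")
run_2 = list("abcdeabcda")
print(round(cohens_kappa(run_1, run_2), 3))
```

Under the commonly used Landis and Koch convention, kappa values from 0.61 to 0.80 are described as substantial agreement, which is consistent with the wording used for the reported value of 0.70.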

CONCLUSION:

An LLM performed satisfactorily on an emergency medicine board certification examination in Japanese and without images. However, factual errors in the responses highlight the need for physician oversight when using LLMs.

Subjects

Certification; Emergency Medicine; Language; Emergency Medicine/education; Japan; Humans; Educational Measurement/methods; Specialty Boards; Reproducibility of Results; Artificial Intelligence; Clinical Competence; East Asian People

Keywords

artificial intelligence; emergency medicine; language; medicine; specialty boards

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Certification / Emergency Medicine / Language Limits: Humans Country/Region as subject: Asia Language: En Publication year: 2024 Document type: Article
