ChatGPT-4: An assessment of an upgraded artificial intelligence chatbot in the United States Medical Licensing Examination.
Mihalache, Andrew; Huang, Ryan S; Popovic, Marko M; Muni, Rajeev H.
Affiliation
  • Mihalache A; Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada.
  • Huang RS; Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada.
  • Popovic MM; Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada.
  • Muni RH; Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada.
Med Teach; 46(3): 366-372, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37839017
ABSTRACT

PURPOSE:

ChatGPT-4 is an upgraded version of the ChatGPT artificial intelligence chatbot. Its performance on the United States Medical Licensing Examination (USMLE) has not been independently characterized. We aimed to assess how well ChatGPT-4 answers USMLE Step 1, Step 2CK, and Step 3 practice questions.

METHOD:

Practice multiple-choice questions for USMLE Step 1, Step 2CK, and Step 3 were compiled. Of 376 available questions, 319 (85%) were analyzed by ChatGPT-4 on March 21, 2023. Our primary outcome was the performance of ChatGPT-4 on the practice USMLE Step 1, Step 2CK, and Step 3 examinations, measured as the proportion of multiple-choice questions answered correctly. Our secondary outcomes were the mean lengths of the questions and of the responses provided by ChatGPT-4.
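The abstract does not state how questions were submitted to the chatbot. As a minimal sketch only, the grading loop below assumes the OpenAI chat-completions API; the model name, prompt wording, and the `grade_question` helper are illustrative assumptions, not the authors' method.

```python
# Sketch of grading a multiple-choice question, assuming the OpenAI
# chat-completions API (the study's actual interface is not stated
# in the abstract). All names here are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def grade_question(stem: str, choices: dict[str, str], answer_key: str) -> bool:
    """Ask the model for a single-letter answer and compare it to the key."""
    options = "\n".join(f"{letter}. {text}" for letter, text in choices.items())
    prompt = f"{stem}\n{options}\nAnswer with the single letter of the best option."
    reply = client.chat.completions.create(
        model="gpt-4",  # assumption: any GPT-4-class model
        messages=[{"role": "user", "content": prompt}],
    )
    predicted = reply.choices[0].message.content.strip()[:1].upper()
    return predicted == answer_key.upper()


# Primary outcome: proportion of questions answered correctly, i.e.
# sum(grade_question(...) for each item) / number of items attempted.
```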

RESULTS:

ChatGPT-4 responded to 319 text-based multiple-choice questions from USMLE practice test material. ChatGPT-4 answered 82 of 93 (88%) questions correctly on USMLE Step 1, 91 of 106 (86%) on Step 2CK, and 108 of 120 (90%) on Step 3. ChatGPT-4 provided explanations for all questions. ChatGPT-4 spent an average of 30.8 ± 11.8 s per question on practice questions for USMLE Step 1, 23.0 ± 9.4 s per question for Step 2CK, and 23.1 ± 8.3 s per question for Step 3. The mean length of practice USMLE multiple-choice questions answered correctly by ChatGPT-4 was similar to that of questions answered incorrectly (difference = 17.48 characters, SE = 59.75, 95% CI = [-100.09, 135.04], t = 0.29, p = 0.77). The mean length of ChatGPT-4's correct responses to practice questions was significantly shorter than the mean length of its incorrect responses (difference = 79.58 characters, SE = 35.42, 95% CI = [9.89, 149.28], t = 2.25, p = 0.03).
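For readers who want to reproduce this kind of length comparison, the sketch below runs an independent two-sample t-test on response lengths with scipy; the character counts are hypothetical placeholders, since the raw data are not given in the abstract.

```python
# Independent two-sample t-test on response lengths (in characters),
# mirroring the comparison reported above. The counts below are
# hypothetical placeholders, not the study's data.
from scipy import stats

correct_lengths = [812, 905, 760, 1010, 688]     # hypothetical
incorrect_lengths = [990, 1105, 870, 1220, 940]  # hypothetical

t_stat, p_value = stats.ttest_ind(correct_lengths, incorrect_lengths)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```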

CONCLUSIONS:

ChatGPT-4 answered a remarkably high proportion of USMLE practice questions correctly. ChatGPT-4 performed substantially better on USMLE practice questions than previous versions of the same AI chatbot.

Full text: 1 Collections: 01-international Database: MEDLINE Limit: Humans Language: English Year of publication: 2024 Document type: Article