A Novel Evaluation Model for Assessing ChatGPT on Otolaryngology-Head and Neck Surgery Certification Examinations: Performance Study.
Long, Cai; Lowe, Kayle; Zhang, Jessica; Santos, André Dos; Alanazi, Alaa; O'Brien, Daniel; Wright, Erin D; Cote, David.
Affiliation
  • Long C; Division of Otolaryngology-Head and Neck Surgery, University of Alberta, Edmonton, AB, Canada.
  • Lowe K; Faculty of Medicine, University of Alberta, Edmonton, AB, Canada.
  • Zhang J; Faculty of Medicine, University of Alberta, Edmonton, AB, Canada.
  • Santos AD; Alberta Machine Intelligence Institute, Edmonton, AB, Canada.
  • Alanazi A; Division of Otolaryngology-Head and Neck Surgery, University of Alberta, Edmonton, AB, Canada.
  • O'Brien D; Department of Surgery, Creighton University, Omaha, NE, United States.
  • Wright ED; Division of Otolaryngology-Head and Neck Surgery, University of Alberta, Edmonton, AB, Canada.
  • Cote D; Division of Otolaryngology-Head and Neck Surgery, University of Alberta, Edmonton, AB, Canada.
JMIR Med Educ; 10: e49970, 2024 Jan 16.
Article in En | MEDLINE | ID: mdl-38227351
ABSTRACT

BACKGROUND:

ChatGPT is among the most popular large language models (LLMs), exhibiting proficiency in various standardized tests, including multiple-choice medical board examinations. However, its performance on otolaryngology-head and neck surgery (OHNS) certification examinations and open-ended medical board certification examinations has not been reported.

OBJECTIVE:

We aimed to evaluate the performance of ChatGPT on OHNS board examinations and propose a novel method to assess an AI model's performance on open-ended medical board examination questions.

METHODS:

Twenty-one open-ended questions were adopted from the Royal College of Physicians and Surgeons of Canada's sample examination to query ChatGPT on April 11, 2023, with and without prompts. A new model, named Concordance, Validity, Safety, Competency (CVSC), was developed to evaluate its performance.
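The CVSC grading itself was performed by human raters; purely as an illustration of the query-and-record workflow, here is a minimal Python sketch assuming the OpenAI Python client (the study queried the ChatGPT web interface on April 11, 2023, not the API), with the model name, prompt wording, and rubric fields as labeled assumptions.

```python
# Illustrative sketch only: the study used the ChatGPT web interface
# (April 11, 2023); the API call, model name, and prompt text below are
# assumptions, and CVSC scoring was performed by human raters.
from dataclasses import dataclass
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical framing for the "with prompts" condition.
EXAM_PROMPT = ("Answer the following open-ended question from an "
               "otolaryngology-head and neck surgery board examination:\n\n")

@dataclass
class CVSCRating:
    """Human-assigned rating on the four CVSC criteria (illustrative fields)."""
    concordance: bool   # does the explanation agree with the answer given?
    validity: bool      # is the content factually correct?
    safety: bool        # is the answer free of patient-safety risks?
    competency: float   # exam-style mark awarded, 0.0-1.0

def collect_answers(question: str, with_prompt: bool, trials: int = 3) -> list[str]:
    """Gather `trials` independently regenerated answers to one question."""
    content = (EXAM_PROMPT + question) if with_prompt else question
    return [
        client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumption; the paper evaluated ChatGPT
            messages=[{"role": "user", "content": content}],
        ).choices[0].message.content
        for _ in range(trials)
    ]
```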

RESULTS:

In the open-ended question assessment, ChatGPT achieved a passing mark, averaging 75% across 3 trials, and was more accurate when questions were accompanied by prompts. The model demonstrated high concordance (92.06%) and satisfactory validity. Although it was largely consistent when regenerating answers, it often provided only partially correct responses, and concerning features such as hallucinations and self-conflicting answers were observed.
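As an arithmetic aside, 21 questions answered over 3 trials yields 63 responses, and the reported 92.06% concordance matches 58 of 63; the check below treats that tally as an assumption rather than a figure taken from the paper.

```python
# Back-of-envelope check of the reported concordance rate. The tally of 58
# concordant responses is an assumption that reproduces 92.06%; the paper
# reports only the percentage.
questions, trials = 21, 3
total = questions * trials                  # 63 responses in all
concordant = 58                             # assumed count consistent with 92.06%
print(f"{100 * concordant / total:.2f}%")   # -> 92.06%
```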

CONCLUSIONS:

ChatGPT achieved a passing score in the sample examination and demonstrated the potential to pass the OHNS certification examination of the Royal College of Physicians and Surgeons of Canada. Some concerns remain due to its hallucinations, which could pose risks to patient safety. Further adjustments are necessary to yield safer and more accurate answers for clinical implementation.

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Otolaryngology / Surgeons Study type: Prognostic_studies Limit: Humans Country/Region as subject: North America Language: En Journal: JMIR Med Educ Publication year: 2024 Document type: Article
