Results 1 - 2 of 2
1.
JMIR Med Educ. 2024 Jan 16;10:e49970.
Article in English | MEDLINE | ID: mdl-38227351

ABSTRACT

BACKGROUND: ChatGPT is among the most popular large language models (LLMs), and it has shown proficiency on various standardized tests, including multiple-choice medical board examinations. However, its performance on otolaryngology-head and neck surgery (OHNS) certification examinations and on open-ended medical board certification examinations has not been reported.

OBJECTIVE: We aimed to evaluate the performance of ChatGPT on OHNS board examinations and to propose a novel method for assessing an AI model's performance on open-ended medical board examination questions.

METHODS: Twenty-one open-ended questions were taken from the Royal College of Physicians and Surgeons of Canada's sample examination and used to query ChatGPT on April 11, 2023, with and without prompts. A new framework, named Concordance, Validity, Safety, Competency (CVSC), was developed to evaluate its performance.

RESULTS: On the open-ended question assessment, ChatGPT achieved a passing mark (an average of 75% across 3 trials) and demonstrated higher accuracy when given prompts. The model showed high concordance (92.06%) and satisfactory validity. While it was reasonably consistent when regenerating answers, it often provided only partially correct responses. Notably, concerning behaviors such as hallucinations and self-conflicting answers were observed.

CONCLUSIONS: ChatGPT passed the sample examination and demonstrated the potential to pass the OHNS certification examination of the Royal College of Physicians and Surgeons of Canada. Concerns remain about its hallucinations, which could pose risks to patient safety. Further adjustment is necessary to yield safer and more accurate answers before clinical implementation.


Subjects
Otolaryngology, Surgeons, Humans, Canada, Certification, Hallucinations
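The CVSC framework above rates each answer along four axes and averages scores across repeated trials. A minimal sketch of that aggregation idea, assuming an equal-weighted 0-1 rating scale (the paper's actual rubric and weights are not reproduced here, and the example ratings are made up):

```python
# Hypothetical sketch of CVSC-style scoring: rate each answer on the
# four axes (Concordance, Validity, Safety, Competency), average per
# trial, then average across trials. The 0-1 scale, equal weighting,
# and the sample ratings below are illustrative assumptions.
from statistics import mean

AXES = ("concordance", "validity", "safety", "competency")

def score_answer(ratings):
    """Average one answer's four CVSC axis ratings (each rated 0-1)."""
    return mean(ratings[axis] for axis in AXES)

def exam_score(trials):
    """Average per-question scores within each trial, then across
    trials, mirroring the abstract's 'average of 75% across 3 trials'."""
    return mean(mean(score_answer(q) for q in trial) for trial in trials)

# Two made-up trials of one question each.
trials = [
    [{"concordance": 0.8, "validity": 0.8, "safety": 0.8, "competency": 0.8}],
    [{"concordance": 0.6, "validity": 0.6, "safety": 0.6, "competency": 0.6}],
]
print(round(exam_score(trials), 2))  # 0.7
```

Averaging within each trial before averaging across trials keeps every regeneration attempt equally weighted, which matters when an LLM gives inconsistent answers on re-runs.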
2.
Article in English | MEDLINE | ID: mdl-38895862

ABSTRACT

OBJECTIVE: The recent surge in popularity of large language models (LLMs) such as ChatGPT has showcased their proficiency on medical examinations and their potential applications in health care. However, LLMs have inherent limitations, including inconsistent accuracy, specific prompting requirements, and the risk of generating harmful hallucinations. A domain-specific model might address these limitations effectively.

STUDY DESIGN: Developmental design.

SETTING: Virtual.

METHODS: Otolaryngology-head and neck surgery (OHNS)-relevant data were systematically gathered from open-access Internet sources and indexed into a knowledge database. We used retrieval-augmented language modeling to recall this information and integrated it with ChatGPT4.0, creating an OHNS-specific knowledge question-and-answer platform known as ChatENT. The model was further tested on different types of questions.

RESULTS: ChatENT showed enhanced performance in the analysis and interpretation of OHNS information, outperforming ChatGPT4.0 on both the Canadian Royal College OHNS sample examination questions and the US board practice questions, with error reductions of 58.4% and 26.0%, respectively. ChatENT generated fewer hallucinations and demonstrated greater consistency.

CONCLUSION: To the best of our knowledge, ChatENT is the first specialty-specific knowledge-retrieval artificial intelligence in the medical field built on the latest LLMs. It shows considerable promise in areas such as medical education, patient education, and clinical decision support. The model has demonstrated the capacity to overcome limitations of existing LLMs, signaling a future of more precise, safe, and user-friendly applications in OHNS and other medical fields.
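The retrieval-augmented approach described in the methods can be sketched as: index domain documents, retrieve the best match for a query, and prepend it as context to the prompt sent to the LLM. The corpus, token-overlap scoring rule, and prompt template below are illustrative assumptions; the actual ChatENT pipeline and index are not described in detail here.

```python
# Minimal retrieval-augmented generation sketch: ground a question in
# a retrieved domain document before querying the LLM. The two-document
# corpus and the bag-of-words overlap score are toy assumptions.

def tokenize(text):
    """Lowercased whitespace tokens (a deliberately crude tokenizer)."""
    return set(text.lower().split())

def retrieve(query, corpus):
    """Return the document sharing the most tokens with the query."""
    return max(corpus, key=lambda doc: len(tokenize(doc) & tokenize(query)))

def build_prompt(query, corpus):
    """Prepend the retrieved context so the LLM answers from it."""
    context = retrieve(query, corpus)
    return f"Context: {context}\nQuestion: {query}\nAnswer using the context."

# Tiny stand-in for the OHNS knowledge database.
corpus = [
    "Otosclerosis causes conductive hearing loss from stapes fixation.",
    "Epistaxis most often originates from Kiesselbach's plexus.",
]
prompt = build_prompt("What causes conductive hearing loss?", corpus)
print(prompt)
```

A production system would replace the overlap score with embedding similarity over a vector index, but the shape of the pipeline (retrieve, then condition the prompt on the result) is the same, and it is this grounding step that the abstract credits for fewer hallucinations.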
