Performance of artificial intelligence chatbots in sleep medicine certification board exams: ChatGPT versus Google Bard.
Cheong, Ryan Chin Taw; Pang, Kenny Peter; Unadkat, Samit; Mcneillis, Venkata; Williamson, Andrew; Joseph, Jonathan; Randhawa, Premjit; Andrews, Peter; Paleri, Vinidh.
Affiliations
  • Cheong RCT; Otolaryngology-Head and Neck Surgery Department, The Royal Marsden NHS Foundation Trust, London, SW3 6JJ, UK. ryan.cheong@nhs.net.
  • Pang KP; Asia Sleep Centre, Singapore, Singapore.
  • Unadkat S; Otolaryngology-Head and Neck Surgery Department, The Royal National ENT and Eastman Dental Hospitals, University College London Hospitals NHS Foundation Trust, London, UK.
  • Mcneillis V; Otolaryngology-Head and Neck Surgery Department, The Royal National ENT and Eastman Dental Hospitals, University College London Hospitals NHS Foundation Trust, London, UK.
  • Williamson A; Otolaryngology-Head and Neck Surgery Department, The Royal Marsden NHS Foundation Trust, London, SW3 6JJ, UK.
  • Joseph J; Otolaryngology-Head and Neck Surgery Department, The Royal National ENT and Eastman Dental Hospitals, University College London Hospitals NHS Foundation Trust, London, UK.
  • Randhawa P; Otolaryngology-Head and Neck Surgery Department, The Royal National ENT and Eastman Dental Hospitals, University College London Hospitals NHS Foundation Trust, London, UK.
  • Andrews P; Otolaryngology-Head and Neck Surgery Department, The Royal National ENT and Eastman Dental Hospitals, University College London Hospitals NHS Foundation Trust, London, UK.
  • Paleri V; Otolaryngology-Head and Neck Surgery Department, The Royal Marsden NHS Foundation Trust, London, SW3 6JJ, UK.
Eur Arch Otorhinolaryngol ; 281(4): 2137-2143, 2024 Apr.
Article in En | MEDLINE | ID: mdl-38117307
ABSTRACT

PURPOSE:

To conduct a comparative performance evaluation of GPT-3.5, GPT-4 and Google Bard on self-assessment questions at the level of the American Sleep Medicine Certification Board Exam.

METHODS:

A total of 301 text-based single-best-answer multiple choice questions with four answer options each, across 10 categories, were included in the study and transcribed as inputs for GPT-3.5, GPT-4 and Google Bard. The first output responses generated were selected and matched for answer accuracy against the gold-standard answer provided by the American Academy of Sleep Medicine for each question. A global score of 80% and above is required by human sleep medicine specialists to pass each exam category.
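The grading step described above can be sketched in a few lines of Python: each chatbot's first output is matched against the gold-standard answer, and a category passes at 80% or above. This is a minimal illustration of the scoring logic only; the question and answer data below are invented, not taken from the study.

```python
# Hypothetical sketch of the grading step: match each chatbot's first
# response against the gold-standard answer key and compute accuracy.
# Answer data here are illustrative, not from the study.

def grade(responses, gold):
    """Return the fraction of responses matching the gold-standard answers."""
    assert len(responses) == len(gold)
    correct = sum(r == g for r, g in zip(responses, gold))
    return correct / len(gold)

PASS_MARK = 0.80  # pass threshold required of human sleep medicine specialists

gold = ["B", "D", "A", "C", "B"]     # illustrative answer key for one category
chatbot = ["B", "D", "A", "C", "A"]  # illustrative first-output responses

score = grade(chatbot, gold)
print(f"score = {score:.0%}, pass = {score >= PASS_MARK}")
```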

RESULTS:

GPT-4 achieved the pass mark of 80% or above in five of the 10 exam categories: the Normal Sleep and Variants Self-Assessment Exam (2021), Circadian Rhythm Sleep-Wake Disorders Self-Assessment Exam (2021), Insomnia Self-Assessment Exam (2022), Parasomnias Self-Assessment Exam (2022) and the Sleep-Related Movements Self-Assessment Exam (2023). GPT-4 demonstrated superior performance in all exam categories and achieved a higher overall score (68.1%) than both GPT-3.5 (46.8%) and Google Bard (45.5%), a statistically significant difference (p < 0.001). There was no significant difference in overall score between GPT-3.5 and Google Bard.
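The abstract reports a significant difference (p < 0.001) for GPT-4 over the other two chatbots but does not name the statistical test used. A two-proportion z-test is one plausible way to compare overall accuracies; the sketch below uses correct counts reconstructed approximately from the reported percentages (68.1%, 46.8% and 45.5% of 301 questions) and is illustrative only, not the authors' analysis.

```python
# Hypothetical two-proportion z-test comparing overall accuracies.
# Counts are approximate reconstructions from the reported percentages.
from math import sqrt, erfc

def two_prop_z(c1, n1, c2, n2):
    """Two-sided two-proportion z-test; returns (z statistic, p value)."""
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, erfc(abs(z) / sqrt(2))  # two-sided p from the normal tail

n = 301
gpt4, gpt35, bard = 205, 141, 137  # ~68.1%, ~46.8%, ~45.5% of 301

z, p = two_prop_z(gpt4, n, gpt35, n)
print(f"GPT-4 vs GPT-3.5: z = {z:.2f}, p = {p:.2g}")
z, p = two_prop_z(gpt35, n, bard, n)
print(f"GPT-3.5 vs Bard:  z = {z:.2f}, p = {p:.2g}")
```

With these reconstructed counts, the GPT-4 comparison comes out well below p = 0.001, while GPT-3.5 versus Google Bard does not approach significance, consistent with the reported results.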

CONCLUSIONS:

Otolaryngologists and sleep medicine physicians have a crucial role to play, through agile and robust research, in ensuring that the next generation of AI chatbots is built safely and responsibly.

Full text: 1 Database: MEDLINE Main subject: Physicians / Artificial Intelligence Language: En Publication year: 2024 Document type: Article
