Accuracy of ChatGPT-3.5 and -4 in providing scientific references in otolaryngology-head and neck surgery.
Lechien, Jerome R; Briganti, Giovanni; Vaira, Luigi A.
Affiliation
  • Lechien JR; Division of Laryngology and Broncho-Esophagology, Department of Otolaryngology-Head Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium. Jerome.Lechien@umons.ac.be.
  • Briganti G; Department of Otorhinolaryngology and Head and Neck Surgery, School of Medicine, Phonetics and Phonology Laboratory (UMR 7018, Foch Hospital, CNRS, Université Sorbonne Nouvelle/Paris 3), Paris, France. Jerome.Lechien@umons.ac.be.
  • Vaira LA; Department of Otorhinolaryngology and Head and Neck Surgery, School of Medicine, CHU de Bruxelles, CHU Saint-Pierre, Université Libre de Bruxelles, Brussels, Belgium. Jerome.Lechien@umons.ac.be.
Eur Arch Otorhinolaryngol; 281(4): 2159-2165, 2024 Apr.
Article in En | MEDLINE | ID: mdl-38206389
ABSTRACT

INTRODUCTION:

Chatbot Generative Pre-trained Transformer (ChatGPT) is a new artificial intelligence-powered chatbot that can assist otolaryngologists in practice and research. We investigated the accuracy of ChatGPT-3.5 and -4 in referencing manuscripts published in otolaryngology.

METHODS:

ChatGPT-3.5 and ChatGPT-4 were asked to provide the references of the 30 most-cited papers in otolaryngology from the past 40 years, including clinical guidelines and key practice-changing studies. Each response was regenerated three times to assess the accuracy and stability of ChatGPT. ChatGPT-3.5 and ChatGPT-4 were compared for reference accuracy and potential mistakes.
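The querying-and-regeneration protocol could be reproduced programmatically. The snippet below is only a minimal sketch, assuming the openai Python SDK (v1.x), the model names "gpt-3.5-turbo" and "gpt-4", and placeholder paper titles; the authors' exact prompts and interface are not specified in the abstract.

# Minimal sketch of the regeneration protocol described above.
# Assumptions (not from the paper): openai Python SDK v1.x, model names
# "gpt-3.5-turbo" and "gpt-4", and a placeholder list of paper titles.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

papers = [
    "Placeholder guideline title 1",
    "Placeholder key study title 2",
]

def ask_for_reference(model: str, title: str) -> str:
    """Ask the chatbot for the full bibliographic reference of a paper."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Provide the full bibliographic reference for: {title}",
        }],
    )
    return response.choices[0].message.content

# Each question is regenerated three times so that accuracy and stability
# can be rated afterwards.
for model in ("gpt-3.5-turbo", "gpt-4"):
    for title in papers:
        answers = [ask_for_reference(model, title) for _ in range(3)]
        print(model, title, answers, sep="\n")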

RESULTS:

The accuracy of ChatGPT-3.5 and ChatGPT-4.0 ranged from 47% to 60% and from 73% to 87%, respectively (p < 0.005). Across the regenerated questions, ChatGPT-3.5 provided 19 inaccurate references and invented 2 references, whereas ChatGPT-4.0 provided 13 inaccurate references and only one invented reference. The stability of responses across regenerated answers was mild for ChatGPT-3.5 (k = 0.238) and moderate for ChatGPT-4.0 (k = 0.408).
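The stability statistics reported above could be obtained from coded ratings of the regenerated answers. The snippet below is purely illustrative, assuming scikit-learn's cohen_kappa_score and made-up accuracy codes (1 = accurate, 0 = inaccurate or invented); the authors' exact coding scheme is not given in the abstract.

# Illustrative stability check between two regeneration rounds using
# Cohen's kappa; the ratings below are invented for demonstration only.
from sklearn.metrics import cohen_kappa_score

# 1 = accurate reference, 0 = inaccurate/invented (hypothetical codes)
round_1 = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
round_2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0]

kappa = cohen_kappa_score(round_1, round_2)
print(f"Cohen's kappa between rounds: {kappa:.3f}")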

CONCLUSIONS:

ChatGPT-4.0 showed higher accuracy than the free-access version (ChatGPT-3.5). False references were detected with both versions. Practitioners should be cautious when using ChatGPT to retrieve key references while writing a report.

Full text: 1 Database: MEDLINE Main subject: Otolaryngology / Artificial Intelligence Study type: Guideline Limits: Humans Language: En Journal: Eur Arch Otorhinolaryngol / Eur. arch. oto-rhino-laryngol / European archives of oto-rhino-laryngology Journal subject: Otorhinolaryngology Publication year: 2024 Document type: Article Affiliation country: Belgium
