Examining the Performance of ChatGPT 3.5 and Microsoft Copilot in Otolaryngology: A Comparative Study with Otolaryngologists' Evaluation.

Mayo-Yáñez, Miguel; Lechien, Jerome R; Maria-Saibene, Alberto; Vaira, Luigi A; Maniaci, Antonino; Chiesa-Estomba, Carlos M

Affiliation

Mayo-Yáñez M; Young-Otolaryngologists of the International Federation of Oto-Rhino-Laryngological Societies (YO-IFOS) Study Group, 75000 Paris, France.
Lechien JR; Otorhinolaryngology - Head and Neck Surgery Department, Complexo Hospitalario Universitario A Coruña (CHUAC), 15006 A Coruña, Galicia, Spain.
Maria-Saibene A; Otorhinolaryngology-Head and Neck Surgery Department, Hospital San Rafael (HSR) de A Coruña, 15006 A Coruña, Spain.
Vaira LA; Otorhinolaryngology Research Group, Institute of Biomedical Research of A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña (CHUAC), Universidade da Coruña (UDC), 15006 A Coruña, Spain.
Maniaci A; Young-Otolaryngologists of the International Federation of Oto-Rhino-Laryngological Societies (YO-IFOS) Study Group, 75000 Paris, France.
Chiesa-Estomba CM; Department of Otolaryngology, Polyclinique de Poitiers, Elsan Hospital, 86000 Poitiers, France.

Indian J Otolaryngol Head Neck Surg; 76(4): 3465-3469, 2024 Aug.

Article in English | MEDLINE | ID: mdl-39130248

ABSTRACT

This study evaluated the response capabilities of ChatGPT 3.5 and an internet-connected GPT-4 engine (Microsoft Copilot) on a public healthcare system otolaryngology job competition examination, using the real scores of otolaryngology specialists as the control group. In September 2023, 135 questions divided into theoretical and practical parts were input into ChatGPT 3.5 and an internet-connected GPT-4. The accuracy of the AI responses was compared with the official results of the otolaryngologists who took the exam, and statistical analysis was conducted using Stata 14.2. Copilot (GPT-4) outperformed ChatGPT 3.5: Copilot scored 88.5 points, while ChatGPT scored 60 points. Both AIs had discrepancies in their incorrect answers. Despite ChatGPT's proficiency, Copilot displayed superior performance, with its score ranking second among the 108 otolaryngologists who took the exam, while ChatGPT's ranked 83rd. A chatbot powered by GPT-4 with internet access (Copilot) demonstrates superior performance in responding to multiple-choice medical questions compared to ChatGPT 3.5.

Keywords

Artificial; ChatGPT; Chatbot; Comparison; Diagnosis; GPT; Head neck; Instrument; Intelligence; Internet; Medicine; Otolaryngology; Performance; Surgery; Tool; Treatment

Full text: 1 Collections: 01-international Database: MEDLINE Language: English Journal: Indian J Otolaryngol Head Neck Surg Publication year: 2024 Document type: Article Affiliation country: France