Educational Limitations of ChatGPT in Neurosurgery Board Preparation.
Powers, Andrew Y; McCandless, Martin G; Taussky, Philipp; Vega, Rafael A; Shutran, Max S; Moses, Ziev B.
Affiliation
  • Powers AY; Neurosurgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, USA.
  • McCandless MG; Neurosurgery, University of Mississippi Medical Center, Jackson, USA.
  • Taussky P; Neurosurgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, USA.
  • Vega RA; Neurosurgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, USA.
  • Shutran MS; Neurosurgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, USA.
  • Moses ZB; Neurosurgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, USA.
Cureus ; 16(4): e58639, 2024 Apr.
Article in En | MEDLINE | ID: mdl-38770467
ABSTRACT
Objective: This study evaluated the potential of Chat Generative Pre-trained Transformer (ChatGPT) as an educational tool for neurosurgery residents preparing for the American Board of Neurological Surgery (ABNS) primary examination.

Methods: Non-imaging questions from the Congress of Neurological Surgeons (CNS) Self-Assessment in Neurological Surgery (SANS) online question bank were input into ChatGPT. Accuracy was evaluated and compared to human performance across subcategories. To quantify ChatGPT's educational potential, the concordance and insight of its explanations were assessed by multiple neurosurgical faculty. Associations among these metrics, as well as with question length, were evaluated.

Results: ChatGPT had an overall accuracy of 50.4% (1,068/2,120), with the highest and lowest accuracies in the pharmacology (81.2%, 13/16) and vascular (32.9%, 91/277) subcategories, respectively. ChatGPT performed worse than humans overall, as well as in the functional, other, peripheral, radiology, spine, trauma, tumor, and vascular subcategories. There were no subcategories in which ChatGPT performed better than humans, and its accuracy was below that required to pass the examination. The mean concordance was 93.4% (198/212) and the mean insight score was 2.7. Accuracy was negatively associated with question length (R² = 0.29, p = 0.03) but positively associated with both concordance (p < 0.001, q < 0.001) and insight (p < 0.001, q < 0.001).

Conclusions: The current study provides the largest and most comprehensive assessment of the accuracy and explanatory quality of ChatGPT in answering ABNS primary examination questions. The findings demonstrate shortcomings in ChatGPT's ability to pass, let alone teach, the neurosurgical boards.
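A minimal sketch of how the accuracy portion of such an evaluation might be scripted, assuming the non-imaging question-bank items are available locally as structured data and that the OpenAI Python client stands in for the ChatGPT web interface; the model name, prompt wording, and answer-letter parsing below are illustrative assumptions, not the authors' protocol:

```python
# Hypothetical sketch: score multiple-choice questions against a chat model
# and tally accuracy per subcategory. Question data, model choice, and prompt
# format are assumptions for illustration only.
from collections import defaultdict
from openai import OpenAI  # assumes the openai>=1.0 client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    # Each item: question stem, lettered options, correct letter, subcategory.
    {
        "stem": "Which vessel is most commonly involved in ...?",
        "options": {"A": "...", "B": "...", "C": "...", "D": "..."},
        "answer": "B",
        "subcategory": "vascular",
    },
    # ... remaining non-imaging question-bank items ...
]

correct = defaultdict(int)
total = defaultdict(int)

for q in questions:
    option_text = "\n".join(f"{k}. {v}" for k, v in q["options"].items())
    prompt = (
        f"{q['stem']}\n{option_text}\n"
        "Answer with the single letter of the best option, then explain."
    )
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; the study used ChatGPT
        messages=[{"role": "user", "content": prompt}],
    )
    text = reply.choices[0].message.content.strip()
    predicted = text[:1].upper()  # naive parse: first character as the letter
    total[q["subcategory"]] += 1
    if predicted == q["answer"]:
        correct[q["subcategory"]] += 1

for sub in sorted(total):
    print(f"{sub}: {correct[sub]}/{total[sub]} = {correct[sub]/total[sub]:.1%}")
print(f"overall: {sum(correct.values())}/{sum(total.values())}")
```

Note that only accuracy can be tallied automatically in this way; the concordance and insight of the model's explanations, as described in the Methods, still require manual grading by neurosurgical faculty.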
Full text: 1 | Database: MEDLINE | Language: En | Publication year: 2024 | Document type: Article