Can AI pass the written European Board Examination in Neurological Surgery? - Ethical and practical issues.
Stengel, Felix C; Stienen, Martin N; Ivanov, Marcel; Gandía-González, María L; Raffa, Giovanni; Ganau, Mario; Whitfield, Peter; Motov, Stefan.
Affiliation
  • Stengel FC; Department of Neurosurgery & Spine Center of Eastern Switzerland, Kantonsspital St. Gallen & Medical School of St.Gallen, St. Gallen, Switzerland.
  • Stienen MN; Department of Neurosurgery & Spine Center of Eastern Switzerland, Kantonsspital St. Gallen & Medical School of St.Gallen, St. Gallen, Switzerland.
  • Ivanov M; Royal Hallamshire Hospital, Sheffield, United Kingdom.
  • Gandía-González ML; Hospital Universitario La Paz, Madrid, Spain.
  • Raffa G; Division of Neurosurgery, BIOMORF Department, University of Messina, Messina, Italy.
  • Ganau M; Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom.
  • Whitfield P; South West Neurosurgery Centre, Plymouth, United Kingdom.
  • Motov S; Department of Neurosurgery & Spine Center of Eastern Switzerland, Kantonsspital St. Gallen & Medical School of St.Gallen, St. Gallen, Switzerland.
Brain Spine ; 4: 102765, 2024.
Article in English | MEDLINE | ID: mdl-38510593
ABSTRACT

Introduction:

Artificial intelligence (AI)-based large language models (LLMs) hold enormous potential in education and training. Recent publications have demonstrated that they can outperform participants in written medical exams.

Research question:

We aimed to explore the accuracy of AI in the written part of the EANS board exam.

Material and methods:

Eighty-six representative single best answer (SBA) questions, each included at least ten times in prior EANS board exams, were selected by the current EANS board exam committee. By content, 75 questions were classified as text-based (TB) and 11 as image-based (IB); by structure, 50 were interpretation-weighted, 30 theory-based, and 6 true-or-false. The questions were tested with ChatGPT 3.5, Bing, and Bard. The AI and participant results were statistically analyzed with ANOVA tests in Stata SE 15 (StataCorp, College Station, TX). P-values <0.05 were considered statistically significant.
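As an illustrative sketch of the kind of comparison described above: the study ran its ANOVA tests in Stata SE 15, but the same one-way ANOVA F statistic can be computed by hand. The per-question scores below are hypothetical placeholders, not the study's data.

```python
# Illustrative sketch only: the study used Stata SE 15 for its ANOVA tests.
# The accuracy data below are hypothetical (1 = correct, 0 = incorrect).

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)                       # number of groups
    n_total = sum(len(g) for g in groups) # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares (variation of group means around grand mean)
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares (variation of observations around group means)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    msb = ssb / (k - 1)                   # between-group mean square
    msw = ssw / (n_total - k)             # within-group mean square
    return msb / msw

# Hypothetical per-question results for three models
chatgpt = [1, 0, 1, 1, 0, 1, 0, 1]
bing    = [1, 1, 1, 0, 1, 1, 0, 1]
bard    = [1, 1, 1, 1, 0, 1, 1, 1]

f_stat = one_way_anova_f([chatgpt, bing, bard])
print(f"F = {f_stat:.3f}")  # compare against the F(2, 21) critical value
```

The resulting F statistic would be compared against the F distribution with (k − 1, N − k) degrees of freedom to obtain the p-value against the 0.05 threshold the authors used.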

Results:

The Bard LLM achieved the highest accuracy, answering 62% of questions correctly overall and 69% with IB questions excluded, outperforming the human exam participants at 59% (p = 0.67) and 59% (p = 0.42), respectively. All LLMs scored highest on theory-based questions when IB questions were excluded (ChatGPT 79%; Bing 83%; Bard 86%), performing significantly better than the human exam participants (60%; p = 0.03). No LLM answered any IB question correctly.

Discussion and conclusion:

AI passed the written EANS board exam based on representative SBA questions and achieved results close to, or even better than, those of the human exam participants. Our results raise several ethical and practical implications that may impact the current concept of the written EANS board exam.
Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: Brain Spine Year: 2024 Document type: Article Country of affiliation: Switzerland Country of publication: Netherlands