Can ChatGPT outperform a neurosurgical trainee? A prospective comparative study.

Williams, Simon C; Starup-Hansen, Joachim; Funnell, Jonathan P; Hanrahan, John Gerrard; Valetopoulou, Alexandra; Singh, Navneet; Sinha, Saurabh; Muirhead, William R; Marcus, Hani J

Affiliations

Williams SC; Department of Neurosurgery, St George's University Hospital, London, UK.
Starup-Hansen J; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK.
Funnell JP; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK.
Hanrahan JG; Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, UK.
Valetopoulou A; Department of Neurosurgery, St George's University Hospital, London, UK.
Singh N; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK.
Sinha S; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK.
Muirhead WR; Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, UK.
Marcus HJ; Department of Neurosurgery, Imperial College Healthcare NHS Trust, London, UK.

Br J Neurosurg; 1-10, 2024 Feb 02.

Article in English | MEDLINE | ID: mdl-38305239

ABSTRACT

PURPOSE:

This study aimed to compare the performance of ChatGPT, a large language model (LLM), with human neurosurgical applicants in a neurosurgical national selection interview, to assess the potential of artificial intelligence (AI) and LLMs in healthcare and provide insights into their integration into the field.

METHODS:

In a prospective comparative study, a set of neurosurgical national selection-style interview questions was put to eight human participants and ChatGPT in an online interview. All participants were doctors currently practising in the UK who had applied for a neurosurgical National Training Number. Interviews were recorded, anonymised, and scored by three neurosurgical consultants with experience as interviewers for national selection. Answers provided by ChatGPT were used as a template for a virtual interview. Interview transcripts were subsequently scored by neurosurgical consultants using the criteria applied in real national selection interviews. Overall interview score and subdomain scores were compared between human participants and ChatGPT.

RESULTS:

For overall score, ChatGPT fell behind six human competitors and did not achieve a mean score higher than that of any individual who achieved a training position. Several factors, including factual inaccuracies and deviations from expected structure and style, may have contributed to ChatGPT's underperformance.

CONCLUSIONS:

LLMs such as ChatGPT have huge potential for integration in healthcare. However, this study emphasises the need for further development to address limitations and challenges. While LLMs have not surpassed human performance yet, collaboration between humans and AI systems holds promise for the future of healthcare.

Keywords

AI; Artificial intelligence; ChatGPT; healthcare; large language model; natural language processing; neurosurgery

Full text: 1 | Collections: 01-international | Database: MEDLINE | Study type: Prognostic_studies | Language: En | Journal: Br J Neurosurg | Journal subject: NEUROSURGERY | Publication year: 2024 | Document type: Article | Country of affiliation: United Kingdom
