neuroGPT-X: toward a clinic-ready large language model.

Guo, Edward; Gupta, Mehul; Sinha, Sarthak; Rössler, Karl; Tatagiba, Marcos; Akagami, Ryojo; Al-Mefty, Ossama; Sugiyama, Taku; Stieg, Philip E; Pickett, Gwynedd E; de Lotbiniere-Bassett, Madeleine; Singh, Rahul; Lama, Sanju; Sutherland, Garnette R

Affiliation

Guo E; 1 Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada.
Gupta M; 2 Department of Clinical Neurosciences, Project neuroArm, Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada.
Sinha S; 1 Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada.
Rössler K; 1 Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada.
Tatagiba M; 3 Department of Neurosurgery, Medical University of Vienna, Vienna, Austria.
Akagami R; 4 Department of Neurosurgery, Tübingen University, Tübingen, Germany.
Al-Mefty O; 5 Department of Surgery, University of British Columbia, Vancouver, British Columbia, Canada.
Sugiyama T; 6 Department of Neurosurgery, Harvard School of Medicine, Boston, Massachusetts.
Stieg PE; 7 Department of Neurosurgery, Hokkaido University Graduate School of Medicine, Sapporo, Japan.
Pickett GE; 8 Department of Neurosurgery, Weill Cornell Medicine/NewYork-Presbyterian Hospital, New York, New York.
de Lotbiniere-Bassett M; 9 Department of Surgery, Dalhousie University, Halifax, Nova Scotia, Canada.
Singh R; 1 Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada.
Lama S; 2 Department of Clinical Neurosciences, Project neuroArm, Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada.
Sutherland GR; 1 Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada.

J Neurosurg; 140(4): 1041-1053, 2024 Apr 01.

Article in English | MEDLINE | ID: mdl-38564804

ABSTRACT

OBJECTIVE:

The objective was to assess the performance of a context-enriched large language model (LLM) compared with international neurosurgical experts on questions related to the management of vestibular schwannoma. A further objective was to develop a chat-based platform incorporating in-text citations, references, and memory to provide accurate, relevant, and reliable information in real time.

METHODS:

The analysis involved 1) creating a data set through web scraping, 2) developing a chat-based platform called neuroGPT-X, 3) enlisting 8 expert neurosurgeons across international centers to independently create questions (n = 1), answer them (n = 4), and evaluate the responses (n = 3) while blinded, and 4) analyzing the evaluation results on the management of vestibular schwannoma. In the blinded phase, all answers were assessed for accuracy, coherence, relevance, thoroughness, speed, and overall rating. All experts were then unblinded; in this unblinded phase, they completed a Likert scale survey and answered long-form questions on the clinical utility, likelihood of use, and limitations of the tool. The tool was then evaluated against a set of 103 consensus statements on vestibular schwannoma care from the 8th Quadrennial International Conference on Vestibular Schwannoma.
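The abstract does not say how the model was context-enriched or how citations and memory were wired into neuroGPT-X. One common pattern for this kind of tool is retrieval: score the scraped passages against the question, place the best matches in the prompt as numbered sources, and keep the last few turns as conversational memory. The Python sketch below illustrates that pattern under stated assumptions and is not the authors' implementation. `Passage`, `retrieve`, and `ChatSession` are hypothetical names, bag-of-words cosine similarity is a simple stand-in for whatever retrieval method was actually used, and no LLM is called: the sketch only builds the prompt that would be sent to one.

```python
import math
import re
from collections import Counter, deque
from dataclasses import dataclass


@dataclass
class Passage:
    """A hypothetical scraped passage paired with its reference string."""
    text: str
    source: str


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Return up to k passages with nonzero similarity, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p.text))), i) for i, p in enumerate(corpus)]
    # Highest score first; ties broken by original corpus order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [corpus[i] for score, i in scored[:k] if score > 0]


class ChatSession:
    """Hypothetical session keeping the last few turns as short-term memory."""

    def __init__(self, corpus: list[Passage], max_turns: int = 3):
        self.corpus = corpus
        # A bounded deque silently drops the oldest turn once full.
        self.memory = deque(maxlen=max_turns)

    def build_prompt(self, question: str) -> str:
        """Assemble history, numbered context passages, question, and references."""
        hits = retrieve(question, self.corpus)
        history = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.memory)
        # The same [n] label is used for a passage and its reference,
        # so in-text citations can be mapped back to the reference list.
        context = "\n".join(f"[{i}] {p.text}" for i, p in enumerate(hits, start=1))
        refs = "\n".join(f"[{i}] {p.source}" for i, p in enumerate(hits, start=1))
        return (
            f"{history}\n\nContext (cite as [n]):\n{context}\n\n"
            f"Question: {question}\n\nReferences:\n{refs}"
        )

    def record(self, question: str, answer: str) -> None:
        """Store a completed question/answer turn in memory."""
        self.memory.append((question, answer))
```

In use, each call to `build_prompt` would be followed by a model call and then `record`, so the next prompt carries recent context. A fixed-length `deque` keeps memory bounded, so older turns drop out instead of growing the prompt indefinitely; a production system would more likely use embedding-based retrieval, which this sketch does not attempt.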

RESULTS:

Responses from the naive and context-enriched Generative Pretrained Transformer (GPT) models did not differ significantly from each other in ratings of accuracy, coherence, relevance, thoroughness, and overall performance, and both were often rated significantly higher than expert responses. Both the naive and context-enriched GPT models answered the standardized question set faster than the expert neurosurgeon respondents (p < 0.01). The context-enriched GPT model agreed with 98 of the 103 (95%) consensus statements. Notably, all expert surgeons expressed concerns about whether GPT can reliably address the nuances and controversies surrounding the management of vestibular schwannoma. The authors also developed neuroGPT-X, a chat-based platform designed to provide point-of-care clinical support and mitigate the limitations of human memory; it incorporates in-text citations and references to deliver accurate, relevant, and reliable information in real time.

CONCLUSIONS:

The subspecialist-level performance of the context-enriched LLM in generating written responses to complex neurosurgical problems, for which evidence-based consensus on management is lacking, suggests that context-enriched LLMs show promise as a point-of-care medical resource. The authors anticipate that this work will be a springboard for expansion into other medical specialties, for incorporating evidence-based clinical information, and for developing expert-level dialogue surrounding LLMs in healthcare.

Key words

GPT; acoustic schwannoma; large language models; neuroGPT-X; vestibular schwannoma

Database: MEDLINE | Main subject: Neuroma, Acoustic / Medicine | Limits: Humans | Language: English | Journal: J Neurosurg | Year: 2024 | Document type: Article