Performance of large language models on advocating the management of meningitis: a comparative qualitative study.

Fisch, Urs; Kliem, Paulina; Grzonka, Pascale; Sutter, Raoul

Affiliation

Fisch U; Department of Neurology, University Hospital Basel, Basel, Switzerland; urs.fisch@usb.ch.
Kliem P; Clinic for Intensive Care Medicine, University Hospital Basel, Basel, Switzerland.
Grzonka P; Clinic for Intensive Care Medicine, University Hospital Basel, Basel, Switzerland.
Sutter R; Department of Neurology, University Hospital Basel, Basel, Switzerland.

BMJ Health Care Inform; 31(1), 2024-02-02.

Article in English | MEDLINE | ID: mdl-38307617

ABSTRACT

OBJECTIVES:

We aimed to examine the adherence of large language models (LLMs) to bacterial meningitis guidelines using a hypothetical medical case, highlighting their utility and limitations in healthcare.

METHODS:

A simulated clinical scenario of a patient with bacterial meningitis secondary to mastoiditis was presented in three independent sessions to each of seven publicly accessible LLMs (Bard, Bing, Claude-2, GPT-3.5, GPT-4, Llama, PaLM). Responses were evaluated for adherence to good clinical practice and two international meningitis guidelines.

RESULTS:

A central nervous system infection was identified in 90% of LLM sessions. All sessions recommended imaging, while 81% suggested lumbar puncture. Blood cultures and a specific mastoiditis work-up were proposed in only 62% and 38% of sessions, respectively. Only 38% of sessions provided the correct empirical antibiotic treatment, while antiviral treatment and dexamethasone were advised in 33% and 24%, respectively. Misleading statements were generated in 52% of sessions. No significant correlation was found between LLMs' text length and performance (r=0.29, p=0.20). Among all LLMs, GPT-4 demonstrated the best performance.
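The percentages above are easier to read as session counts. The following is a minimal sketch, assuming every percentage is taken over all 21 sessions (7 LLMs × 3 sessions each); the abstract does not state the denominators, so the counts it produces are inferred, not reported. The function name `implied_counts` and the short finding labels are illustrative only.

```python
# Assumption (not stated in the abstract): every percentage refers to all
# 21 sessions, i.e. 7 LLMs x 3 independent sessions each.
N_SESSIONS = 7 * 3

# Percentages as reported in the RESULTS paragraph.
reported_percent = {
    "CNS infection identified": 90,
    "lumbar puncture suggested": 81,
    "blood cultures proposed": 62,
    "mastoiditis work-up proposed": 38,
    "correct empirical antibiotics": 38,
    "antiviral treatment advised": 33,
    "dexamethasone advised": 24,
    "misleading statements": 52,
}


def implied_counts(percent, n=N_SESSIONS):
    """Return every session count k (0..n) whose share k/n rounds to `percent`."""
    return [k for k in range(n + 1) if round(100 * k / n) == percent]


for finding, pct in reported_percent.items():
    print(f"{finding}: {pct}% -> {implied_counts(pct)} of {N_SESSIONS} sessions")
```

Under that assumption each reported percentage corresponds to exactly one whole number of sessions (for example, 90% matches 19 of 21 and 24% matches 5 of 21), which is consistent with a denominator of 21 throughout.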

DISCUSSION:

The latest LLMs provide valuable advice on differential diagnosis and diagnostic procedures but vary significantly in treatment-specific information for bacterial meningitis when presented with a realistic clinical scenario. Misleading statements were common, with performance differences attributed to each LLM's unique algorithm rather than output length.

CONCLUSIONS:

Users must be aware of such limitations and performance variability when considering LLMs as a support tool for medical decision-making. Further research is needed to refine these models' comprehension of complex medical scenarios and their ability to provide reliable information.

Keywords

Artificial intelligence; Decision Making, Computer-Assisted; Disease Management; Patient-Centered Care; Safety Management

Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Meningitis, Bacterial / Mastoiditis
Study type: Guideline / Prognostic_studies / Qualitative_research
Limits: Humans
Language: En
Journal: BMJ Health Care Inform
Year: 2024
Document type: Article
Country of affiliation: Switzerland
Country of publication: United Kingdom
