Evaluating insomnia queries from an artificial intelligence chatbot for patient education.
Alapati, Rahul; Campbell, Daniel; Molin, Nicole; Creighton, Erin; Wei, Zhikui; Boon, Maurits; Huntley, Colin.
Affiliation
  • Alapati R; Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, Pennsylvania.
  • Campbell D; Department of Otolaryngology, Thomas Jefferson University Hospitals, Philadelphia, Pennsylvania.
  • Molin N; Department of Otolaryngology, Thomas Jefferson University Hospitals, Philadelphia, Pennsylvania.
  • Creighton E; Department of Neurology, Jefferson Sleep Disorders Center, Thomas Jefferson University Hospitals, Philadelphia, Pennsylvania.
  • Wei Z; Department of Otolaryngology, Thomas Jefferson University Hospitals, Philadelphia, Pennsylvania.
  • Boon M; Department of Neurology, Jefferson Sleep Disorders Center, Thomas Jefferson University Hospitals, Philadelphia, Pennsylvania.
  • Huntley C; Department of Neurology, Jefferson Sleep Disorders Center, Thomas Jefferson University Hospitals, Philadelphia, Pennsylvania.
J Clin Sleep Med ; 20(4): 583-594, 2024 Apr 01.
Article in En | MEDLINE | ID: mdl-38217478
ABSTRACT
STUDY OBJECTIVES:

We evaluated the accuracy of ChatGPT in addressing insomnia-related queries for patient education and assessed ChatGPT's ability to provide varied responses based on differing prompting scenarios.

METHODS:

Four identical sets of 20 insomnia-related queries were posed to ChatGPT. Each set differed by the context in which ChatGPT was prompted: no prompt, patient-centered, physician-centered, and with references and statistics. Responses were reviewed by 2 academic sleep surgeons, 1 academic sleep medicine physician, and 2 sleep medicine fellows across 4 domains: clinical accuracy, prompt adherence, referencing, and statistical precision, using a binary grading system. Flesch-Kincaid grade-level scores were calculated to estimate the reading grade level of the responses, with statistical differences between prompts analyzed via analysis of variance and Tukey's test. Interrater reliability was calculated using Fleiss's kappa.
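The Flesch-Kincaid grade level used above is a standard readability formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. As a rough illustration, a minimal sketch in Python is shown below; the vowel-group syllable counter is a simplifying assumption, and published tools use more careful syllabification, so scores may differ slightly from those reported in the study.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: runs of vowels, with a silent-e adjustment."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1  # drop a trailing silent 'e'
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Simple one-syllable sentences score near or below grade 0, while long, polysyllabic prose (such as the reference-laden responses scoring 17.3 in the Results) scores well above a 12th-grade level.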

RESULTS:

The study revealed significant variations in the Flesch-Kincaid grade-level scores across the 4 prompts: unprompted (13.2 ± 2.2), patient-centered (8.1 ± 1.9), physician-centered (15.4 ± 2.8), and with references and statistics (17.3 ± 2.3; P < .001). Despite poor Fleiss kappa scores, indicating low interrater reliability for clinical accuracy and relevance, all evaluators agreed that the majority of ChatGPT's responses were clinically accurate, with the highest variability on Form 4. The responses were also uniformly relevant to the given prompts (100% agreement). Eighty percent of the references ChatGPT cited were verified as both real and relevant, but only 25% of cited statistics were corroborated within the referenced articles.

CONCLUSIONS:

ChatGPT can be used to generate clinically accurate responses to insomnia-related inquiries. CITATION: Alapati R, Campbell D, Molin N, et al. Evaluating insomnia queries from an artificial intelligence chatbot for patient education. J Clin Sleep Med. 2024;20(4):583-594.

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Artificial Intelligence / Sleep Initiation and Maintenance Disorders Limits: Humans Language: En Journal: J Clin Sleep Med Year: 2024 Document type: Article Publication country: United States
