ChatGPT vs. web search for patient questions: what does ChatGPT do better?
Shen, Sarek A; Perez-Heydrich, Carlos A; Xie, Deborah X; Nellis, Jason C.
Affiliation
  • Shen SA; Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins School of Medicine, 601 North Caroline Street, Baltimore, MD, 21287, USA. sarek.shen@gmail.com.
  • Perez-Heydrich CA; Johns Hopkins School of Medicine, Baltimore, MD, USA.
  • Xie DX; Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins School of Medicine, 601 North Caroline Street, Baltimore, MD, 21287, USA.
  • Nellis JC; Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins School of Medicine, 601 North Caroline Street, Baltimore, MD, 21287, USA.
Eur Arch Otorhinolaryngol; 281(6): 3219-3225, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38416195
ABSTRACT

PURPOSE:

Chat Generative Pre-trained Transformer (ChatGPT) has the potential to significantly impact how patients acquire medical information online. Here, we characterize the readability and appropriateness of ChatGPT responses to a range of patient questions compared with results from traditional web searches.

METHODS:

Patient questions related to the published Clinical Practice Guidelines of the American Academy of Otolaryngology-Head and Neck Surgery were sourced from existing online posts. Questions were categorized using a modified Rothwell classification system into (1) fact, (2) policy, and (3) diagnosis and recommendations, and were then queried using both ChatGPT and traditional web search. All results were evaluated on readability (Flesch Reading Ease [FRE] and Flesch-Kincaid Grade Level [FKGL]) and understandability (Patient Education Materials Assessment Tool [PEMAT]). Accuracy was assessed by two blinded clinical evaluators using a three-point ordinal scale.
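The two readability indices named above follow standard published formulas. As a rough reference only, a minimal Python sketch of both is shown below; the syllable counter is a crude vowel-group heuristic, and the study does not state which implementation it used, so this is an illustration rather than the authors' method.

# Minimal sketch of the Flesch Reading Ease (FRE) and Flesch-Kincaid
# Grade Level (FKGL) formulas. The syllable counter is a rough
# vowel-group heuristic for illustration only.
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (rough heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (FRE, FKGL) for a block of text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)

    words_per_sentence = n_words / sentences
    syllables_per_word = n_syllables / n_words

    fre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fre, fkgl

fre, fkgl = readability("Ear infections are common in children. Most resolve without antibiotics.")
print(f"FRE: {fre:.1f}, FKGL: {fkgl:.1f}")

Higher FRE values indicate easier reading, while FKGL approximates a U.S. school grade level, which is why the two indices move in opposite directions for the same text.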

RESULTS:

Fifty-four questions were organized into fact (37.0%), policy (37.0%), and diagnosis (25.8%) categories. The average readability of ChatGPT responses was lower than that of traditional web search results (FRE 42.3 ± 13.1 vs. 55.6 ± 10.5, p < 0.001), while PEMAT understandability was equivalent (93.8% vs. 93.5%, p = 0.17). ChatGPT scored higher than web search for questions in the 'Diagnosis' category (p < 0.01); there was no difference for questions categorized as 'Fact' (p = 0.15) or 'Policy' (p = 0.22). Additional prompting improved the readability of ChatGPT responses (FRE 55.6 ± 13.6, p < 0.01).
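The abstract does not name the statistical test behind these p values. Purely as an illustration of a two-group comparison of mean readability scores, the hypothetical sketch below applies Welch's t-test to simulated values drawn to match the reported means and standard deviations; the generated numbers are not the study's data.

# Hypothetical illustration of comparing mean FRE scores between two
# groups with Welch's t-test. The values are simulated to match the
# reported means/SDs and are NOT the study's data; the abstract does
# not state which test the authors actually used.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
chatgpt_fre = rng.normal(loc=42.3, scale=13.1, size=54)    # simulated
websearch_fre = rng.normal(loc=55.6, scale=10.5, size=54)  # simulated

t_stat, p_value = stats.ttest_ind(chatgpt_fre, websearch_fre, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")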

CONCLUSIONS:

ChatGPT outperforms web search in answering patient questions related to symptom-based diagnoses and is equivalent in providing medical facts and established policy. Appropriate prompting can further improve readability while maintaining accuracy. Further patient education is needed to convey the benefits and limitations of this technology as a source of medical information.

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Artificial Intelligence / Comprehension Limits: Humans Language: En Journal: Eur Arch Otorhinolaryngol Journal subject: Otorhinolaryngology Year: 2024 Document type: Article Affiliation country: United States