Results 1 - 2 of 2

1.
Laryngoscope; 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38666768

ABSTRACT

OBJECTIVES: Understanding the strengths and weaknesses of chatbots as a source of patient information is critical for providers in the rising artificial intelligence landscape. This study is the first to quantitatively analyze and compare four of the most widely used chatbots with respect to the treatment of common pathologies in rhinology.

METHODS: In May 2023, the chatbots ChatGPT, ChatGPT Plus, Google Bard, and Microsoft Bing were queried about the treatment of epistaxis, chronic sinusitis, sinus infection, allergic rhinitis, allergies, and nasal polyps. Individual responses were analyzed by reviewers for readability, quality, understandability, and actionability using validated scoring metrics. Accuracy and comprehensiveness of each response were evaluated by two experts in rhinology.

RESULTS: ChatGPT, Plus, Bard, and Bing had FRE readability scores of 33.17, 35.93, 46.50, and 46.32, respectively, indicating higher readability for Bard and Bing compared with ChatGPT (p = 0.003, p = 0.008) and Plus (p = 0.025, p = 0.048). ChatGPT, Plus, and Bard had mean DISCERN quality scores of 20.42, 20.89, and 20.61, respectively, each higher than Bing's score of 16.97 (p < 0.001). For understandability, ChatGPT and Bing had PEMAT scores of 76.67 and 66.61, respectively, lower than both Plus at 92.00 (p < 0.001, p < 0.001) and Bard at 92.67 (p < 0.001, p < 0.001). ChatGPT Plus had an accuracy score of 4.39, higher than ChatGPT (3.97, p = 0.118), Bard (3.72, p = 0.002), and Bing (3.19, p < 0.001).

CONCLUSION: Across the tested domains, our results suggest that ChatGPT Plus and Google Bard are currently the most patient-friendly chatbots for the treatment of common pathologies in rhinology.

LEVEL OF EVIDENCE: N/A. Laryngoscope, 2024.
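[Editor's note] The FRE (Flesch Reading Ease) score reported above is a fixed formula over average sentence length and average syllables per word; higher scores mean easier text, and scores in the 30s-40s correspond to college-level reading. A minimal Python sketch follows. The vowel-group syllable counter is a rough heuristic assumed here for illustration; validated readability tools use dictionary-based syllabification.

import re

def count_syllables(word):
    # Rough heuristic: one syllable per group of consecutive vowels.
    # Validated tools use dictionary-based syllabification instead.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    # FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

# Denser medical prose scores lower (harder) than plain-language phrasing:
print(flesch_reading_ease("Epistaxis management necessitates anterior nasal compression."))
print(flesch_reading_ease("Pinch the soft part of your nose and lean forward."))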

2.
Article in English | MEDLINE | ID: mdl-37622581

ABSTRACT

OBJECTIVE: To quantitatively compare online patient education materials found using a traditional search engine (Google) versus a conversational artificial intelligence (AI) model (ChatGPT) for benign paroxysmal positional vertigo (BPPV).

STUDY DESIGN: In February 2023, the top 30 Google search results for "benign paroxysmal positional vertigo" were compared with the responses of OpenAI's conversational AI language model, ChatGPT, to 5 common patient questions about BPPV. Metrics included readability, quality, understandability, and actionability.

SETTING: Online information.

METHODS: Validated online information metrics, including the Flesch-Kincaid Grade Level (FKGL), Flesch Reading Ease (FRE), DISCERN instrument score, and Patient Education Materials Assessment Tool for Printed Materials, were analyzed and scored by reviewers.

RESULTS: Mean readability scores (FKGL and FRE) for the Google webpages were 10.7 ± 2.6 and 46.5 ± 14.3, respectively. ChatGPT responses had a higher FKGL score of 13.9 ± 2.5 (P < .001) and a lower FRE score of 34.9 ± 11.2 (P = .005), both corresponding to lower readability. The Google webpages had a DISCERN part 2 score of 25.4 ± 7.5, compared with 17.5 ± 3.9 for the individual ChatGPT responses (P = .001) and 25.0 ± 0.9 for the combined ChatGPT responses (P = .928). The reviewers' average scores across all ChatGPT responses were 4.19 ± 0.82 for accuracy and 4.31 ± 0.67 for currency.

CONCLUSION: The results of this study suggest that information from ChatGPT is more difficult to read, of lower quality, and harder to comprehend than information found through Google searches.
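[Editor's note] The FKGL values above map directly onto U.S. school grade levels: roughly grade 11 for the Google pages versus grade 14 (college level) for ChatGPT. A self-contained Python sketch of the formula follows, using the same crude vowel-group syllable heuristic assumed in the earlier example.

import re

def flesch_kincaid_grade(text):
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    # Syllables approximated as groups of consecutive vowels (rough heuristic).
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

# A score of 13.9, as reported for ChatGPT above, corresponds to roughly
# 14th-grade (college-level) text.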
