Evaluating ChatGPT-4's performance as a digital health advisor for otosclerosis surgery.
Sahin, Samil; Erkmen, Burak; Duymaz, Yasar Kemal; Bayram, Furkan; Tekin, Ahmet Mahmut; Topsakal, Vedat.
Affiliation
  • Sahin S; Private Practitioner, Istanbul, Türkiye.
  • Erkmen B; Private Practitioner, Istanbul, Türkiye.
  • Duymaz YK; Umraniye Research and Training Hospital, University of Health Sciences, Istanbul, Türkiye.
  • Bayram F; Umraniye Research and Training Hospital, University of Health Sciences, Istanbul, Türkiye.
  • Tekin AM; Department of Otolaryngology and Head & Neck Surgery, Vrije Universiteit Brussel, Brussels Health Care Center, Brussels, Belgium.
  • Topsakal V; Department of Otolaryngology and Head & Neck Surgery, Vrije Universiteit Brussel, Brussels Health Care Center, Brussels, Belgium.
Front Surg ; 11: 1373843, 2024.
Article in En | MEDLINE | ID: mdl-38903865
ABSTRACT

Purpose:

This study aims to evaluate the effectiveness of ChatGPT-4, an artificial intelligence (AI) chatbot, in providing accurate and comprehensible information to patients regarding otosclerosis surgery.

Methods:

On October 20, 2023, 15 hypothetical questions were posed to ChatGPT-4 to simulate physician-patient interactions about otosclerosis surgery. Responses were evaluated by three independent ENT specialists using the DISCERN scoring system. Readability was assessed with multiple indices: Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (FOG), Simple Measure of Gobbledygook (SMOG), Coleman-Liau Index (CLI), and Automated Readability Index (ARI).
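The two Flesch measures cited above are simple formulas over sentence length and syllable counts. As an illustration only (not the authors' tooling, which is not specified), a minimal sketch of FRE and FKGL using a naive vowel-group syllable heuristic might look like this; published readability tools use dictionary-based syllable counts and will differ slightly:

```python
import re

def flesch_metrics(text):
    """Estimate Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL).

    Syllables are approximated by counting vowel groups, a common heuristic;
    results are close to, but not identical to, dictionary-based tools.
    """
    # Count sentences by terminal punctuation runs; floor at 1 to avoid div-by-zero.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))

    def syllables(word):
        # Each maximal run of vowels counts as one syllable; minimum one per word.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    n_syll = sum(syllables(w) for w in words)
    asl = n_words / sentences   # average sentence length (words per sentence)
    asw = n_syll / n_words      # average syllables per word
    fre = 206.835 - 1.015 * asl - 84.6 * asw
    fkgl = 0.39 * asl + 11.8 * asw - 15.59
    return fre, fkgl
```

Under these formulas, higher FRE means easier text, while FKGL maps roughly to a US school grade; the "above 6th-grade" finding reported below corresponds to FKGL values greater than about 6.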

Results:

The responses from ChatGPT-4 received DISCERN scores ranging from poor to excellent, with an overall score of 50.7 ± 8.2. The readability analysis indicated that the texts were above the 6th-grade level, suggesting they may not be easily comprehensible to the average reader. There was a significant positive correlation among the referees' scores. Although ChatGPT-4 provided correct information in over 90% of cases, the study highlights concerns about potentially incomplete or misleading answers and the high reading level of the responses.

Conclusion:

While ChatGPT-4 shows potential in delivering health information accurately, its utility is limited by the level of readability of its responses. The study underscores the need for continuous improvement in AI systems to ensure the delivery of information that is both accurate and accessible to patients with varying levels of health literacy. Healthcare professionals should supervise the use of such technologies to enhance patient education and care.
Full text: 1 Database: MEDLINE Language: En Journal: Front Surg Publication year: 2024 Document type: Article