Comparing generative and retrieval-based chatbots in answering patient questions regarding age-related macular degeneration and diabetic retinopathy.

Cheong, Kai Xiong; Zhang, Chenxi; Tan, Tien-En; Fenner, Beau J; Wong, Wendy Meihua; Teo, Kelvin Yc; Wang, Ya Xing; Sivaprasad, Sobha; Keane, Pearse A; Lee, Cecilia Sungmin; Lee, Aaron Y; Cheung, Chui Ming Gemmy; Wong, Tien Yin; Cheong, Yun-Gyung; Song, Su Jeong; Tham, Yih Chung

Affiliation

Cheong KX; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore.
Zhang C; Chinese Academy of Medical Sciences & Peking Union Medical College Hospital, Beijing, China.
Tan TE; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore.
Fenner BJ; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore.
Wong WM; Ophthalmology & Visual Sciences Academic Clinical Program (Eye-ACP), Duke-NUS Medical School, Singapore.
Teo KY; Department of Ophthalmology, National University Hospital, Singapore.
Wang YX; Centre for Innovation and Precision Eye Health; and Department of Ophthalmology, National University of Singapore Yong Loo Lin School of Medicine, Singapore.
Sivaprasad S; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore.
Keane PA; Ophthalmology & Visual Sciences Academic Clinical Program (Eye-ACP), Duke-NUS Medical School, Singapore.
Lee CS; Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital University of Medical Science, Beijing, China.
Lee AY; Moorfields Eye Hospital NHS Foundation Trust, London, UK.
Cheung CMG; Medical Retina, Moorfields Eye Hospital NHS Foundation Trust, London, UK.
Wong TY; Department of Ophthalmology, University of Washington, Seattle, Washington, USA.
Cheong YG; Department of Ophthalmology, University of Washington, Seattle, Washington, USA.
Song SJ; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore.
Tham YC; Ophthalmology & Visual Sciences Academic Clinical Program (Eye-ACP), Duke-NUS Medical School, Singapore.

Br J Ophthalmol; 2024 May 15.

Article in En | MEDLINE | ID: mdl-38749531

ABSTRACT

BACKGROUND/AIMS:

To compare the performance of generative versus retrieval-based chatbots in answering patient inquiries regarding age-related macular degeneration (AMD) and diabetic retinopathy (DR).

METHODS:

In a cross-sectional study, we evaluated four chatbots: three generative models (ChatGPT-4, ChatGPT-3.5 and Google Bard) and one retrieval-based model (OcularBERT). The accuracy of their responses to 45 questions (15 on AMD, 15 on DR and 15 others) was evaluated and compared. Three masked retinal specialists graded each response on a three-point Likert scale: 2 (good, error-free), 1 (borderline) or 0 (poor, with significant inaccuracies). The three grades were summed into an aggregate score ranging from 0 to 6. Based on majority consensus among the graders, each response was also classified as 'Good', 'Borderline' or 'Poor' quality.
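The scoring scheme can be illustrated with a minimal Python sketch. The function names and the example grades are hypothetical and are not taken from the study. The abstract also does not say how a response was classified when all three graders disagreed, so returning `None` in that case is an assumption made here for illustration only.

```python
from collections import Counter

# Mapping from Likert grade to quality label, as described in the methods.
LABELS = {2: "Good", 1: "Borderline", 0: "Poor"}


def aggregate_score(grades):
    """Sum the three graders' 0-2 grades into a score from 0 to 6."""
    if len(grades) != 3 or any(g not in LABELS for g in grades):
        raise ValueError("expected three grades, each 0, 1 or 2")
    return sum(grades)


def consensus_label(grades):
    """Return the label given by at least two of three graders.

    Assumption: returns None when all three graders disagree, since
    the abstract does not describe how such cases were resolved.
    """
    grade, count = Counter(grades).most_common(1)[0]
    return LABELS[grade] if count >= 2 else None


# Hypothetical grades for one response from three graders.
example = (2, 2, 1)
print(aggregate_score(example), consensus_label(example))
```

Under this sketch, a response graded 2, 2 and 1 receives an aggregate score of 5 and a consensus label of 'Good'.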

RESULTS:

Overall, ChatGPT-4 and ChatGPT-3.5 outperformed the other chatbots, both achieving median scores (IQR) of 6 (1), compared with 4.5 (2) for Google Bard and 2 (1) for OcularBERT (all p ≤ 8.4×10⁻³). Based on the consensus approach, 83.3% of ChatGPT-4's responses and 86.7% of ChatGPT-3.5's were rated as 'Good', surpassing Google Bard (50%) and OcularBERT (10%) (all p ≤ 1.4×10⁻²). ChatGPT-4 and ChatGPT-3.5 had no responses rated 'Poor', whereas 6.7% of Google Bard's responses and 20% of OcularBERT's were rated 'Poor'. Across question types, ChatGPT-4 outperformed Google Bard only for AMD, and ChatGPT-3.5 outperformed Google Bard for DR and others.

CONCLUSION:

ChatGPT-4 and ChatGPT-3.5 demonstrated superior performance, followed by Google Bard and OcularBERT. Generative chatbots are potentially capable of answering domain-specific questions outside their original training. Further validation studies are still required prior to real-world implementation.

Keywords

Macula; Public health; Retina

Full text: 1 Database: MEDLINE Language: En Publication year: 2024 Document type: Article
