Appropriateness of ChatGPT in Answering Heart Failure Related Questions.
King, Ryan C; Samaan, Jamil S; Yeo, Yee Hui; Mody, Behram; Lombardo, Dawn M; Ghashghaei, Roxana.
  • King RC; Division of Cardiology, Department of Medicine, Irvine Medical Center, University of California, Orange, CA, USA. Electronic address: kingrc@hs.uci.edu.
  • Samaan JS; Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
  • Yeo YH; Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
  • Mody B; Division of Cardiology, Department of Medicine, Irvine Medical Center, University of California, Orange, CA, USA.
  • Lombardo DM; Division of Cardiology, Department of Medicine, Irvine Medical Center, University of California, Orange, CA, USA.
  • Ghashghaei R; Division of Cardiology, Department of Medicine, Irvine Medical Center, University of California, Orange, CA, USA.
Heart Lung Circ; 2024 May 30.
Article in English | MEDLINE | ID: mdl-38821760
ABSTRACT

BACKGROUND:

Heart failure requires complex management, and increased patient knowledge has been shown to improve outcomes. This study assessed the knowledge of Chat Generative Pre-trained Transformer (ChatGPT) and its appropriateness as a supplemental information resource for patients with heart failure.

METHOD:

A total of 107 frequently asked heart failure-related questions were divided into three categories: "basic knowledge" (49 questions), "management" (41), and "other" (17). Each question was posed to both GPT-3.5 and GPT-4, with two responses generated per question per model. The accuracy and reproducibility of responses were graded by two reviewers board-certified in cardiology, with disagreements resolved by a third reviewer board-certified in cardiology and advanced heart failure. Accuracy was graded on a four-point scale: (1) comprehensive, (2) correct but inadequate, (3) some correct and some incorrect, and (4) completely incorrect.
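The abstract does not specify how responses were collected (the authors may well have used the ChatGPT web interface). As a minimal sketch of the two-responses-per-question-per-model design, assuming the OpenAI Python SDK and assumed model identifiers:

```python
# Minimal sketch, assuming API access rather than the ChatGPT web
# interface; model names below are assumptions, not from the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-3.5-turbo", "gpt-4"]  # assumed identifiers
RESPONSES_PER_QUESTION = 2           # per the study design

def collect_responses(questions: list[str]) -> dict:
    """Generate two responses per question per model, mirroring the
    reproducibility design described in the abstract."""
    results: dict = {}
    for model in MODELS:
        results[model] = {}
        for q in questions:
            results[model][q] = [
                client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": q}],
                ).choices[0].message.content
                for _ in range(RESPONSES_PER_QUESTION)
            ]
    return results
```

Generating each response in an independent API call keeps the two responses per question uncorrelated, which is what a reproducibility comparison requires.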

RESULTS:

GPT-4 provided correct information in 107/107 (100%) responses and displayed a greater proportion of comprehensive knowledge in the "basic knowledge" and "management" categories (89.8% and 82.9%, respectively). GPT-3.5 produced two responses (1.9%) graded as "some correct and some incorrect," while no responses were graded "completely incorrect." With respect to comprehensive knowledge, GPT-3.5 performed best in the "management" (78.1%) and "other" (prognosis, procedures, and support; 94.1%) categories. Both models also provided highly reproducible responses, with GPT-3.5 scoring above 94% in every category and GPT-4 scoring 100% for all answers.
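As an illustrative aside (not part of the study), the reported proportions follow directly from counts of grades over the graded questions; a minimal sketch with a hypothetical summarize helper:

```python
from collections import Counter

# Hypothetical input: one grade (1-4) per question for a given model.
# Scale: 1 = comprehensive, 2 = correct but inadequate,
#        3 = some correct and some incorrect, 4 = completely incorrect.
def summarize(grades: list[int]) -> dict[str, float]:
    n = len(grades)
    counts = Counter(grades)
    return {
        "comprehensive": counts[1] / n,
        "correct_but_inadequate": counts[2] / n,
        "mixed": counts[3] / n,
        "completely_incorrect": counts[4] / n,
    }

# e.g., 2 "mixed" grades out of 107 questions -> 2/107 = 0.019 (1.9%)
```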

CONCLUSIONS:

GPT-3.5 and GPT-4 answered the majority of heart failure-related questions accurately and reliably. If validated in future studies, ChatGPT may serve as a useful tool by providing accessible health-related information and education to patients living with heart failure. In its current state, however, ChatGPT requires further rigorous testing and validation to ensure patient safety and equity across all patient demographics.

Full text: 1 Database: MEDLINE Language: English Year: 2024 Document type: Article
