Digital health tools in nephrology: A comparative analysis of AI and professional opinions via online polls.
Pham, Justin H; Thongprayoon, Charat; Suppadungsuk, Supawadee; Miao, Jing; Craici, Iasmina M; Cheungpasitporn, Wisit.
Affiliation
  • Pham JH; Mayo Clinic College of Medicine and Science, Mayo Clinic, Rochester, MN, USA.
  • Thongprayoon C; Department of Nephrology and Hypertension, Mayo Clinic, Rochester, MN, USA.
  • Suppadungsuk S; Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Samut Prakan, Thailand.
  • Miao J; Department of Nephrology and Hypertension, Mayo Clinic, Rochester, MN, USA.
  • Craici IM; Department of Nephrology and Hypertension, Mayo Clinic, Rochester, MN, USA.
  • Cheungpasitporn W; Department of Nephrology and Hypertension, Mayo Clinic, Rochester, MN, USA.
Digit Health; 10: 20552076241277458, 2024.
Article in En | MEDLINE | ID: mdl-39221085
ABSTRACT

Background:

Professional opinion polling has become a popular means of seeking advice on complex nephrology questions within the #AskRenal community on X. ChatGPT is a large language model with remarkable problem-solving capabilities, but its ability to provide solutions for real-world clinical scenarios remains unproven. This study evaluates how closely ChatGPT's responses align with current prevailing medical opinion in nephrology.

Methods:

Nephrology polls from X were submitted to ChatGPT-4, which generated answers without prior knowledge of the poll outcomes. Its responses were compared with the poll results (inter-rater agreement) and with a second set of its own responses given after a one-week interval (intra-rater agreement) using Cohen's kappa statistic (κ); a worked sketch of this comparison follows below. Subgroup analysis was performed based on question subject matter.
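
For readers unfamiliar with the statistic, here is a minimal sketch of the agreement analysis described above (not the authors' code; the answer options, poll data, and ~60% agreement rate are made-up placeholders):

```python
# Inter-rater agreement between ChatGPT's picks and the winning poll options,
# measured with Cohen's kappa plus an approximate large-sample 95% CI.
# All data below are simulated placeholders, not the study's data.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
options = ["A", "B", "C", "D"]                 # hypothetical poll answer choices
poll_winner = rng.choice(options, 271)         # option favored by most poll voters
gpt_answer = np.where(rng.random(271) < 0.6,   # force roughly 60% raw agreement
                      poll_winner,
                      rng.choice(options, 271))

kappa = cohen_kappa_score(poll_winner, gpt_answer)

# Approximate standard error: SE = sqrt(p_o * (1 - p_o) / (n * (1 - p_e)^2)),
# where p_o is observed agreement and p_e is chance agreement from the marginals.
n = len(poll_winner)
p_o = np.mean(poll_winner == gpt_answer)
p_e = sum(np.mean(poll_winner == c) * np.mean(gpt_answer == c) for c in options)
se = np.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
print(f"kappa = {kappa:.2f}, 95% CI {kappa - 1.96*se:.2f} to {kappa + 1.96*se:.2f}")
```

The same call, applied to ChatGPT's first-round versus second-round answers instead of the poll results, yields the intra-rater (test-retest) figure.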

Results:

Our analysis comprised two rounds of testing ChatGPT on 271 nephrology-related questions. In the first round, ChatGPT's responses agreed with the poll results for 163 of the 271 questions (60.1%; κ = 0.42, 95% CI 0.38-0.46). In the second round, conducted to assess reproducibility, agreement improved slightly to 171 of the 271 questions (63.1%; κ = 0.46, 95% CI 0.42-0.50). Comparison of ChatGPT's responses between the two rounds demonstrated high internal consistency, with agreement for 245 of the 271 responses (90.4%; κ = 0.86, 95% CI 0.82-0.90). Subgroup analysis revealed stronger performance in the combined areas of homeostasis, nephrolithiasis, and pharmacology (κ = 0.53, 95% CI 0.47-0.59 in both rounds) than in other nephrology subfields.
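
As a quick sanity check on these figures (a sketch using only the numbers reported above; the symmetric-CI assumption is ours):

```python
# Reproduce the agreement percentages and back out the standard error of each
# kappa from the width of its reported symmetric 95% CI: SE = width / (2 * 1.96).
n = 271
reported = [("round 1 vs. polls", 163, 0.38, 0.46),
            ("round 2 vs. polls", 171, 0.42, 0.50),
            ("round 1 vs. round 2", 245, 0.82, 0.90)]
for label, agree, lo, hi in reported:
    se = (hi - lo) / (2 * 1.96)
    print(f"{label}: agreement {agree / n:.1%}, implied SE(kappa) = {se:.3f}")
```

This prints agreement of 60.1%, 63.1%, and 90.4%, each with an implied standard error of about 0.020, consistent with the reported intervals.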

Conclusion:

ChatGPT-4 demonstrates modest capability in replicating prevailing professional opinion in nephrology polls overall, with performance that varies by question topic and excellent internal consistency. This study provides insights into the potential and limitations of using ChatGPT in medical decision-making.

Full text: 1 | Collections: 01-international | Database: MEDLINE | Language: En | Journal: Digit Health | Publication year: 2024 | Document type: Article | Country of affiliation: United States