Assessing the Quality and Reliability of AI-Generated Responses to Common Hypertension Queries.
Vinufrancis, Aleena; Al Hussein, Hussein; Patel, Heena V; Nizami, Afshan; Singh, Aditya; Nunez, Bianca; Abdel-Aal, Aiah Mounir.
Affiliations
  • Vinufrancis A; Internal Medicine, Apollo Hospitals, Thrissur, IND.
  • Al Hussein H; Internal Medicine, Hamad Medical Corporation, Doha, QAT.
  • Patel HV; Internal Medicine, Gujarat Cancer Society (GCS) Medical College, Hospital, and Research Center, Ahmedabad, IND.
  • Nizami A; Medicine and Surgery, Apollo Medical College, Hyderabad, IND.
  • Singh A; Cardiology, Bharati Vidyapeeth Medical College and Hospital, Sangli, IND.
  • Nunez B; Internal Medicine, Universidad Autónoma de Guadalajara, Guadalajara, MEX.
  • Abdel-Aal AM; Pediatrics, Faculty of Medicine, University of Alexandria, EGY.
Cureus ; 16(8): e66041, 2024 Aug.
Article in En | MEDLINE | ID: mdl-39224724
ABSTRACT

INTRODUCTION:

The integration of artificial intelligence (AI) in healthcare, particularly through language models such as ChatGPT and ChatSonic, has gained substantial attention. This article examines the use of these AI models to answer patient queries about hypertension, emphasizing their potential to enhance health literacy and disease understanding. The study aims to compare the quality and reliability of responses generated by ChatGPT and ChatSonic to common patient queries about hypertension, evaluating the two models with the Global Quality Scale (GQS) and the Modified DISCERN scale.

METHODS:

A virtual cross-sectional observational study was conducted over one month, starting in October 2023. Ten common patient queries regarding hypertension were presented to ChatGPT (https://chat.openai.com/) and ChatSonic (https://writesonic.com/chat), and the responses were recorded. Two internal medicine physicians assessed the responses using the GQS and the Modified DISCERN scale. Statistical analysis included Cohen's Kappa values for inter-rater agreement.
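For readers unfamiliar with the statistic, Cohen's kappa measures agreement between two raters beyond what chance alone would produce. A minimal sketch in Python is shown below; the rating arrays are hypothetical examples (two assessors scoring ten responses on a 1-5 GQS scale), not data from the study.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters scoring the same items."""
    n = len(rater1)
    # Observed agreement: fraction of items where the raters match
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected (chance) agreement from each rater's marginal distribution
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum((c1[k] / n) * (c2[k] / n) for k in set(rater1) | set(rater2))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical GQS scores (1-5) from two assessors for 10 AI responses
assessor1 = [4, 3, 5, 2, 4, 3, 5, 4, 3, 2]
assessor2 = [3, 3, 4, 2, 5, 2, 5, 3, 3, 3]
print(round(cohens_kappa(assessor1, assessor2), 3))  # ~0.189
```

By a common rule of thumb (Landis and Koch), kappa values below about 0.20 indicate slight or minimal agreement, which is the range the study reports for its two evaluators.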

RESULTS:

The study evaluated responses from ChatGPT and ChatSonic for 10 patient queries. Assessors observed variations in the quality and reliability assessments between the two AI models. Cohen's Kappa values indicated minimal agreement between the evaluators for both the GQS and Modified DISCERN scale.

CONCLUSIONS:

This study highlights the variation in assessments of responses generated by ChatGPT and ChatSonic for hypertension-related queries. The findings underscore the need for ongoing monitoring and fact-checking of AI-generated responses.
Full text: 1 | Database: MEDLINE | Language: En | Publication year: 2024 | Document type: Article