Results 1 - 4 of 4
1.
Medicine (Baltimore) ; 103(22): e38352, 2024 May 31.
Article in English | MEDLINE | ID: mdl-39259094

ABSTRACT

This study aimed to evaluate the readability, reliability, and quality of responses by 4 selected artificial intelligence (AI)-based large language model (LLM) chatbots to questions related to cardiopulmonary resuscitation (CPR). This was a cross-sectional study. Responses to the 100 most frequently asked questions about CPR by 4 selected chatbots (ChatGPT-3.5 [OpenAI], Google Bard [Google AI], Google Gemini [Google AI], and Perplexity [Perplexity AI]) were analyzed for readability, reliability, and quality. The chatbots were asked, in English: "What are the 100 most frequently asked questions about cardio pulmonary resuscitation?" Each of the 100 queries derived from the responses was then posed individually to the 4 chatbots. The 400 responses, treated as patient education materials (PEMs), were assessed for quality and reliability using the modified DISCERN Questionnaire, the Journal of the American Medical Association (JAMA) score, and the Global Quality Score. Readability was assessed with 2 different calculators, which independently computed scores using metrics such as the Flesch Reading Ease Score, Flesch-Kincaid Grade Level, Simple Measure of Gobbledygook, Gunning Fog Index, and Automated Readability Index. We analyzed 100 responses from each of the 4 chatbots. When the median readability values obtained from Calculators 1 and 2 were compared with the 6th-grade reading level, there was a highly significant difference between the groups (P < .001). Across all formulas, the readability level of the responses was above the 6th-grade level. The order of readability, from easiest to most difficult, was Bard, Perplexity, Gemini, and ChatGPT-3.5. The text content provided by all 4 chatbots was therefore above the 6th-grade reading level. We believe that enhancing the quality, reliability, and readability of PEMs will make them easier for readers to understand and will support more accurate performance of CPR; as a result, patients who receive bystander CPR may have an increased likelihood of survival.
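For reference, the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL) named above have the following standard published forms (quoted here for context only; the exact implementations inside the study's 2 online calculators are not specified in the abstract):

\[ \mathrm{FRES} = 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}} \]
\[ \mathrm{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59 \]

Higher FRES values indicate easier text, while FKGL maps approximately onto US school grade levels, which is why the results are benchmarked against a 6th-grade threshold.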


Subject(s)
Artificial Intelligence, Cardiopulmonary Resuscitation, Comprehension, Humans, Cross-Sectional Studies, Cardiopulmonary Resuscitation/methods, Cardiopulmonary Resuscitation/standards, Reproducibility of Results, Surveys and Questionnaires
2.
Medicine (Baltimore) ; 103(33): e39305, 2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39151545

ABSTRACT

No study has comprehensively evaluated the readability and quality of "palliative care" information provided by the artificial intelligence (AI) chatbots ChatGPT®, Bard®, Gemini®, Copilot®, and Perplexity®. This was an observational, cross-sectional original research study. The AI chatbots ChatGPT®, Bard®, Gemini®, Copilot®, and Perplexity® were asked to answer the 100 questions most frequently asked by patients about palliative care. Responses from each of the 5 AI chatbots were analyzed separately. This study did not involve any human participants. The results revealed significant differences between the readability assessments of the responses from the 5 AI chatbots (P < .05). When the different readability indexes were evaluated holistically, the readability of the AI chatbot responses, from easiest to most difficult, was Bard®, Copilot®, Perplexity®, ChatGPT®, Gemini® (P < .05). The median readability indexes of the responses of each of the 5 AI chatbots (Bard®, Copilot®, Perplexity®, ChatGPT®, Gemini®) were compared with the "recommended" 6th-grade reading level; statistically significant differences were observed for all formulas (P < .001). The answers of all 5 AI chatbots were at a reading level well above the 6th-grade level. The modified DISCERN and Journal of the American Medical Association scores were highest for Perplexity® (P < .001). Gemini® responses had the highest Global Quality Scale score (P < .001). It is emphasized that patient education materials should be written at a 6th-grade reading level. The current answers of the 5 AI chatbots evaluated (Bard®, Copilot®, Perplexity®, ChatGPT®, Gemini®) about palliative care were well above the recommended levels in terms of readability of text content, and their text content quality assessment scores were also low. Both the quality and the readability of such texts should be brought within the recommended limits.


Subject(s)
Artificial Intelligence, Comprehension, Palliative Care, Humans, Palliative Care/standards, Cross-Sectional Studies, Reproducibility of Results, Female, Male
3.
Medicine (Baltimore) ; 103(25): e38569, 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38905405

ABSTRACT

We aimed to examine patient education materials (PEMs) on the internet about "Child Pain" in terms of readability, reliability, quality, and content. For this observational study, a search was performed on February 28, 2024, using the keywords "Child Pain," "Pediatric Pain," and "Children Pain" in the Google search engine. The readability of the PEMs was assessed using computer-based readability formulas (Flesch Reading Ease Score [FRES], Flesch-Kincaid Grade Level [FKGL], Automated Readability Index [ARI], Gunning Fog [GFOG], Coleman-Liau score [CL], Linsear Write [LW], and Simple Measure of Gobbledygook [SMOG]). The reliability and quality of the websites were determined using the Journal of American Medical Association (JAMA) score, Global Quality Score (GQS), and DISCERN score. A total of 96 PEM websites were included in our study. We determined that the FRES was 64 (32-84), the FKGL was 8.24 (4.01-15.19), the ARI was 8.95 (4.67-17.38), the GFOG was 11 (7.1-19.2), the CL was 10.1 (6.95-15.64), the LW was 8.08 (3.94-19.0), and the SMOG was 8.1 (4.98-13.93). The readability formula scores showed that the readability level of the PEMs was statistically higher than the sixth-grade level with all formulas (P = .011 for FRES, P < .001 for GFOG, P < .001 for ARI, P < .001 for FKGL, P < .001 for CL, and P < .001 for SMOG), except the LW formula (P = .112). The websites had moderate-to-low reliability and quality. Health-related websites had the highest quality according to the JAMA score. We found a weak negative correlation between the Blexb score and the JAMA score (P = .013). Compared with the sixth-grade level recommended by the American Medical Association and the National Institutes of Health, the readability grade level of child pain-related internet-based PEMs is quite high. On the other hand, the reliability and quality of the PEMs were moderate to low. The low readability and quality of PEMs could cause parental anxiety and unnecessary hospital admissions. PEMs on issues that threaten public health should be prepared with attention to readability recommendations.
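As an illustration of how grade-level formulas such as the FKGL and ARI named above can be computed and checked against the recommended sixth-grade threshold, here is a minimal Python sketch. The naive tokenization and the vowel-group syllable heuristic are simplifying assumptions for illustration; they are not the calculators used in the study.

import re

def _count_syllables(word: str) -> int:
    # Rough heuristic: count contiguous vowel groups, with a minimum of 1 per word.
    # Dedicated readability calculators use more sophisticated syllable counting.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Standard published FKGL formula applied to naive sentence/word splits.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(_count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

def automated_readability_index(text: str) -> float:
    # Standard published ARI formula: characters per word and words per sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    characters = sum(len(w) for w in words)
    return 4.71 * (characters / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43

if __name__ == "__main__":
    # Hypothetical patient-facing sentence used only to demonstrate the comparison.
    sample = ("Push hard and fast in the center of the chest. "
              "Give thirty compressions, then two rescue breaths.")
    for name, formula in (("FKGL", flesch_kincaid_grade), ("ARI", automated_readability_index)):
        grade = formula(sample)
        verdict = "at or below" if grade <= 6 else "above"
        print(f"{name}: {grade:.2f} ({verdict} the recommended 6th-grade level)")

A study-scale analysis would apply such scoring to each retrieved PEM or chatbot response and then test the resulting score distributions against the grade-6 benchmark.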


Subject(s)
Comprehension, Internet, Parents, Humans, Child, Parents/psychology, Health Literacy, Pain, Patient Education as Topic/methods, Reproducibility of Results, Consumer Health Information/standards
4.
Medicine (Baltimore) ; 103(18): e38009, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38701313

ABSTRACT

Subdural hematoma is defined as a collection of blood in the subdural space between the dura mater and the arachnoid. It is a condition that neurosurgeons frequently encounter and has acute, subacute, and chronic forms. The annual incidence in adults is reported to be 1.72 to 20.60 per 100,000 people. Our study aimed to evaluate the quality, reliability, and readability of the answers to questions asked of ChatGPT, Bard, and Perplexity about "Subdural Hematoma." In this observational, cross-sectional study, we asked ChatGPT, Bard, and Perplexity separately to provide the 100 most frequently asked questions about "Subdural Hematoma." Responses from all 3 chatbots were analyzed separately for readability, quality, reliability, and adequacy. When the median readability scores of the ChatGPT, Bard, and Perplexity answers were compared with the sixth-grade reading level, a statistically significant difference was observed for all formulas (P < .001). The responses of all 3 chatbots were found to be difficult to read. Bard responses were more readable than those of ChatGPT (P < .001) and Perplexity (P < .001) for all scores evaluated. Although there were differences between the results of the calculators, Perplexity's answers were more readable than ChatGPT's (P < .05). Bard answers had the best Global Quality Scale (GQS) scores (P < .001). Perplexity responses had the best Journal of American Medical Association and modified DISCERN scores (P < .001). The current capabilities of ChatGPT, Bard, and Perplexity are inadequate in terms of the quality and readability of "Subdural Hematoma"-related text content. The readability standard for patient education materials, as determined by the American Medical Association, the National Institutes of Health, and the United States Department of Health and Human Services, is at or below grade 6. The readability levels of the responses of artificial intelligence applications such as ChatGPT, Bard, and Perplexity are significantly higher than the recommended 6th-grade level.


Subject(s)
Artificial Intelligence, Comprehension, Subdural Hematoma, Humans, Cross-Sectional Studies, Reproducibility of Results