Results 1 - 20 of 745
1.
Proc Natl Acad Sci U S A ; 121(9): e2313925121, 2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38386710

ABSTRACT

We administer a Turing test to AI chatbots. We examine how chatbots behave in a suite of classic behavioral games designed to elicit characteristics such as trust, fairness, risk aversion, and cooperation, as well as how they respond to a traditional Big Five psychological survey that measures personality traits. ChatGPT-4 exhibits behavioral and personality traits that are statistically indistinguishable from those of a random human drawn from tens of thousands of subjects in more than 50 countries. Chatbots also modify their behavior based on previous experience and context "as if" they were learning from the interactions, and they change their behavior in response to different framings of the same strategic situation. When their behaviors are distinct from average and modal human behaviors, they tend to fall on the more altruistic and cooperative end of the distribution. We estimate that they act as if they are maximizing an average of their own and their partner's payoffs.
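The estimated objective in the last sentence has a simple closed form. Below is a minimal Python sketch of such a weighted-average payoff in a dictator-game setting; the function name, the alpha weight, and the $100 endowment are illustrative assumptions, not the paper's estimation code.

```python
# Illustrative sketch only: a utility that averages own and partner payoffs.
# alpha = 0.5 corresponds to maximizing the plain average described above.

def weighted_payoff(own: float, partner: float, alpha: float = 0.5) -> float:
    """Weighted average of a player's own payoff and the partner's payoff."""
    return alpha * own + (1 - alpha) * partner

# Hypothetical dictator game with a $100 endowment: a pure averager is
# indifferent across all splits, so even generous transfers are optimal,
# consistent with behavior at the altruistic end of the human distribution.
endowment = 100
for give in (0, 25, 50, 75, 100):
    print(give, weighted_payoff(endowment - give, give))  # all print 50.0
```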


Subject(s)
Artificial Intelligence , Behavior , Humans , Altruism , Trust
2.
Clin Infect Dis ; 78(4): 860-866, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-37971399

ABSTRACT

Large language models (LLMs) are artificial intelligence systems trained by deep learning algorithms to process natural language and generate text responses to user prompts. Some approach physician performance on a range of medical challenges, leading some proponents to advocate for their potential use in clinical consultation and prompting some consternation about the future of cognitive specialties. However, LLMs currently have limitations that preclude safe clinical deployment in performing specialist consultations, including frequent confabulations, a lack of the contextual awareness crucial for nuanced diagnostic and treatment plans, inscrutable and unexplainable training data and methods, and a propensity to recapitulate biases. Nonetheless, considering the rapid improvement in this technology, growing calls for clinical integration, and healthcare systems that chronically undervalue cognitive specialties, it is critical that infectious diseases clinicians engage with LLMs to enable informed advocacy for how they should, and should not, be used to augment specialist care.


Subject(s)
Communicable Diseases , Drug Labeling , Humans , Artificial Intelligence , Communicable Diseases/diagnosis , Language , Referral and Consultation
3.
Cancer ; 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39211977

ABSTRACT

BACKGROUND: This study evaluated the accuracy, clinical concordance, and readability of the chatbot interface generative pretrained transformer (ChatGPT) 3.5 as a source of breast cancer information for patients. METHODS: Twenty questions that patients are likely to ask ChatGPT were identified by breast cancer advocates. These were posed to ChatGPT 3.5 in July 2023, with each question repeated three times. Responses were graded in two domains: accuracy (4-point Likert scale, 4 = worst) and clinical concordance (information is clinically similar to a physician response; 5-point Likert scale, 5 = not similar at all). The concordance of responses with repetition was estimated using the intraclass correlation coefficient (ICC) of word counts. Response readability was calculated using the Flesch-Kincaid readability scale. References were requested and verified. RESULTS: The overall average accuracy was 1.88 (range, 1.0-3.0; 95% confidence interval [CI], 1.42-1.94), and clinical concordance was 2.79 (range, 1.0-5.0; 95% CI, 1.94-3.64). The average word count was 310 words per response (range, 146-441 words per response) with high concordance (ICC, 0.75; 95% CI, 0.59-0.91; p < .001). The average readability was poor at 37.9 (range, 18.0-60.5) with high concordance (ICC, 0.73; 95% CI, 0.57-0.90; p < .001). There was a weak correlation between greater ease of readability and better clinical concordance (-0.15; p = .025). Accuracy did not correlate with readability (0.05; p = .079). The average number of references was 1.97 (range, 1-4; total, 119). ChatGPT cited peer-reviewed articles only once and often referenced nonexistent websites (41%). CONCLUSIONS: Because ChatGPT 3.5 responses were incorrect 24% of the time and did not provide real references 41% of the time, patients should be cautioned about using ChatGPT for medical information.
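For readers unfamiliar with the readability metric used above, the sketch below implements the standard Flesch Reading Ease formula in Python. The regex-based syllable counter is a rough assumption (published studies typically use validated tools), so scores will only approximate those from dedicated software.

```python
import re

def count_syllables(word: str) -> int:
    """Crude syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease formula; higher is easier to read.
    Scores of 30-50 read as 'difficult', matching the 37.9 average above."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (n_words / sentences)
            - 84.6 * (n_syllables / n_words))

# Dense medical prose scores very low (it can even fall below zero):
print(flesch_reading_ease(
    "Adjuvant chemotherapy reduces recurrence risk after surgery."))
```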

4.
Rheumatology (Oxford) ; 63(9): 2450-2456, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38648756

ABSTRACT

OBJECTIVES: The efficacy of artificial intelligence (AI)-driven chatbots like ChatGPT4 in specialized medical consultations, particularly in rheumatology, remains underexplored. This study compares the proficiency of ChatGPT4's responses with those of practicing rheumatologists to inquiries from patients with SLE. METHODS: In this cross-sectional study, we curated 95 frequently asked questions (FAQs), including 55 in Chinese and 40 in English. Responses to the FAQs from ChatGPT4 and from five rheumatologists were scored separately by a panel of rheumatologists and a group of patients with SLE across six domains (scientific validity, logical consistency, comprehensibility, completeness, satisfaction level and empathy) on a 0-10 scale (a score of 0 indicates entirely incorrect responses, while 10 indicates accurate and comprehensive answers). RESULTS: Rheumatologists' scoring revealed that ChatGPT4-generated responses outperformed those from rheumatologists in satisfaction level and empathy, with mean differences of 0.537 (95% CI, 0.252-0.823; P < 0.01) and 0.460 (95% CI, 0.227-0.693; P < 0.01), respectively. From the SLE patients' perspective, ChatGPT4-generated responses were comparable to the rheumatologist-provided answers in all six domains. Subgroup analysis revealed that ChatGPT4 responses were more logically consistent and complete regardless of language, and exhibited greater comprehensibility, satisfaction and empathy in Chinese; however, they were inferior in comprehensibility for English FAQs. CONCLUSION: ChatGPT4 demonstrated an ability to address FAQs from patients with SLE that was comparable to, and in certain domains possibly better than, the answers provided by specialists. This study shows the potential of applying ChatGPT4 to improve consultations for patients with SLE.
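The mean differences with 95% CIs reported above are standard paired comparisons. As a hedged illustration, the Python sketch below computes a t-based CI for a paired mean difference; the scores are invented, and the pairing assumption (the same FAQs rated under both sources) is ours, not code from the study.

```python
import numpy as np
from scipy import stats

def paired_mean_diff_ci(a, b, conf=0.95):
    """Mean of (a - b) with a t-distribution confidence interval,
    assuming the same items were scored under both conditions (paired)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = len(d)
    se = d.std(ddof=1) / np.sqrt(n)
    t = stats.t.ppf((1 + conf) / 2, df=n - 1)
    return d.mean(), (d.mean() - t * se, d.mean() + t * se)

# Hypothetical 0-10 empathy scores for six FAQs, ChatGPT vs. rheumatologists:
gpt_scores   = [8, 7, 9, 6, 8, 7]
rheum_scores = [7, 7, 8, 6, 7, 6]
diff, ci = paired_mean_diff_ci(gpt_scores, rheum_scores)
print(f"mean difference = {diff:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```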


Subject(s)
Lupus Erythematosus, Systemic , Humans , Lupus Erythematosus, Systemic/psychology , Cross-Sectional Studies , Female , Male , Physician-Patient Relations , Artificial Intelligence , Adult , Rheumatologists/psychology , Rheumatology/standards , Surveys and Questionnaires , Middle Aged , Patient Satisfaction
5.
Histopathology ; 84(4): 601-613, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38032062

ABSTRACT

BACKGROUND AND AIMS: ChatGPT is a powerful artificial intelligence (AI) chatbot developed by the OpenAI research laboratory which is capable of analysing human input and generating human-like responses. Early research into the potential application of ChatGPT in healthcare has focused mainly on clinical and administrative functions. The diagnostic ability and utility of ChatGPT in histopathology is not well defined. We benchmarked the performance of ChatGPT against pathologists in diagnostic histopathology, and evaluated the collaborative potential between pathologists and ChatGPT to deliver more accurate diagnoses. METHODS AND RESULTS: In Part 1 of the study, pathologists and ChatGPT were subjected to a series of questions encompassing common diagnostic conundrums in histopathology. For Part 2, pathologists reviewed a series of challenging virtual slides and provided their diagnoses before and after consultation with ChatGPT. We found that ChatGPT performed worse than pathologists in reaching the correct diagnosis. Consultation with ChatGPT provided limited help; the information it generated depended on the prompts provided by the pathologists and was not always correct. Finally, we surveyed pathologists, who rated the diagnostic accuracy of ChatGPT poorly but found it useful as an advanced search engine. CONCLUSIONS: The use of ChatGPT4 as a diagnostic tool in histopathology is limited by its inherent shortcomings. Judicious evaluation of the information and histopathology diagnoses generated by ChatGPT4 is essential, as it cannot replace the acuity and judgement of a pathologist. However, future advances in generative AI may expand its role in the field of histopathology.


Subject(s)
Artificial Intelligence , Pathologists , Humans , Biopsy , Referral and Consultation , Software
6.
Br J Clin Pharmacol ; 90(3): 662-674, 2024 03.
Article in English | MEDLINE | ID: mdl-37949663

ABSTRACT

AIMS: The aim of this study was to compare clinical decision-making for benzodiazepine deprescribing between a healthcare provider (HCP) and the artificial intelligence (AI) chatbot GPT-4 (ChatGPT-4). METHODS: We analysed real-world data from a Croatian cohort of community-dwelling patients taking benzodiazepines (n = 154) within the EuroAgeism H2020 ESR 7 project. HCPs evaluated the data using pre-established deprescribing criteria to assess benzodiazepine discontinuation potential. The research team devised and tested AI prompts to ensure consistency with HCP judgements. An independent researcher employed ChatGPT-4 with the predetermined prompts to simulate clinical decisions for each patient case. Decisions from the human HCPs and from ChatGPT-4 were compared for agreement rates and Cohen's kappa. RESULTS: Both the HCPs and ChatGPT-4 identified patients for benzodiazepine deprescribing (96.1% and 89.6%, respectively), showing an agreement rate of 95% (κ = .200, P = .012). Agreement on the four deprescribing criteria ranged from 74.7% to 91.3% (lack of indication κ = .352, P < .001; prolonged use κ = .088, P = .280; safety concerns κ = .123, P = .006; incorrect dosage κ = .264, P = .001). Important limitations of GPT-4's responses were identified, including ambiguous outputs (22.1%), generic answers, and inaccuracies, which pose risks of inappropriate decision-making. CONCLUSIONS: While AI-HCP agreement is substantial, sole reliance on AI poses a risk of unsuitable clinical decision-making. This study's findings reveal both strengths and areas for enhancement of ChatGPT-4 in deprescribing recommendations within a real-world sample. Our study underscores the need for additional research on chatbot functionality in patient therapy decision-making, further fostering the advancement of AI for optimal performance.
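The combination above of very high raw agreement (95%) with a much lower kappa is typical when both raters almost always choose the same category. The hedged Python sketch below reproduces the pattern with invented labels; it is not the study's data and will not reproduce the exact κ = .200.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical yes/no deprescribing calls for 40 patients; both raters
# say "yes" 95% of the time, with disagreements on two patients.
hcp = ["yes"] * 37 + ["yes", "no", "no"]   # "no" for patients 38 and 39
gpt = ["yes"] * 37 + ["no", "yes", "no"]   # "no" for patients 37 and 39

agree = sum(h == g for h, g in zip(hcp, gpt)) / len(hcp)
print(f"raw agreement = {agree:.0%}")                 # 95%
print(f"kappa = {cohen_kappa_score(hcp, gpt):.2f}")   # ~0.47, far below 95%
# When nearly every case is a "yes", expected chance agreement is already
# high, so kappa stays modest even at 95% raw agreement -- the same
# pattern reported in the abstract above.
```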


Subject(s)
Artificial Intelligence , Deprescriptions , Humans , Benzodiazepines/adverse effects , Clinical Decision-Making , Health Personnel
7.
Wound Repair Regen ; 2024 May 15.
Article in English | MEDLINE | ID: mdl-38747443

ABSTRACT

To evaluate the accuracy of AI chatbots in staging pressure injuries according to the National Pressure Injury Advisory Panel (NPIAP) staging system through clinical image interpretation, a cross-sectional study assessed five leading publicly available AI chatbots. Three of the chatbots were unable to interpret the clinical images, whereas GPT-4 Turbo achieved a high accuracy rate (83.0%) in staging pressure injuries, significantly outperforming BingAI Creative mode (24.0%; p < 0.001). GPT-4 Turbo accurately identified Stage 1 (p < 0.001), Stage 3 (p = 0.001), and Stage 4 (p < 0.001) pressure injuries, as well as suspected deep tissue injuries (p < 0.001), while BingAI demonstrated significantly lower accuracy across all stages. The findings highlight the potential of AI chatbots, especially GPT-4 Turbo, to accurately stage pressure injuries from clinical images and to aid their subsequent management.

8.
AIDS Care ; 36(4): 463-471, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37253196

ABSTRACT

Chatbots increase business productivity by handling customer conversations in place of human agents. A similar rationale applies to using chatbots in the healthcare sector, especially for health coaches who converse with clients. Chatbots are nascent in healthcare, and study findings have been mixed in terms of engagement and impact on outcomes. Questions remain about chatbot acceptability among coaches and other providers; studies to date have focused on clients. To clarify the perceived benefits of chatbots in HIV interventions, we conducted virtual focus groups with 13 research staff, eight community advisory board members, and seven young adults who were HIV intervention trial participants (clients). Our HIV healthcare context is important: clients represent a promising age demographic for chatbot uptake, and they are a marginalized population warranting consideration to avoid technology that limits healthcare access. Focus group participants expressed the value of chatbots for HIV research staff and clients. Staff discussed how chatbot functions, such as automated appointment scheduling and service referrals, could reduce workloads, while clients discussed the after-hours convenience of these functions. Participants also emphasized that chatbots should provide relatable conversation and reliable functionality, and would not be appropriate for all clients. Our findings underscore the need to further examine appropriate chatbot functionality in HIV interventions.


Subject(s)
HIV Infections , Young Adult , Humans , HIV Infections/prevention & control , Communication , Commerce , Focus Groups , Health Facilities
9.
Clin Chem Lab Med ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38804035

ABSTRACT

OBJECTIVES: Laboratory medical reports are often not intuitively comprehensible to non-medical professionals. Given their recent advancements, easier accessibility and remarkable performance on medical licensing exams, patients are likely to turn to artificial intelligence-based chatbots to understand their laboratory results. However, empirical studies assessing the efficacy of these chatbots in responding to real-life patient queries regarding laboratory medicine are scarce. METHODS: This investigation therefore included 100 patient inquiries from an online health forum, specifically addressing complete blood count interpretation. The aim was to evaluate the proficiency of three artificial intelligence-based chatbots (ChatGPT, Gemini and Le Chat) against the online responses from certified physicians. RESULTS: The chatbots' interpretations of laboratory results were inferior to those from the online medical professionals. While the chatbots exhibited a higher degree of empathetic communication, they frequently produced erroneous or overly generalized responses to complex patient questions. The appropriateness of chatbot responses ranged from 51% to 64%, and 22% to 33% of responses overestimated patient conditions. A notable positive aspect was the chatbots' consistent inclusion of disclaimers regarding their non-medical nature and recommendations to seek professional medical advice. CONCLUSIONS: The chatbots' interpretations of laboratory results from real patient queries highlight a dangerous dichotomy: perceived trustworthiness can obscure factual inaccuracies. Given the growing inclination towards self-diagnosis using AI platforms, further research on and improvement of these chatbots is imperative to increase patients' awareness and avoid future burdens on the healthcare system.

10.
Article in English | MEDLINE | ID: mdl-39243338

ABSTRACT

PURPOSE OF REVIEW: The integration of digital technology into medical practice is often thrust upon clinicians, with standards and routines developed long after adoption begins. Clinicians should strive for at least a basic understanding of emerging technologies so that they can direct their use. This review describes the current state of rapidly evolving generative artificial intelligence (GAI) and explores both how pediatric gastroenterology practice may benefit and the challenges that will be faced. RECENT FINDINGS: Although little published research demonstrates the acceptance, practice, and outcomes associated with GAI in pediatric gastroenterology, there are relevant data from adjacent specialties and, as professed in the media, overwhelming potential. Best-practice guidelines are being widely developed in academic publishing, and resources to initiate and improve practical user skills are prevalent. Initial published evidence supports broad acceptance of the technology by clinicians and patients as part of medical practice, describes methods for developing higher-quality GAI, and identifies the potential for bias and disparities resulting from its use. GAI is broadly available as a digital tool for incorporation into medical practice and holds promise for improved quality and efficiency of care, but investigation into how GAI can best be used remains at an early stage despite the technology's rapid evolution.

11.
Audiol Neurootol ; : 1-7, 2024 May 06.
Article in English | MEDLINE | ID: mdl-38710158

ABSTRACT

INTRODUCTION: The purpose of this study was to evaluate three chatbots - OpenAI ChatGPT, Microsoft Bing Chat (currently Copilot), and Google Bard (currently Gemini) - in terms of their responses to a defined set of audiological questions. METHODS: Each chatbot was presented with the same 10 questions. The authors rated the responses on a Likert scale ranging from 1 to 5. Additional features, such as the number of inaccuracies or errors and the provision of references, were also examined. RESULTS: Most responses given by all three chatbots were rated as satisfactory or better. However, all chatbots generated at least a few errors or inaccuracies. ChatGPT achieved the highest overall score, while Bard scored lowest. Bard was also the only chatbot unable to provide a response to one of the questions, and ChatGPT was the only chatbot that did not provide information about its sources. CONCLUSIONS: Chatbots are an intriguing tool for accessing basic information in a specialized area like audiology. Nevertheless, users need to be careful: correct information is often mixed with errors that are hard to detect unless the user is well versed in the field.

12.
Int J Eat Disord ; 2024 Jul 28.
Article in English | MEDLINE | ID: mdl-39072846

ABSTRACT

OBJECTIVE: Few individuals with eating disorders (EDs) receive treatment. Innovations are needed to identify individuals with EDs and address barriers to care. We developed a chatbot for promoting services uptake that could be paired with online screening. However, it is not yet known which components drive its effects. This study estimated the individual and combined contributions of four chatbot components to mental health services use (primary outcome), chatbot helpfulness, and attitudes toward changing eating/shape/weight concerns ("change attitudes," with higher scores indicating greater importance/readiness). METHODS: Two hundred five individuals who screened positive for an ED but were not in treatment were randomized in an optimization randomized controlled trial to receive up to four chatbot components: psychoeducation, motivational interviewing, personalized service recommendations, and repeated administration (follow-up check-ins/reminders). Assessments were at baseline and 2, 6, and 14 weeks. RESULTS: Participants who received repeated administration were more likely to report mental health services use, with no significant effects of the other components on services use. Repeated administration slowed the decline in change attitudes that participants experienced over time. Participants who received motivational interviewing found the chatbot more helpful, but this component was also associated with larger declines in change attitudes. Participants who received personalized recommendations found the chatbot more helpful, and receiving this component on its own was associated with the most favorable change-attitude trend over time. Psychoeducation showed no effects. DISCUSSION: The results indicated important effects of components on outcomes; the findings will be used to finalize decision-making about the optimized intervention package. The chatbot shows high potential for addressing the treatment gap for EDs.

13.
BMC Psychiatry ; 24(1): 79, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38291369

ABSTRACT

BACKGROUND: Digital mental health interventions (DMHIs) may reduce treatment access issues for those experiencing depressive and/or anxiety symptoms. DMHIs that incorporate relational agents may offer unique ways to engage and respond to users and to potentially help reduce provider burden. This study tested Woebot for Mood & Anxiety (W-MA-02), a DMHI that employs Woebot, a relational agent that incorporates elements of several evidence-based psychotherapies, among those with baseline clinical levels of depressive or anxiety symptoms. Changes in self-reported depressive and anxiety symptoms over 8 weeks were measured, along with the association between each of these outcomes and demographic and clinical characteristics. METHODS: This exploratory, single-arm, 8-week study of 256 adults yielded non-mutually exclusive subsamples with either clinical levels of depressive or anxiety symptoms at baseline. Week 8 Patient Health Questionnaire-8 (PHQ-8) changes were measured in the depressive subsample (PHQ-8 ≥ 10). Week 8 Generalized Anxiety Disorder-7 (GAD-7) changes were measured in the anxiety subsample (GAD-7 ≥ 10). Demographic and clinical characteristics were examined in association with symptom changes via bivariate and multiple regression models adjusted for W-MA-02 utilization. Characteristics included age, sex at birth, race/ethnicity, marital status, education, sexual orientation, employment status, health insurance, baseline levels of depressive and anxiety symptoms, and concurrent psychotherapeutic or psychotropic medication treatments during the study. RESULTS: Both the depressive and anxiety subsamples were predominantly female, educated, and non-Hispanic white, averaging 38 and 37 years of age, respectively. The depressive subsample had significant reductions in depressive symptoms at Week 8 (mean change = -7.28, SD = 5.91, Cohen's d = -1.23, p < 0.01); the anxiety subsample had significant reductions in anxiety symptoms at Week 8 (mean change = -7.45, SD = 5.99, Cohen's d = -1.24, p < 0.01). No significant associations were found between sex at birth, age, employment status, or educational background and Week 8 symptom changes. Significant associations were found between depressive and anxiety symptom outcomes and sexual orientation, marital status, concurrent mental health treatment, and baseline symptom severity. CONCLUSIONS: The present study suggests early promise for W-MA-02 as an intervention for depression and/or anxiety symptoms. Although exploratory in nature, this study revealed potential user characteristics associated with outcomes that can be investigated in future studies. TRIAL REGISTRATION: This study was retrospectively registered on ClinicalTrials.gov (#NCT05672745) on January 5th, 2023.
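The effect sizes above can be checked from the reported statistics: for a within-group change score, one common convention computes Cohen's d as the mean change divided by the SD of the change (whether the authors used this exact standardizer is an assumption, but the arithmetic matches).

```python
# Quick arithmetic check of the reported within-group effect sizes,
# using d = mean change / SD of change.
def cohens_d_change(mean_change: float, sd_change: float) -> float:
    return mean_change / sd_change

print(round(cohens_d_change(-7.28, 5.91), 2))  # -1.23 (PHQ-8, depressive subsample)
print(round(cohens_d_change(-7.45, 5.99), 2))  # -1.24 (GAD-7, anxiety subsample)
```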


Subject(s)
Depression , Mental Health , Adult , Infant, Newborn , Humans , Male , Female , Depression/therapy , Depression/psychology , Anxiety/therapy , Anxiety Disorders/therapy , Ethnicity , Psychotropic Drugs
14.
Curr Urol Rep ; 25(1): 9-18, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37723300

ABSTRACT

PURPOSE OF REVIEW: Artificial intelligence (AI) chatbots have emerged as a potential tool to transform urology by improving patient care and physician efficiency. With an emphasis on their potential advantages and drawbacks, this literature review offers a thorough assessment of the current state of AI-driven chatbots in urology. RECENT FINDINGS: The capacity of AI-driven chatbots to give patients individualized and timely medical advice is one of their key advantages in urology. Chatbots can help patients prioritize their symptoms and give advice on the best course of treatment. By automating administrative duties and offering clinical decision support, chatbots can also help healthcare providers. Before chatbots are widely used in urology, however, a few issues need to be resolved. The precision of chatbot diagnoses and recommendations may be affected by technical constraints such as system errors and flaws. Issues regarding the security and privacy of patient data must be resolved, and chatbots must adhere to all applicable laws. Accuracy and dependability are especially important, because any mistakes or inaccuracies could seriously harm patients. A final obstacle is resistance from patients and healthcare professionals who are hesitant to use new technology or who value in-person encounters. AI-driven chatbots have the potential to significantly improve urology care and efficiency. However, it is essential to thoroughly test and ensure the accuracy of chatbots, address privacy and security concerns, and design user-friendly chatbots that can integrate into existing workflows. By exploring various scenarios and examining the current literature, this review provides an analysis of the prospects and limitations of implementing chatbots in urology.


Subject(s)
Physicians , Urology , Humans , Artificial Intelligence , Patient Care
15.
BMC Urol ; 24(1): 177, 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39180045

ABSTRACT

PURPOSE: The diagnosis and management of prostate cancer (PCa), the second most common cancer in men worldwide, are highly complex. Hence, patients often seek knowledge through additional resources, including AI chatbots such as ChatGPT and Google Bard. This study aimed to evaluate the performance of LLMs in providing education on PCa. METHODS: Common patient questions about PCa were collected from reliable educational websites and evaluated for accuracy, comprehensiveness, readability, and stability by two independent board-certified urologists, with a third resolving discrepancies. Accuracy was measured on a 3-point scale, comprehensiveness on a 5-point Likert scale, and readability using the Flesch Reading Ease (FRE) score and the Flesch-Kincaid (FK) Grade Level. RESULTS: A total of 52 questions on general knowledge, diagnosis, treatment, and prevention of PCa were provided to three LLMs. Although there was no significant difference in the overall accuracy of the LLMs, ChatGPT-3.5 demonstrated superiority over the other LLMs in terms of general knowledge of PCa (p = 0.018). ChatGPT-4 achieved greater overall comprehensiveness than ChatGPT-3.5 and Bard (p = 0.028). For readability, Bard generated simpler sentences with the highest FRE score (54.7, p < 0.001) and the lowest FK reading level (10.2, p < 0.001). CONCLUSION: ChatGPT-3.5, ChatGPT-4 and Bard generate accurate, comprehensive, and easily readable PCa material. These AI models may not replace healthcare professionals but can assist in patient education and guidance.


Subject(s)
Prostatic Neoplasms , Male , Humans , Patient Education as Topic/methods , Language , Comprehension
16.
J Exp Child Psychol ; 240: 105842, 2024 04.
Article in English | MEDLINE | ID: mdl-38184956

ABSTRACT

Dialogic reading promotes early language and literacy development, but high-quality interactions may be inaccessible to disadvantaged children. This study examined whether a chatbot could deliver dialogic reading support comparable to a human partner for Chinese kindergarteners. Using a 2 × 2 factorial design, 148 children (83 girls; mean age = 70.07 months, SD = 7.64) from less resourced families in Beijing, China, were randomly assigned to one of four conditions: dialogic or non-dialogic reading techniques with either a chatbot or human partner. The chatbot provided dialogic support comparable to the human partner, enhancing story comprehension and word learning. Critically, the chatbot's effect on story comprehension was moderated by children's language proficiency rather than age or reading ability. This demonstrates that chatbots can facilitate dialogic reading and highlights the importance of considering children's language skills when implementing chatbot dialogic interventions.


Subject(s)
Comprehension , Reading , Child , Female , Humans , Child, Preschool , Vocabulary , Verbal Learning , China
17.
Am J Emerg Med ; 78: 170-175, 2024 04.
Article in English | MEDLINE | ID: mdl-38295466

ABSTRACT

BACKGROUND: The rise in emergency department presentations globally poses challenges for efficient patient management, and various strategies aim to expedite that management in response. Artificial intelligence's (AI) consistent performance and rapid data interpretation extend its healthcare applications, especially in emergencies. The introduction of a robust AI tool like ChatGPT, based on GPT-4 developed by OpenAI, can benefit patients and healthcare professionals by improving the speed and accuracy of resource allocation. This study examines GPT-4's capability to predict triage outcomes based on local emergency department rules. METHODS: This was a single-center, prospective, observational study. The study population consisted of all patients who presented to the emergency department with any symptoms and agreed to participate, over three non-consecutive days for a total of 72 h. Patients' chief complaints, vital parameters, medical history, and the area to which they were directed by the triage team were recorded. Concurrently, an emergency medicine physician entered the same data into GPT-4, which had been prepared in advance according to local rules, and the triage decisions made by GPT-4 were recorded. In the same process, an emergency medicine specialist determined where each patient should be directed based on the collected data, and this decision was considered the gold standard. Agreement and reliability in directing patients to specific areas, for both the triage team and GPT-4, were evaluated using Cohen's kappa. Furthermore, the accuracy of triage by the triage team and by GPT-4 was assessed with receiver operating characteristic (ROC) analysis; p < 0.05 was considered statistically significant. RESULTS: The study included 758 patients, of whom 416 (54.9%) were male and 342 (45.1%) were female. Evaluating the primary endpoints of our study, the agreement of the triage team's decisions and of GPT-4's decisions with the gold standard, we observed almost perfect agreement in both comparisons (Cohen's kappa 0.893 and 0.899, respectively; p < 0.001 for each). CONCLUSION: Our findings suggest that GPT-4 possesses outstanding predictive skill in triaging patients in an emergency setting and can serve as an effective tool to support the triage process.


Subject(s)
Emergency Medicine , Triage , Female , Humans , Male , Artificial Intelligence , Emergency Service, Hospital , Reproducibility of Results , Prospective Studies
18.
BMC Public Health ; 24(1): 2266, 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39169305

ABSTRACT

BACKGROUND: Chatbots can provide immediate assistance tailored to patients' needs, making them suitable for sustained accompanying interventions. Nevertheless, there is currently no evidence regarding their acceptability to hypertensive patients or the factors influencing that acceptability in the real world. Existing evaluation scales often focus solely on the technology itself, overlooking the patients' perspective. Mixed methods can offer a more comprehensive exploration of influencing factors, laying the groundwork for the future integration of artificial intelligence into chronic disease management practices. METHODS: Mixed methods will provide a holistic view of the effectiveness and acceptability of the intervention. Participants will receive either standard primary health care or a chatbot speaker. The speaker can provide timely reminders, on-demand consultations, personalized data recording, and knowledge broadcasts, as well as entertainment features such as telling jokes. The quantitative part will be conducted as a quasi-randomized controlled trial in a community in Beijing, and a convergent design will be adopted. After patients have used the speaker for 1 month, scales will be used to measure their intention to keep using it; at the same time, semi-structured interviews will be conducted to explore patients' feelings about the speaker and the factors influencing its use. Data on sociodemographics, physical examination, blood pressure, acceptability and self-management behavior will be collected at baseline and at 1, 3, 6, and 12 months. Furthermore, a cloud database will continuously collect patients' interactions with the speaker. The primary outcome is the efficacy of the chatbot for blood pressure control; secondary outcomes include the acceptability of the chatbot speaker and changes in self-management behavior. DISCUSSION: An artificial intelligence-based chatbot speaker not only caters to patients' self-management needs at home but also effectively organizes an intricate and detailed knowledge system for patients with hypertension through a knowledge graph. Patients can promptly access information that aligns with their specific requirements, promoting proactive self-management and playing a crucial role in disease management. This study will serve as a foundation for the application of artificial intelligence technology in chronic disease management, paving the way for further exploration of how to enhance the communicative impact of artificial intelligence technology. TRIAL REGISTRATION: Biomedical Ethics Committee of Peking University: IRB00001052-21106, 2021/10/14; Clinical Trials: ChiCTR2100050578, 2021/08/29.


Subject(s)
Artificial Intelligence , Hypertension , Humans , Hypertension/therapy , Female , China , Male , Adult , Middle Aged , Patient Acceptance of Health Care/psychology , Primary Health Care , Qualitative Research
19.
Ophthalmic Physiol Opt ; 44(3): 641-671, 2024 May.
Article in English | MEDLINE | ID: mdl-38404172

ABSTRACT

PURPOSE: With the introduction of ChatGPT, artificial intelligence (AI)-based large language models (LLMs) are rapidly becoming popular within the scientific community. They use natural language processing to generate human-like responses to queries. However, the application of LLMs, and comparisons of their abilities with those of their human counterparts in ophthalmic care, remain under-reported. RECENT FINDINGS: To date, studies in eye care have demonstrated the utility of ChatGPT in generating patient information, supporting clinical diagnosis and passing ophthalmology question-based examinations, among other tasks. LLMs' performance (median accuracy, %) is influenced by factors such as the model iteration, the prompts utilised and the domain. Human experts (86%) demonstrated the highest proficiency in disease diagnosis, while ChatGPT-4 outperformed others in ophthalmology examinations (75.9%), symptom triaging (98%) and providing information and answering questions (84.6%). LLMs exhibited superior performance in general ophthalmology but reduced accuracy in ophthalmic subspecialties. Although AI-based LLMs like ChatGPT are deemed more efficient than their human counterparts, these AIs are constrained by nonspecific and outdated training, lack of access to current knowledge, generation of plausible-sounding 'fake' responses or hallucinations, inability to process images, lack of critical literature analysis, and ethical and copyright issues. A comprehensive evaluation of recently published studies is crucial to deepen understanding of LLMs and the potential of these AI-based tools. SUMMARY: Ophthalmic care professionals should take a conservative approach when using AI, as human judgement remains essential for clinical decision-making and for monitoring the accuracy of information. This review identified the ophthalmic applications and potential uses that need further exploration. With the advancement of LLMs, setting standards for benchmarking and promoting best practices is crucial. Potential clinical deployment requires evaluating these LLMs beyond artificial settings, through clinical trials that determine their usefulness in the real world.


Subject(s)
Artificial Intelligence , Ophthalmology , Humans , Clinical Decision-Making , Eye , Judgment
20.
J Med Internet Res ; 26: e54706, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38687566

ABSTRACT

BACKGROUND: There is a dearth of feasibility assessments regarding the use of large language models (LLMs) for responding to inquiries from autistic patients within a Chinese-language context. Despite Chinese being one of the most widely spoken languages globally, the predominant research focus on applying these models in the medical field has been on English-speaking populations. OBJECTIVE: This study aims to assess the effectiveness of LLM chatbots, specifically ChatGPT-4 (OpenAI) and ERNIE Bot (version 2.2.3; Baidu, Inc), one of the most advanced LLMs in China, in addressing inquiries from autistic individuals in a Chinese setting. METHODS: We gathered data from DXY, a widely acknowledged web-based medical consultation platform in China with a user base of over 100 million individuals. A total of 100 patient consultation samples were rigorously selected from January 2018 to August 2023, amounting to 239 questions extracted from publicly available autism-related documents on the platform. To maintain objectivity, both the original questions and the responses were anonymized and randomized. An evaluation team of 3 chief physicians assessed the responses across 4 dimensions (relevance, accuracy, usefulness, and empathy), completing 717 evaluations in total. For each question, the team first identified the best response and then rated the responses on a 5-point Likert scale, with each category representing a distinct level of quality. Finally, we compared the responses collected from the different sources. RESULTS: Among the 717 evaluations conducted, 46.86% (95% CI 43.21%-50.51%) of assessors displayed varying preferences for responses from physicians, with 34.87% (95% CI 31.38%-38.36%) favoring ChatGPT and 18.27% (95% CI 15.44%-21.10%) favoring ERNIE Bot. The average relevance scores for physicians, ChatGPT, and ERNIE Bot were 3.75 (95% CI 3.69-3.82), 3.69 (95% CI 3.63-3.74), and 3.41 (95% CI 3.35-3.46), respectively. Physicians (3.66, 95% CI 3.60-3.73) and ChatGPT (3.73, 95% CI 3.69-3.77) demonstrated higher accuracy ratings than ERNIE Bot (3.52, 95% CI 3.47-3.57). In terms of usefulness, physicians (3.54, 95% CI 3.47-3.62) received higher ratings than ChatGPT (3.40, 95% CI 3.34-3.47) and ERNIE Bot (3.05, 95% CI 2.99-3.12). Finally, on the empathy dimension, ChatGPT (3.64, 95% CI 3.57-3.71) outperformed physicians (3.13, 95% CI 3.04-3.21) and ERNIE Bot (3.11, 95% CI 3.04-3.18). CONCLUSIONS: In this cross-sectional study, physicians' responses were superior in the present Chinese-language context. Nonetheless, LLMs can provide valuable medical guidance to autistic patients and may even surpass physicians in demonstrating empathy. Further optimization and research remain imperative prerequisites before LLMs can be effectively integrated into clinical settings across diverse linguistic environments. TRIAL REGISTRATION: Chinese Clinical Trial Registry ChiCTR2300074655; https://www.chictr.org.cn/bin/project/edit?pid=199432.
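The proportion CIs reported above are consistent with a simple normal-approximation (Wald) interval. The Python sketch below reproduces the first one as a quick check; the Wald choice is an assumption, since the authors may have used another interval method.

```python
from math import sqrt

def wald_ci(p: float, n: int, z: float = 1.96):
    """Normal-approximation (Wald) 95% CI for a proportion."""
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# 46.86% of 717 evaluations favored physicians:
lo, hi = wald_ci(0.4686, 717)
print(f"{lo:.2%} - {hi:.2%}")  # ~43.21% - 50.51%, matching the abstract
```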


Subject(s)
Autistic Disorder , Female , Humans , Male , Autistic Disorder/psychology , China , Cross-Sectional Studies , East Asian People , Internet , Language , Physician-Patient Relations , Physicians/statistics & numerical data , Physicians/psychology , Artificial Intelligence