Search | VHL Regional Portal

1.

Chatbot aimed to promote mental health services use for eating disorders following online screening: Comment.

Daungsupawong, Hinpetch; Wiwanitkit, Viroj.

Eur Eat Disord Rev ; 2024 Jul 04.

Article in English | MEDLINE | ID: mdl-38965787

2.

Possible benefits, challenges, pitfalls, and future perspective of using ChatGPT in pathology.

Aden, Durre; Zaheer, Sufian; Khan, Sabina.

Rev Esp Patol ; 57(3): 198-210, 2024.

Article in English | MEDLINE | ID: mdl-38971620

ABSTRACT

The much-hyped artificial intelligence (AI) model called ChatGPT developed by Open AI can have great benefits for physicians, especially pathologists, by saving time so that they can use their time for more significant work. Generative AI is a special class of AI model, which uses patterns and structures learned from existing data and can create new data. Utilizing ChatGPT in Pathology offers a multitude of benefits, encompassing the summarization of patient records and its promising prospects in Digital Pathology, as well as its valuable contributions to education and research in this field. However, certain roadblocks need to be dealt like integrating ChatGPT with image analysis which will act as a revolution in the field of pathology by increasing diagnostic accuracy and precision. The challenges with the use of ChatGPT encompass biases from its training data, the need for ample input data, potential risks related to bias and transparency, and the potential adverse outcomes arising from inaccurate content generation. Generation of meaningful insights from the textual information which will be efficient in processing different types of image data, such as medical images, and pathology slides. Due consideration should be given to ethical and legal issues including bias.

Subject(s)

Artificial Intelligence , Humans , Pathology , Pathology, Clinical , Image Processing, Computer-Assisted/methods , Forecasting

3.

Can chatbots enhance the management of pediatric sialadenitis in clinical practice?

Maniaci, Antonino; Lazzeroni, Matteo; Cozzi, Anna; Fraccaroli, Francesca; Gaffuri, Michele; Chiesa-Estomba, Carlos; Capaccio, Pasquale.

Eur Arch Otorhinolaryngol ; 2024 Jul 02.

Article in English | MEDLINE | ID: mdl-38955859

ABSTRACT

OBJECTIVE: The purpose of this study was to assess how well ChatGPT, an AI-powered chatbot, performed in helping to manage pediatric sialadenitis and identify when sialendoscopy was necessary. METHODS: 49 clinical cases of pediatric sialadenitis were retrospectively reviewed. ChatGPT was given patient data, and it offered differential diagnoses, proposed further tests, and suggested treatments. The decisions made by the treating otolaryngologists were contrasted with the answers provided by ChatGPT. Analysis was done on ChatGPT response consistency and interrater reliability. RESULTS: ChatGPT showed 78.57% accuracy in primary diagnosis, and 17.35% of cases were considered likely. On the other hand, otolaryngologists recommended fewer further examinations than ChatGPT (111 vs. 60, p < 0.001). For additional exams, poor agreement was found between ChatGPT and otolaryngologists. Only 28.57% of cases received a pertinent and essential treatment plan via ChatGPT, indicating that the platform's treatment recommendations were frequently lacking. For treatment ratings, judges' interrater reliability was greatest (Kendall's tau = 0.824, p < 0.001). For the most part, ChatGPT's response constancy was high. CONCLUSIONS: Although ChatGPT has the potential to correctly diagnose pediatric sialadenitis, there are a number of noteworthy limitations with regard to its ability to suggest further testing and treatment regimens. Before widespread clinical use, more research and confirmation are required. To guarantee that chatbots are utilized properly and effectively to supplement human expertise rather than to replace it, a critical viewpoint is required.

4.

Assessing GPT-4's Performance in Delivering Medical Advice: Comparative Analysis With Human Experts.

Jo, Eunbeen; Song, Sanghoun; Kim, Jong-Ho; Lim, Subin; Kim, Ju Hyeon; Cha, Jung-Joon; Kim, Young-Min; Joo, Hyung Joon.

JMIR Med Educ ; 10: e51282, 2024 Jul 08.

Article in English | MEDLINE | ID: mdl-38989848

ABSTRACT

Background: Accurate medical advice is paramount in ensuring optimal patient care, and misinformation can lead to misguided decisions with potentially detrimental health outcomes. The emergence of large language models (LLMs) such as OpenAI's GPT-4 has spurred interest in their potential health care applications, particularly in automated medical consultation. Yet, rigorous investigations comparing their performance to human experts remain sparse. Objective: This study aims to compare the medical accuracy of GPT-4 with human experts in providing medical advice using real-world user-generated queries, with a specific focus on cardiology. It also sought to analyze the performance of GPT-4 and human experts in specific question categories, including drug or medication information and preliminary diagnoses. Methods: We collected 251 pairs of cardiology-specific questions from general users and answers from human experts via an internet portal. GPT-4 was tasked with generating responses to the same questions. Three independent cardiologists (SL, JHK, and JJC) evaluated the answers provided by both human experts and GPT-4. Using a computer interface, each evaluator compared the pairs and determined which answer was superior, and they quantitatively measured the clarity and complexity of the questions as well as the accuracy and appropriateness of the responses, applying a 3-tiered grading scale (low, medium, and high). Furthermore, a linguistic analysis was conducted to compare the length and vocabulary diversity of the responses using word count and type-token ratio. Results: GPT-4 and human experts displayed comparable efficacy in medical accuracy ("GPT-4 is better" at 132/251, 52.6% vs "Human expert is better" at 119/251, 47.4%). In accuracy level categorization, humans had more high-accuracy responses than GPT-4 (50/237, 21.1% vs 30/238, 12.6%) but also a greater proportion of low-accuracy responses (11/237, 4.6% vs 1/238, 0.4%; P=.001). GPT-4 responses were generally longer and used a less diverse vocabulary than those of human experts, potentially enhancing their comprehensibility for general users (sentence count: mean 10.9, SD 4.2 vs mean 5.9, SD 3.7; P<.001; type-token ratio: mean 0.69, SD 0.07 vs mean 0.79, SD 0.09; P<.001). Nevertheless, human experts outperformed GPT-4 in specific question categories, notably those related to drug or medication information and preliminary diagnoses. These findings highlight the limitations of GPT-4 in providing advice based on clinical experience. Conclusions: GPT-4 has shown promising potential in automated medical consultation, with comparable medical accuracy to human experts. However, challenges remain particularly in the realm of nuanced clinical judgment. Future improvements in LLMs may require the integration of specific clinical reasoning pathways and regulatory oversight for safe use. Further research is needed to understand the full potential of LLMs across various medical specialties and conditions.

Subject(s)

Cardiology , Humans , Cardiology/standards

5.

Examining the association between emotional intelligence and chatbot utilization in education: A cross-sectional examination of undergraduate students in the UAE.

Mosleh, Sultan M; Alsaadi, Fton Ali; Alnaqbi, Fatima Khamis; Alkhzaimi, Meirah Abdullrahman; Alnaqbi, Shamma Waleed; Alsereidi, Waed Mohammed.

Heliyon ; 10(11): e31952, 2024 Jun 15.

Article in English | MEDLINE | ID: mdl-38868023

ABSTRACT

Background: While Emotional Intelligence (EI) demonstrably affects academic success, literature lacks exploration of how implementing chatbot in education might influence both academic performance and students' emotional intelligence, despite the evident potential of such technology. Aim: To investigate the associations between Emotional Intelligence (EI), chatbot utilization among undergraduate students. Methods: A cross-sectional approach was employed, utilizing a convenience sample of 529 undergraduate students recruited through online questionnaires. The participants completed the Trait Emotional Intelligence Questionnaire and modified and a modified versions of the unified theory of acceptance and use of technology (UTAUT) model. Results: of the 529 participants, 83.6 % (n = 440) of participants regularly used chatbot for learning. Students demonstrated a moderate average EI score (129.60 ± 50.15) and an exceptionally high score (89.61 ± 20.70) for chatbot acceptance and usage. A statistically significant (p < 0.001) positive correlation was found between chatbot usage frequency and EI total score. Gender and major emerged as significant factors, with female students (p < 0.05) and health science students (p < 0.05) utilizing chatbot less compared to male and other major students, respectively. A negative correlation (r = -0.111, p = 0.011) was observed between study hours and chatbot usage, suggesting students with higher study hours relied less on chatbot. Conclusions: The positive correlation between chatbot use and EI in this study sparks promising avenues for enhancing the learning experience. By investing in further research to understand this link and integrate AI tools thoughtfully, policymakers and educators can cultivate a learning environment that prioritizes both academic excellence and student well-being, reflecting the values and perspectives of UAE culture.

6.

Beyond the Hype-The Actual Role and Risks of AI in Today's Medical Practice: Comparative-Approach Study.

Hansen, Steffan; Brandt, Carl Joakim; Søndergaard, Jens.

JMIR AI ; 3: e49082, 2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38875597

ABSTRACT

BACKGROUND: The evolution of artificial intelligence (AI) has significantly impacted various sectors, with health care witnessing some of its most groundbreaking contributions. Contemporary models, such as ChatGPT-4 and Microsoft Bing, have showcased capabilities beyond just generating text, aiding in complex tasks like literature searches and refining web-based queries. OBJECTIVE: This study explores a compelling query: can AI author an academic paper independently? Our assessment focuses on four core dimensions: relevance (to ensure that AI's response directly addresses the prompt), accuracy (to ascertain that AI's information is both factually correct and current), clarity (to examine AI's ability to present coherent and logical ideas), and tone and style (to evaluate whether AI can align with the formality expected in academic writings). Additionally, we will consider the ethical implications and practicality of integrating AI into academic writing. METHODS: To assess the capabilities of ChatGPT-4 and Microsoft Bing in the context of academic paper assistance in general practice, we used a systematic approach. ChatGPT-4, an advanced AI language model by Open AI, excels in generating human-like text and adapting responses based on user interactions, though it has a knowledge cut-off in September 2021. Microsoft Bing's AI chatbot facilitates user navigation on the Bing search engine, offering tailored search. RESULTS: In terms of relevance, ChatGPT-4 delved deeply into AI's health care role, citing academic sources and discussing diverse applications and concerns, while Microsoft Bing provided a concise, less detailed overview. In terms of accuracy, ChatGPT-4 correctly cited 72% (23/32) of its peer-reviewed articles but included some nonexistent references. Microsoft Bing's accuracy stood at 46% (6/13), supplemented by relevant non-peer-reviewed articles. In terms of clarity, both models conveyed clear, coherent text. ChatGPT-4 was particularly adept at detailing technical concepts, while Microsoft Bing was more general. In terms of tone, both models maintained an academic tone, but ChatGPT-4 exhibited superior depth and breadth in content delivery. CONCLUSIONS: Comparing ChatGPT-4 and Microsoft Bing for academic assistance revealed strengths and limitations. ChatGPT-4 excels in depth and relevance but falters in citation accuracy. Microsoft Bing is concise but lacks robust detail. Though both models have potential, neither can independently handle comprehensive academic tasks. As AI evolves, combining ChatGPT-4's depth with Microsoft Bing's up-to-date referencing could optimize academic support. Researchers should critically assess AI outputs to maintain academic credibility.

7.

Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4.

Lahat, Adi; Sharif, Kassem; Zoabi, Narmin; Shneor Patt, Yonatan; Sharif, Yousra; Fisher, Lior; Shani, Uria; Arow, Mohamad; Levin, Roni; Klang, Eyal.

J Med Internet Res ; 26: e54571, 2024 Jun 27.

Article in English | MEDLINE | ID: mdl-38935937

ABSTRACT

BACKGROUND: Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement. OBJECTIVE: This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors' and residents' ratings, and specific question types. METHODS: A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications. RESULTS: Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, beneficial, and completeness dimensions. CONCLUSIONS: ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.

Subject(s)

Clinical Decision-Making , Humans , Artificial Intelligence

8.

ChatGPT-4 Performs Clinical Information Retrieval Tasks Utilizing Consistently More Trustworthy Resources Than Does Google Search for Queries Concerning the Latarjet Procedure.

Oeding, Jacob F; Lu, Amy Z; Mazzucco, Michael; Fu, Michael C; Taylor, Samuel A; Dines, David M; Warren, Russell F; Gulotta, Lawrence V; Dines, Joshua S; Kunze, Kyle N.

Arthroscopy ; 2024 Jun 25.

Article in English | MEDLINE | ID: mdl-38936557

ABSTRACT

PURPOSE: To assess the ability for ChatGPT-4, an automated Chatbot powered by artificial intelligence (AI), to answer common patient questions concerning the Latarjet procedure for patients with anterior shoulder instability and compare this performance to Google Search Engine. METHODS: Using previously validated methods, a Google search was first performed using the query "Latarjet." Subsequently, the top ten frequently asked questions (FAQs) and associated sources were extracted. ChatGPT-4 was then prompted to provide the top ten FAQs and answers concerning the procedure. This process was repeated to identify additional FAQs requiring discrete-numeric answers to allow for a comparison between ChatGPT-4 and Google. Discrete, numeric answers were subsequently assessed for accuracy based on the clinical judgement of two fellowship-trained sports medicine surgeons blinded to search platform. RESULTS: Mean (±standard deviation) accuracy to numeric-based answers were 2.9±0.9 for ChatGPT-4 versus 2.5±1.4 for Google (p=0.65). ChatGPT-4 derived information for answers only from academic sources, which was significantly different from Google Search Engine (p=0.003), which used only 30% academic sources and websites from individual surgeons (50%) and larger medical practices (20%). For general FAQs, 40% of FAQs were found to be identical when comparing ChatGPT-4 and Google Search Engine. In terms of sources used to answer these questions, ChatGPT-4 again used 100% academic resources, while Google Search Engine used 60% academic resources, 20% surgeon personal websites, and 20% medical practices (p=0.087). CONCLUSION: ChatGPT-4 demonstrated the ability to provide accurate and reliable information about the Latarjet procedure in response to patient queries, using multiple academic sources in all cases. This was in contrast to Google Search Engine, which more frequently used single surgeon and large medical practice websites. Despite differences in the resources accessed to perform information retrieval tasks, the clinical relevance and accuracy of information provided did not significantly differ between ChatGPT-4 and Google Search Engine.

9.

Assessing the Reproducibility of the Structured Abstracts Generated by ChatGPT and Bard Compared to Human-Written Abstracts in the Field of Spine Surgery: Comparative Analysis.

Kim, Hong Jin; Yang, Jae Hyuk; Chang, Dong-Gune; Lenke, Lawrence G; Pizones, Javier; Castelein, René; Watanabe, Kota; Trobisch, Per D; Mundis, Gregory M; Suh, Seung Woo; Suk, Se-Il.

J Med Internet Res ; 26: e52001, 2024 Jun 26.

Article in English | MEDLINE | ID: mdl-38924787

ABSTRACT

BACKGROUND: Due to recent advances in artificial intelligence (AI), language model applications can generate logical text output that is difficult to distinguish from human writing. ChatGPT (OpenAI) and Bard (subsequently rebranded as "Gemini"; Google AI) were developed using distinct approaches, but little has been studied about the difference in their capability to generate the abstract. The use of AI to write scientific abstracts in the field of spine surgery is the center of much debate and controversy. OBJECTIVE: The objective of this study is to assess the reproducibility of the structured abstracts generated by ChatGPT and Bard compared to human-written abstracts in the field of spine surgery. METHODS: In total, 60 abstracts dealing with spine sections were randomly selected from 7 reputable journals and used as ChatGPT and Bard input statements to generate abstracts based on supplied paper titles. A total of 174 abstracts, divided into human-written abstracts, ChatGPT-generated abstracts, and Bard-generated abstracts, were evaluated for compliance with the structured format of journal guidelines and consistency of content. The likelihood of plagiarism and AI output was assessed using the iThenticate and ZeroGPT programs, respectively. A total of 8 reviewers in the spinal field evaluated 30 randomly extracted abstracts to determine whether they were produced by AI or human authors. RESULTS: The proportion of abstracts that met journal formatting guidelines was greater among ChatGPT abstracts (34/60, 56.6%) compared with those generated by Bard (6/54, 11.1%; P<.001). However, a higher proportion of Bard abstracts (49/54, 90.7%) had word counts that met journal guidelines compared with ChatGPT abstracts (30/60, 50%; P<.001). The similarity index was significantly lower among ChatGPT-generated abstracts (20.7%) compared with Bard-generated abstracts (32.1%; P<.001). The AI-detection program predicted that 21.7% (13/60) of the human group, 63.3% (38/60) of the ChatGPT group, and 87% (47/54) of the Bard group were possibly generated by AI, with an area under the curve value of 0.863 (P<.001). The mean detection rate by human reviewers was 53.8% (SD 11.2%), achieving a sensitivity of 56.3% and a specificity of 48.4%. A total of 56.3% (63/112) of the actual human-written abstracts and 55.9% (62/128) of AI-generated abstracts were recognized as human-written and AI-generated by human reviewers, respectively. CONCLUSIONS: Both ChatGPT and Bard can be used to help write abstracts, but most AI-generated abstracts are currently considered unethical due to high plagiarism and AI-detection rates. ChatGPT-generated abstracts appear to be superior to Bard-generated abstracts in meeting journal formatting guidelines. Because humans are unable to accurately distinguish abstracts written by humans from those produced by AI programs, it is crucial to exercise special caution and examine the ethical boundaries of using AI programs, including ChatGPT and Bard.

Subject(s)

Abstracting and Indexing , Spine , Humans , Spine/surgery , Abstracting and Indexing/standards , Abstracting and Indexing/methods , Reproducibility of Results , Artificial Intelligence , Writing/standards

10.

The efficacy of online physical activity interventions with added mobile elements within adults aged 50 years and over: Randomized controlled trial.

Collombon, Eline H G M; Bolman, Catherine A W; de Bruijn, Gert-Jan; Peels, Denise A; Verboon, Peter; Lechner, Lilian.

Appl Psychol Health Well Being ; 2024 Jun 26.

Article in English | MEDLINE | ID: mdl-38925643

ABSTRACT

Electronic health (eHealth) and mobile health (mHealth) could stimulate physical activity (PA) in a time-efficient and cost-effective way. This randomized controlled trial aims to investigate effects on moderate-to-vigorous PA (MVPA) of different combined computer- and mobile-based PA interventions targeted at adults aged 50 years and over. Participants (N = 954) were randomly allocated to a basic existing computer-based intervention (Active Plus [AP] or I Move [IM]) supplemented with one of three mobile elements being (1) activity tracker (AT), (2) ecological momentary intervention (EMI), or (3) chatbot (CB) or a control group (CG). MVPA was assessed via the SQUASH at baseline (T0), 3 months (T1), and 6 months (T2) and via accelerometers at T0 and T2. No intervention effects were found on objective (p = .502) and subjective (p = .368) MVPA for main research groups (AP/IM + AT, AP/IM + EMI, AP/IM + CB). Preliminary MVPA findings for subgroups (AP + AT, AP + EMI, AP + CB, IM + AT, IM + EMI, IM + CB) combined with drop-out data showed potential for the computer-based intervention AP with an integrated AT. Based on these preliminary findings, eHealth developers can be recommended to integrate ATs with existing computer-based PA interventions. However, further research is recommended to confirm the findings as a result of the exploratory nature of the subgroup analyses.

11.

Evaluation of ChatGPT as a Counselling Tool for Italian-Speaking MASLD Patients: Assessment of Accuracy, Completeness and Comprehensibility.

Pugliese, Nicola; Polverini, Davide; Lombardi, Rosa; Pennisi, Grazia; Ravaioli, Federico; Armandi, Angelo; Buzzetti, Elena; Dalbeni, Andrea; Liguori, Antonio; Mantovani, Alessandro; Villani, Rosanna; Gardini, Ivan; Hassan, Cesare; Valenti, Luca; Miele, Luca; Petta, Salvatore; Sebastiani, Giada; Aghemo, Alessio.

J Pers Med ; 14(6)2024 May 26.

Article in English | MEDLINE | ID: mdl-38929789

ABSTRACT

BACKGROUND: Artificial intelligence (AI)-based chatbots have shown promise in providing counseling to patients with metabolic dysfunction-associated steatotic liver disease (MASLD). While ChatGPT3.5 has demonstrated the ability to comprehensively answer MASLD-related questions in English, its accuracy remains suboptimal. Whether language influences these results is unclear. This study aims to assess ChatGPT's performance as a counseling tool for Italian MASLD patients. METHODS: Thirteen Italian experts rated the accuracy, completeness and comprehensibility of ChatGPT3.5 in answering 15 MASLD-related questions in Italian using a six-point accuracy, three-point completeness and three-point comprehensibility Likert's scale. RESULTS: Mean scores for accuracy, completeness and comprehensibility were 4.57 ± 0.42, 2.14 ± 0.31 and 2.91 ± 0.07, respectively. The physical activity domain achieved the highest mean scores for accuracy and completeness, whereas the specialist referral domain achieved the lowest. Overall, Fleiss's coefficient of concordance for accuracy, completeness and comprehensibility across all 15 questions was 0.016, 0.075 and -0.010, respectively. Age and academic role of the evaluators did not influence the scores. The results were not significantly different from our previous study focusing on English. CONCLUSION: Language does not appear to affect ChatGPT's ability to provide comprehensible and complete counseling to MASLD patients, but accuracy remains suboptimal in certain domains.

12.

Comparative analysis of artificial intelligence chatbot recommendations for urolithiasis management: A study of EAU guideline compliance.

Altintas, Emre; Ozkent, Mehmet Serkan; Gül, Murat; Batur, Ali Furkan; Kaynar, Mehmet; Kiliç, Özcan; Göktas, Serdar.

Fr J Urol ; 34(7-8): 102666, 2024 Jun 05.

Article in English | MEDLINE | ID: mdl-38849035

ABSTRACT

OBJECTIVES: Artificial intelligence (AI) applications are increasingly being utilized by both patients and physicians for accessing medical information. This study focused on the urolithiasis section (pertaining to kidney and ureteral stones) of the European Association of Urology (EAU) guideline, a key reference for urologists. MATERIAL AND METHODS: We directed inquiries to four distinct AI chatbots to assess their responses in relation to guideline adherence. A total of 115 recommendations were transformed into questions, and responses were evaluated by two urologists with a minimum of 5 years of experience using a 5-point Likert scale (1 - False, 2 - Inadequate, 3 - Sufficient, 4 - Correct, and 5 - Very correct). RESULTS: The mean scores for Perplexity and ChatGPT 4.0 were 4.68 (SD: 0.80) and 4.80 (SD: 0.47), respectively, both significantly differed the scores of Bing and Bard (Bing vs. Perplexity, P<0.001; Bard vs. Perplexity, P<0.001; Bing vs. ChatGPT, P<0.001; Bard vs. ChatGPT, P<0.001). Bing had a mean score of 4.21 (SD: 0.96), while Bard scored 3.56 (SD: 1.14), with a significant difference (Bing vs. Bard, P<0.001). Bard exhibited the lowest score among all chatbots. Analysis of references revealed that Perplexity and Bing cited the guideline most frequently (47.3% and 30%, respectively). CONCLUSION: Our findings demonstrate that ChatGPT 4.0 and, notably, Perplexity align well with EAU guideline recommendations. These continuously evolving applications may play a crucial role in delivering information to physicians in the future, especially for urolithiasis.

13.

How well do large language model-based chatbots perform in oral and maxillofacial radiology?

Jeong, Hui; Han, Sang-Sun; Yu, Youngjae; Kim, Saejin; Jin Jeon, Kug.

Dentomaxillofac Radiol ; 2024 Jun 07.

Article in English | MEDLINE | ID: mdl-38848473

ABSTRACT

OBJECTIVES: This study evaluated the performance of four large language model (LLM)-based chatbots by comparing their test results with those of dental students on an oral and maxillofacial radiology examination. METHODS: ChatGPT, ChatGPT Plus, Bard, and Bing Chat were tested on 52 questions from regular dental college examinations. These questions were categorized into three educational content areas: basic knowledge, imaging and equipment, and image interpretation. They were also classified as multiple-choice questions (MCQs) and short-answer questions (SAQs). The accuracy rates of the chatbots were compared with the performance of students, and further analysis was conducted based on the educational content and question type. RESULTS: The students' overall accuracy rate was 81.2%, while that of the chatbots varied: 50.0% for ChatGPT, 65.4% for ChatGPT Plus, 50.0% for Bard, and 63.5% for Bing Chat. ChatGPT Plus achieved a higher accuracy rate for basic knowledge than the students (93.8% vs. 78.7%). However, all chatbots performed poorly in image interpretation, with accuracy rates below 35.0%. All chatbots scored less than 60.0% on MCQs, but performed better on SAQs. CONCLUSIONS: The performance of chatbots in oral and maxillofacial radiology was unsatisfactory. Further training using specific, relevant data derived solely from reliable sources is required. Additionally, the validity of these chatbots' responses must be meticulously verified. ADVANCES IN KNOWLEDGE: This study is the first in the field of oral and maxillofacial radiology to assess the knowledge levels of four chatbots. We recommend further training in this domain for all chatbots, given their unsatisfactory performance.

14.

Virtual health assistants: a grand challenge in health communications and behavior change.

Maher, Carol; Singh, Ben; Wylde, Allison; Chastin, Sebastien.

Front Digit Health ; 6: 1418695, 2024.

Article in English | MEDLINE | ID: mdl-38827384

15.

Evaluation of information accuracy and clarity: ChatGPT responses to the most frequently asked questions about premature ejaculation.

Sahin, Mehmet Fatih; Keles, Anil; Özcan, Ridvan; Dogan, Çagri; Topkaç, Erdem Can; Akgül, Murat; Yazici, Cenk Murat.

Sex Med ; 12(3): qfae036, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38832125

ABSTRACT

Background: Premature ejaculation (PE) is the most prevalent sexual dysfunction in men, and like many diseases and conditions, patients use Internet sources like ChatGPT, which is a popular artificial intelligence-based language model, for queries about this andrological disorder. Aim: The objective of this research was to evaluate the quality, readability, and understanding of texts produced by ChatGPT in response to frequently requested inquiries on PE. Methods: In this study we used Google Trends to identify the most frequently searched phrases related to PE. Subsequently, the discovered keywords were methodically entered into ChatGPT, and the resulting replies were assessed for quality using the Ensuring Quality Information for Patients (EQIP) program. The produced texts were assessed for readability using the Flesch-Kincaid Grade Level (FKGL), Flesch Reading Ease Score (FRES), and DISCERN metrics. Outcomes: This investigation has identified substantial concerns about the quality of texts produced by ChatGPT, highlighting severe problems with reading and understanding. Results: The mean EQIP score for the texts was determined to be 45.93 ± 4.34, while the FRES was 15.8 ± 8.73. Additionally, the FKGL score was computed to be 15.68 ± 1.67 and the DISCERN score was 38.1 ± 3.78. The comparatively low average EQIP and DISCERN scores suggest that improvements are required to increase the quality and dependability of the presented information. In addition, the FKGL scores indicate a significant degree of linguistic intricacy, requiring a level of knowledge comparable to about 14 to 15 years of formal schooling in order to understand. The texts about treatment, which are the most frequently searched items, are more difficult to understand compared to other texts about other categories. Clinical Implications: The results of this research suggest that compared to texts on other topics the PE texts produced by ChatGPT exhibit a higher degree of complexity, which exceeds the recommended reading threshold for effective health communication. Currently, ChatGPT is cannot be considered a substitute for comprehensive medical consultations. Strengths and Limitations: This study is to our knowledge the first reported research investigating the quality and comprehensibility of information generated by ChatGPT in relation to frequently requested queries about PE. The main limitation is that the investigation included only the first 25 popular keywords in English. Conclusion: ChatGPT is incapable of replacing the need for thorough medical consultations.

16.

Young peoples' reflections about using a chatbot to promote their mental wellbeing in northern periphery areas - a qualitative study.

Kostenius, Catrine; Lindstrom, Frida; Potts, Courtney; Pekkari, Niklas.

Int J Circumpolar Health ; 83(1): 2369349, 2024 Dec.

Article in English | MEDLINE | ID: mdl-38912845

ABSTRACT

An international research collaboration with researchers from northern Sweden, Finland, Ireland, Northern Ireland, Scotland and developed the ChatPal chatbot to explore the possibility of a multilingual chatbot to promote mental wellbeing in people of all ages. In Sweden the end users were young people. The aim of the current study was to explore and discuss Swedish young peoples' experiences of using a chatbot designed to promote their mental wellbeing. Young people aged 15-19 filled out an open-ended survey giving feedback on the ChatPal chatbot and their suggestions on improvements. A total of 122 survey responses were analysed. The qualitative content analysis of the survey responses resulted in three themes each containing two to three sub-themes. Theme 1, feeling as if someone is there when needed, which highlighted positive aspects regarding availability and accessibility. Theme 2, human-robot interaction has its limitations, which included aspects such as unnatural and impersonal conversations and limited content availability. Theme 3, usability can be improved, given technical errors due to lack of internet connection and difficulty navigating the chatbot were brought up as issues. The findings are discussed, and potential implications are offered for those designing and developing digital mental health technologies for young people.

Subject(s)

Mental Health , Qualitative Research , Humans , Adolescent , Male , Female , Young Adult , Arctic Regions , Sweden , Health Promotion/organization & administration , Internet

17.

User experience of a family health history chatbot: A quantitative analysis.

Soni, Hiral; Morrison, Heath; Vasilev, Dinko; Ong, Triton; Wilczewski, Hattie; Allen, Caitlin; Hughes-Halbert, Chanita; Ritchie, Jordon B; Narma, Alexa; Schiffman, Joshua D; Ivanova, Julia; Bunnell, Brian E; Welch, Brandon M.

Health Informatics J ; 30(2): 14604582241262251, 2024.

Article in English | MEDLINE | ID: mdl-38865081

ABSTRACT

OBJECTIVE: Family health history (FHx) is an important tool in assessing one's risk towards specific health conditions. However, user experience of FHx collection tools is rarely studied. ItRunsInMyFamily.com (ItRuns) was developed to assess FHx and hereditary cancer risk. This study reports a quantitative user experience analysis of ItRuns. METHODS: We conducted a public health campaign in November 2019 to promote FHx collection using ItRuns. We used software telemetry to quantify abandonment and time spent on ItRuns to identify user behaviors and potential areas of improvement. RESULTS: Of 11,065 users who started the ItRuns assessment, 4305 (38.91%) reached the final step to receive recommendations about hereditary cancer risk. Highest abandonment rates were during Introduction (32.82%), Invite Friends (29.03%), and Family Cancer History (12.03%) subflows. Median time to complete the assessment was 636 s. Users spent the highest median time on Proband Cancer History (124.00 s) and Family Cancer History (119.00 s) subflows. Search list questions took the longest to complete (median 19.50 s), followed by free text email input (15.00 s). CONCLUSION: Knowledge of objective user behaviors at a large scale and factors impacting optimal user experience will help enhance the ItRuns workflow and improve future FHx collection.

Subject(s)

Medical History Taking , Humans , Medical History Taking/methods , Medical History Taking/statistics & numerical data , Family Health , Female , Male , Telemetry/methods , Software

18.

Navigating the Pedagogical Landscape: Exploring the Implications of AI and Chatbots in Nursing Education.

Srinivasan, Muthuvenkatachalam; Venugopal, Ambili; Venkatesan, Latha; Kumar, Rajesh.

JMIR Nurs ; 7: e52105, 2024 Jun 13.

Article in English | MEDLINE | ID: mdl-38870516

ABSTRACT

This viewpoint paper explores the pedagogical implications of artificial intelligence (AI) and AI-based chatbots such as ChatGPT in nursing education, examining their potential uses, benefits, challenges, and ethical considerations. AI and chatbots offer transformative opportunities for nursing education, such as personalized learning, simulation and practice, accessible learning, and improved efficiency. They have the potential to increase student engagement and motivation, enhance learning outcomes, and augment teacher support. However, the integration of these technologies also raises ethical considerations, such as privacy, confidentiality, and bias. The viewpoint paper provides a comprehensive overview of the current state of AI and chatbots in nursing education, offering insights into best practices and guidelines for their integration. By examining the impact of AI and ChatGPT on student learning, engagement, and teacher effectiveness and efficiency, this review aims to contribute to the ongoing discussion on the use of AI and chatbots in nursing education and provide recommendations for future research and development in the field.

Subject(s)

Artificial Intelligence , Education, Nursing , Humans , Students, Nursing/psychology

19.

Parents' Perceptions of Their Parenting Journeys and a Mobile App Intervention (Parentbot-A Digital Healthcare Assistant): Qualitative Process Evaluation.

Chua, Joelle Yan Xin; Choolani, Mahesh; Chee, Cornelia Yin Ing; Yi, Huso; Chan, Yiong Huak; Lalor, Joan Gabrielle; Chong, Yap Seng; Shorey, Shefaly.

J Med Internet Res ; 26: e56894, 2024 Jun 21.

Article in English | MEDLINE | ID: mdl-38905628

ABSTRACT

BACKGROUND: Parents experience many challenges during the perinatal period. Mobile app-based interventions and chatbots show promise in delivering health care support for parents during the perinatal period. OBJECTIVE: This descriptive qualitative process evaluation study aims to explore the perinatal experiences of parents in Singapore, as well as examine the user experiences of the mobile app-based intervention with an in-built chatbot titled Parentbot-a Digital Healthcare Assistant (PDA). METHODS: A total of 20 heterosexual English-speaking parents were recruited via purposive sampling from a single tertiary hospital in Singapore. The parents (control group: 10/20, 50%; intervention group: 10/20, 50%) were also part of an ongoing randomized trial between November 2022 and August 2023 that aimed to evaluate the effectiveness of the PDA in improving parenting outcomes. Semistructured one-to-one interviews were conducted via Zoom from February to June 2023. All interviews were conducted in English, audio recorded, and transcribed verbatim. Data analysis was guided by the thematic analysis framework. The COREQ (Consolidated Criteria for Reporting Qualitative Research) checklist was used to guide the reporting of data. RESULTS: Three themes with 10 subthemes describing parents' perceptions of their parenting journeys and their experiences with the PDA were identified. The main themes were (1) new babies, new troubles, and new wonders; (2) support system for the parents; and (3) reshaping perinatal support for future parents. CONCLUSIONS: Overall, the PDA provided parents with informational, socioemotional, and psychological support and could be used to supplement the perinatal care provided for future parents. To optimize users' experience with the PDA, the intervention could be equipped with a more sophisticated chatbot, equipped with more gamification features, and programmed to deliver personalized care to parents. Researchers and health care providers could also strive to promote more peer-to-peer interactions among users. The provision of continuous, holistic, and family-centered care by health care professionals could also be emphasized. Moreover, policy changes regarding maternity and paternity leaves, availability of infant care centers, and flexible work arrangements could be further explored to promote healthy work-family balance for parents.

Subject(s)

Mobile Applications , Parenting , Parents , Qualitative Research , Humans , Parents/psychology , Parenting/psychology , Female , Singapore , Male , Adult , Pregnancy

20.

A Case for Caution: Patient Use of Artificial Intelligence.

Stewart, Lisa; Patterson, Wesley G; Farrell, Christopher; Withycombe, Janice S.

Clin J Oncol Nurs ; 28(3): 252-256, 2024 May 17.

Article in English | MEDLINE | ID: mdl-38830249

ABSTRACT

Artificial intelligence use is increasing exponentially, including by patients in medical decision- making. Because of the limitations of chatbots and the possibility of receiving erroneous or incomplete information, patient.

Subject(s)

Artificial Intelligence , Clinical Decision-Making , Humans

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL