Results 1 - 20 of 284
1.
Am J Obstet Gynecol; 231(2): 276.e1-276.e10, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38710267

ABSTRACT

BACKGROUND: ChatGPT, a publicly available artificial intelligence large language model, has made sophisticated artificial intelligence technology available on demand. Indeed, use of ChatGPT has already begun to make its way into medical research. However, the medical community has yet to understand the capabilities and ethical considerations of artificial intelligence within this context, and unknowns remain regarding ChatGPT's writing abilities, accuracy, and implications for authorship. OBJECTIVE: We hypothesized that human reviewers and artificial intelligence detection software differ in their ability to correctly identify original published abstracts and artificial intelligence-written abstracts in the subjects of Gynecology and Urogynecology. We also suspected that concrete differences in writing errors, readability, and perceived writing quality exist between original and artificial intelligence-generated text. STUDY DESIGN: Twenty-five articles published in high-impact medical journals and a collection of Gynecology and Urogynecology journals were selected. ChatGPT was prompted to write 25 corresponding artificial intelligence-generated abstracts, given the abstract title, the journal-dictated abstract requirements, and select original results. The original and artificial intelligence-generated abstracts were reviewed by blinded Gynecology and Urogynecology faculty and fellows, who identified each as original or artificial intelligence-generated. All abstracts were analyzed by the publicly available artificial intelligence detection software GPTZero, Originality, and Copyleaks, and were assessed for writing errors and quality by the artificial intelligence writing assistant Grammarly. RESULTS: A total of 157 reviews of 25 original and 25 artificial intelligence-generated abstracts were conducted by 26 faculty and 4 fellows; 57% of original abstracts and 42.3% of artificial intelligence-generated abstracts were correctly identified, yielding an average accuracy of 49.7% across all abstracts. All 3 artificial intelligence detectors rated the original abstracts as less likely to be artificial intelligence-written than the ChatGPT-generated abstracts (GPTZero, 5.8% vs 73.3%; P<.001; Originality, 10.9% vs 98.1%; P<.001; Copyleaks, 18.6% vs 58.2%; P<.001). The performance of the 3 artificial intelligence detectors differed when analyzing all abstracts (P=.03), original abstracts (P<.001), and artificial intelligence-generated abstracts (P<.001). Grammarly text analysis identified more writing issues and correctness errors in original than in artificial intelligence abstracts, including a lower Grammarly score reflecting poorer writing quality (82.3 vs 88.1; P=.006), more total writing issues (19.2 vs 12.8; P<.001), critical issues (5.4 vs 1.3; P<.001), confusing words (0.8 vs 0.1; P=.006), misspelled words (1.7 vs 0.6; P=.02), incorrect determiner use (1.2 vs 0.2; P=.002), and comma misuse (0.3 vs 0.0; P=.005). CONCLUSION: Human reviewers are unable to detect the subtle differences between human- and ChatGPT-generated scientific writing because of artificial intelligence's ability to generate highly realistic text. Artificial intelligence detection software improves the identification of artificial intelligence-generated writing but still lacks complete accuracy and requires programmatic improvements to achieve optimal detection. Given that reviewers and editors may be unable to reliably detect artificial intelligence-generated texts, clear guidelines for reporting artificial intelligence use by authors and for implementing artificial intelligence detection software in the review process will need to be established as artificial intelligence chatbots gain more widespread use.


Subjects
Artificial Intelligence, Gynecology, Urology, Humans, Abstracting and Indexing, Periodicals as Topic, Software, Writing, Authorship
2.
BMC Infect Dis; 24(1): 799, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39118057

ABSTRACT

BACKGROUND: Assessment of artificial intelligence (AI)-based models across languages is crucial to ensure equitable access to, and accuracy of, information in multilingual contexts. This study aimed to compare AI model efficiency in English and Arabic for infectious disease queries. METHODS: The study employed the METRICS checklist for the design and reporting of AI-based studies in healthcare. The AI models tested included ChatGPT-3.5, ChatGPT-4, Bing, and Bard. The queries comprised 15 questions on HIV/AIDS, tuberculosis, malaria, COVID-19, and influenza. The AI-generated content was assessed by two bilingual experts using the validated CLEAR tool. RESULTS: Comparing the AI models' performance in English and Arabic for infectious disease queries revealed variability. English queries showed consistently superior performance, with Bard leading, followed by Bing, ChatGPT-4, and ChatGPT-3.5 (P = .012). The same trend was observed in Arabic, albeit without statistical significance (P = .082). Stratified analysis revealed higher scores for English in most CLEAR components, notably in completeness, accuracy, appropriateness, and relevance, especially with ChatGPT-3.5 and Bard. Across the five infectious disease topics, English outperformed Arabic, except for influenza queries in Bing and Bard. The four AI models' performance in English was rated as "excellent", significantly outperforming their "above-average" Arabic counterparts (P = .002). CONCLUSIONS: A disparity in AI model performance was noticed between English and Arabic responses to infectious disease queries. This language variation can negatively affect the quality of health content delivered by AI models to native speakers of Arabic. AI developers should address this issue, with the ultimate goal of enhancing health outcomes.


Subjects
Artificial Intelligence, Communicable Diseases, Language, Humans, COVID-19
3.
Future Oncol; 1-6, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38646965

ABSTRACT

Background: Medical practitioners are increasingly using artificial intelligence (AI) chatbots for easier and faster access to information. To our knowledge, the accuracy and availability of AI-generated chemotherapy protocols have not yet been studied. Methods: Nine simulated cancer patient cases were designed, and the AI chatbots ChatGPT version 3.5 (OpenAI) and Bing (Microsoft) were used to generate chemotherapy protocols for each case. Results: The generated chemotherapy protocols were compared with the original protocols for the nine simulated cancer patients. ChatGPT's overall performance on protocol generation was 5 out of 9, and Bing's was 4 out of 9; the difference was statistically nonsignificant (p = 1). Conclusion: AI chatbots show both potential and limitations in generating chemotherapy protocols. Their overall performance is low, and they should be used cautiously in oncological practice.



4.
Ann Hepatol; 101537, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39147133

ABSTRACT

INTRODUCTION AND OBJECTIVES: Autoimmune liver diseases (AILDs) are rare and require precise evaluation, which is often challenging for medical providers. Chatbots are innovative solutions to assist healthcare professionals in clinical management. In our study, ten liver specialists systematically evaluated four chatbots to determine their utility as clinical decision support tools in the field of AILDs. MATERIALS AND METHODS: We constructed a 56-question questionnaire focusing on AILD evaluation, diagnosis, and management of Autoimmune Hepatitis (AIH), Primary Biliary Cholangitis (PBC), and Primary Sclerosing Cholangitis (PSC). Four chatbots (ChatGPT 3.5, Claude, Microsoft Copilot, and Google Bard) were presented with the questions in their free tiers in December 2023. Responses underwent critical evaluation by ten liver specialists using a standardized 1-to-10 Likert scale. The analysis included mean scores, the number of highest-rated replies, and the identification of common shortcomings in chatbot performance. RESULTS: Among the assessed chatbots, specialists rated Claude highest, with a mean score of 7.37 (SD = 1.91), followed by ChatGPT (7.17, SD = 1.89), Microsoft Copilot (6.63, SD = 2.10), and Google Bard (6.52, SD = 2.27). Claude also excelled with 27 best-rated replies, outperforming ChatGPT (20), while Microsoft Copilot and Google Bard lagged with only 6 and 9, respectively. Common deficiencies included listing details over specific advice, limited dosing options, inaccuracies for pregnant patients, insufficient recent data, over-reliance on CT and MRI imaging, and inadequate discussion of off-label use and fibrates in PBC treatment. Notably, internet access for Microsoft Copilot and Google Bard did not enhance precision compared with pre-trained models. CONCLUSIONS: Chatbots hold promise in AILD support, but our study underscores key areas for improvement. Refinement is needed in providing specific advice, accuracy, and focused, up-to-date information. Addressing these shortcomings is essential for enhancing the utility of chatbots in AILD management, guiding future development, and ensuring their effectiveness as clinical decision support tools.

5.
Bioethics; 38(6): 503-510, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38735049

ABSTRACT

Mental health chatbots (MHCBs) designed to support individuals in coping with mental health issues are rapidly advancing. Currently, these MHCBs are predominantly used in commercial rather than clinical contexts, but this might change soon. The question is whether this use is ethically desirable. This paper addresses a critical yet understudied concern: assuming that MHCBs cannot have genuine emotions, how might this affect psychotherapy and, consequently, the quality of treatment outcomes? We argue that if MHCBs lack emotions, they cannot have genuine (affective) empathy or utilise countertransference. This gives reason to worry that MHCBs are (a) more liable to harm and (b) less likely to benefit patients than human therapists. We discuss some responses to this worry and conclude that further empirical research is necessary to determine whether it is valid. Even if it is, this does not mean that we should never use MHCBs. By discussing the broader ethical debate on the clinical use of chatbots, we point towards how further research can help us establish ethical boundaries for how we should use mental health chatbots.


Subjects
Emotions, Empathy, Psychotherapists, Psychotherapy, Humans, Psychotherapy/ethics, Countertransference, Mental Disorders/therapy, Mental Health, Psychological Adaptation
6.
J Med Internet Res; 26: e54840, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38512309

ABSTRACT

While digital innovation in health was already evolving rapidly, the COVID-19 pandemic accelerated the generation of digital technology tools, such as chatbots, that help increase access to crucial health information and services for those who were cut off from, or had limited contact with, health services. This theme issue, titled "Chatbots and COVID-19," presents articles from researchers and practitioners across the globe describing the development, implementation, and evaluation of chatbots designed to address a wide range of health concerns and services. In this editorial, we present some of the key challenges and lessons learned arising from the content of this theme issue. Most notably, we note that a stronger evidence base is needed to ensure that chatbots and other digital tools are developed to best serve the needs of population health.


Subjects
COVID-19, Population Health, Humans, Pandemics/prevention & control, Digital Technology
7.
J Med Internet Res; 26: e53225, 2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38241074

ABSTRACT

This editorial explores the evolving and transformative role of large language models (LLMs) in enhancing the capabilities of virtual assistants (VAs) in the health care domain. Focusing on recent research, it highlights the marked improvement in the accuracy and clinical relevance of responses from LLMs, such as GPT-4, compared with current VAs, especially in addressing complex health care inquiries like those related to postpartum depression. This improved accuracy and clinical relevance marks a paradigm shift in digital health tools and VAs. Furthermore, such LLM applications can dynamically adapt and be integrated into existing VA platforms, offering cost-effective, scalable, and inclusive solutions. These developments suggest a significant expansion in the range of VA applications, as well as increased value, risk, and impact in health care, moving toward more personalized digital health ecosystems. Alongside these advancements, however, it is necessary to develop and adhere to ethical guidelines, regulatory frameworks, governance principles, and privacy and safety measures. Robust interdisciplinary collaboration is needed to navigate the complexities of safely and effectively integrating LLMs into health care applications, ensuring that these emerging technologies align with the diverse needs and ethical considerations of the health care domain.


Subjects
Postpartum Depression, Ecosystem, Female, Humans, Digital Health, Information Dissemination, Language
8.
J Med Internet Res; 26: e54758, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38758582

ABSTRACT

BACKGROUND: Artificial intelligence is increasingly being applied to many workflows. Large language models (LLMs) are publicly accessible platforms trained to understand, interact with, and produce human-readable text; their ability to deliver relevant and reliable information is of particular interest to health care providers and patients. Hematopoietic stem cell transplantation (HSCT) is a complex medical field requiring extensive knowledge, background, and training to practice successfully, and it can be challenging for a nonspecialist audience to comprehend. OBJECTIVE: We aimed to test the applicability of 3 prominent LLMs, namely ChatGPT-3.5 (OpenAI), ChatGPT-4 (OpenAI), and Bard (Google AI), in guiding nonspecialist health care professionals and advising patients seeking information regarding HSCT. METHODS: We submitted 72 open-ended HSCT-related questions of variable difficulty to the LLMs and rated their responses based on consistency (defined as replicability of the response), response veracity, language comprehensibility, specificity to the topic, and the presence of hallucinations. We then rechallenged the 2 best-performing chatbots by resubmitting the most difficult questions, prompting them to respond as if communicating with either a health care professional or a patient and to provide verifiable sources of information. Responses were then rerated with the additional criterion of language appropriateness, defined as language adaptation for the intended audience. RESULTS: ChatGPT-4 outperformed both ChatGPT-3.5 and Bard in terms of response consistency (66/72, 92%; 54/72, 75%; and 63/69, 91%, respectively; P=.007), response veracity (58/66, 88%; 40/54, 74%; and 16/63, 25%, respectively; P<.001), and specificity to the topic (60/66, 91%; 43/54, 80%; and 27/63, 43%, respectively; P<.001). Both ChatGPT-4 and ChatGPT-3.5 outperformed Bard in terms of language comprehensibility (64/66, 97%; 53/54, 98%; and 52/63, 83%, respectively; P=.002). All displayed episodes of hallucination. ChatGPT-3.5 and ChatGPT-4 were then rechallenged with a prompt to adapt their language to the audience and to provide sources of information, and responses were rated. ChatGPT-3.5 showed better ability to adapt its language to a nonmedical audience than ChatGPT-4 (17/21, 81% and 10/22, 46%, respectively; P=.03); however, both failed to consistently provide correct and up-to-date information resources, reporting out-of-date materials, incorrect URLs, or unfocused references, making their output unverifiable by the reader. CONCLUSIONS: Despite LLMs' potential capability in confronting challenging medical topics such as HSCT, the presence of mistakes and the lack of clear references make them not yet appropriate for routine, unsupervised clinical use or patient counseling. Implementing LLMs' ability to access and reference current websites and research papers, as well as developing LLMs trained on specialized domain knowledge data sets, may offer potential solutions for their future clinical application.


Subjects
Health Personnel, Hematopoietic Stem Cell Transplantation, Humans, Artificial Intelligence, Language
9.
J Med Internet Res; 26: e54571, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38935937

ABSTRACT

BACKGROUND: Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement. OBJECTIVE: This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas and to illustrate their potential role in health care decision-making, comparing ratings by senior physicians and residents as well as across question types. METHODS: A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications. RESULTS: Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, beneficial, and completeness dimensions. CONCLUSIONS: ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatment, and ethical deliberation. While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.


Subjects
Clinical Decision-Making, Humans, Artificial Intelligence
10.
J Med Internet Res; 26: e51837, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38441945

ABSTRACT

BACKGROUND: Artificial intelligence chatbots such as ChatGPT (OpenAI) have garnered excitement about their potential for delegating writing tasks ordinarily performed by humans. Many of these tasks (eg, writing recommendation letters) have social and professional ramifications, making the potential social biases in ChatGPT's underlying language model a serious concern. OBJECTIVE: Three preregistered studies used the text analysis program Linguistic Inquiry and Word Count to investigate gender bias in recommendation letters written by ChatGPT in human-use sessions (N=1400 total letters). METHODS: We conducted analyses using 22 existing Linguistic Inquiry and Word Count dictionaries, as well as 6 newly created dictionaries based on systematic reviews of gender bias in recommendation letters, to compare recommendation letters generated for the 200 most historically popular "male" and "female" names in the United States. Study 1 used 3 different letter-writing prompts intended to accentuate professional accomplishments associated with male stereotypes, female stereotypes, or neither. Study 2 examined whether lengthening each of the 3 prompts while holding the between-prompt word count constant modified the extent of bias. Study 3 examined the variability within letters generated for the same name and prompts. We hypothesized that when prompted with gender-stereotyped professional accomplishments, ChatGPT would evidence gender-based language differences replicating those found in systematic reviews of human-written recommendation letters (eg, more affiliative, social, and communal language for female names; more agentic and skill-based language for male names). RESULTS: Significant differences in language between letters generated for female versus male names were observed across all prompts, including the prompt hypothesized to be neutral, and across nearly all language categories tested. Historically female names received significantly more social referents (5/6, 83% of prompts), communal or doubt-raising language (4/6, 67% of prompts), personal pronouns (4/6, 67% of prompts), and clout language (5/6, 83% of prompts). Contradicting the study hypotheses, some gender differences (eg, achievement language and agentic language) were significant in both the hypothesized and nonhypothesized directions, depending on the prompt. Heteroscedasticity between male and female names was observed in multiple linguistic categories, with greater variance for historically female names than for historically male names. CONCLUSIONS: ChatGPT reproduces many gender-based language biases that have been reliably identified in investigations of human-written reference letters, although these differences vary across prompts and language categories. Caution should be taken when using ChatGPT for tasks that have social consequences, such as reference letter writing. The methods developed in this study may be useful for ongoing bias testing among progressive generations of chatbots across a range of real-world scenarios. TRIAL REGISTRATION: OSF Registries osf.io/ztv96; https://osf.io/ztv96.
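As context for the method above: LIWC itself is proprietary, but the core operation behind such analyses (counting what fraction of a text's words falls into predefined category dictionaries) is simple to illustrate. The sketch below is a minimal, hypothetical Python example; the category word lists are invented placeholders, not the actual LIWC or study-specific dictionaries.

```python
import re
from collections import Counter

# Hypothetical category dictionaries. Real LIWC dictionaries are
# proprietary and far larger; the study also built 6 custom ones.
DICTIONARIES = {
    "communal": {"warm", "helpful", "kind", "supportive", "caring"},
    "agentic": {"assertive", "confident", "independent", "decisive"},
    "doubt_raising": {"tries", "attempts", "adequate", "satisfactory"},
}

def category_rates(text: str) -> dict:
    """Return each category's share of total words, in percent (LIWC-style)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for category, words in DICTIONARIES.items():
            if tok in words:
                counts[category] += 1
    total = len(tokens) or 1
    return {cat: 100 * counts[cat] / total for cat in DICTIONARIES}

letter = "She is a warm, helpful colleague who tries hard and is caring."
print(category_rates(letter))
```

Comparing such per-category rates between letters generated for historically male versus female names is, in outline, how the gender-bias tests above proceed.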


Subjects
Artificial Intelligence, Sexism, Humans, Female, Male, Systematic Reviews as Topic, Language, Linguistics
11.
J Med Internet Res; 26: e46036, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38713909

ABSTRACT

BACKGROUND: A plethora of weight management apps are available, but many individuals, especially those living with overweight and obesity, still struggle to achieve adequate weight loss. An emerging area in weight management is support for one's self-regulation over momentary eating impulses. OBJECTIVE: This study aims to examine the feasibility and effectiveness of a novel artificial intelligence-assisted weight management app in improving eating behaviors in a Southeast Asian cohort. METHODS: A single-group pretest-posttest study was conducted. Participants completed the 1-week run-in period of a 12-week app-based weight management program called the Eating Trigger-Response Inhibition Program (eTRIP). This self-monitoring system was built upon 3 main components, namely, (1) chatbot-based check-ins on eating lapse triggers, (2) food-based computer vision image recognition (a system built on local food items), and (3) automated time-based nudges and a meal stopwatch. At every mealtime, participants were prompted to take a picture of their food items, which were identified by computer vision image recognition, thereby triggering a set of chatbot-initiated questions on eating triggers, such as who the users were eating with. Paired 2-sided t tests were used to compare the differences in the psychobehavioral constructs before and after the 7-day program, including overeating habits, snacking habits, consideration of future consequences, self-regulation of eating behaviors, anxiety, depression, and physical activity. Qualitative feedback was analyzed by content analysis according to 4 steps, namely, decontextualization, recontextualization, categorization, and compilation. RESULTS: The mean age, self-reported BMI, and waist circumference of the participants were 31.25 (SD 9.98) years, 28.86 (SD 7.02) kg/m2, and 92.60 (SD 18.24) cm, respectively. There were significant improvements in all 7 psychobehavioral constructs except anxiety. After adjusting for multiple comparisons, statistically significant improvements were found for overeating habits (mean -0.32, SD 1.16; P<.001), snacking habits (mean -0.22, SD 1.12; P<.002), self-regulation of eating behavior (mean 0.08, SD 0.49; P=.007), depression (mean -0.12, SD 0.74; P=.007), and physical activity (mean 1288.60, SD 3055.20 metabolic equivalent task-min/day; P<.001). Forty-one participants reported skipping at least 1 meal (ie, breakfast, lunch, or dinner), accounting for 578 (67.1%) of the 862 meals skipped. Of the 230 participants, 80 (34.8%) provided textual feedback indicating a satisfactory user experience with eTRIP. Four themes emerged, namely, (1) becoming more mindful of self-monitoring, (2) personalized reminders with prompts and chatbot, (3) food logging with image recognition, and (4) engaging with a simple, easy, and appealing user interface. The attrition rate was 8.4% (21/251). CONCLUSIONS: eTRIP is a feasible and effective weight management program that should be tested in a larger population to assess its effectiveness and sustainability as a personalized weight management program for people with overweight and obesity. TRIAL REGISTRATION: ClinicalTrials.gov NCT04833803; https://classic.clinicaltrials.gov/ct2/show/NCT04833803.
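For readers unfamiliar with the statistics, the pre/post comparisons above are paired two-sided t tests, in which each participant serves as their own control. A minimal sketch with SciPy, using simulated scores rather than the study's data:

```python
import numpy as np
from scipy import stats

# Simulated pre/post overeating-habit scores for 230 participants;
# the study's actual data are not reproduced here.
rng = np.random.default_rng(0)
pre = rng.normal(3.5, 1.0, size=230)
post = pre - rng.normal(0.32, 1.16, size=230)  # mean change around -0.32

# Paired two-sided t test on the per-participant differences.
t_stat, p_value = stats.ttest_rel(pre, post)
print(f"mean change = {np.mean(post - pre):.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```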


Subjects
Artificial Intelligence, Feeding Behavior, Mobile Applications, Humans, Feeding Behavior/psychology, Adult, Female, Male, Obesity/psychology, Obesity/therapy, Middle Aged
12.
J Med Internet Res; 26: e55939, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39141904

ABSTRACT

BACKGROUND: Artificial intelligence (AI) chatbots, such as ChatGPT, have made significant progress. These chatbots, particularly popular among health care professionals and patients, are transforming patient education and disease experience with personalized information. Accurate, timely patient education is crucial for informed decision-making, especially regarding prostate-specific antigen screening and treatment options. However, the accuracy and reliability of AI chatbots' medical information must be rigorously evaluated. Studies testing ChatGPT's knowledge of prostate cancer are emerging, but there is a need for ongoing evaluation to ensure the quality and safety of information provided to patients. OBJECTIVE: This study aims to evaluate the quality, accuracy, and readability of ChatGPT-4's responses to common prostate cancer questions posed by patients. METHODS: Overall, 8 questions were formulated with an inductive approach based on information topics in peer-reviewed literature and Google Trends data. Adapted versions of the Patient Education Materials Assessment Tool for AI (PEMAT-AI), Global Quality Score, and DISCERN-AI tools were used by 4 independent reviewers to assess the quality of the AI responses. The 8 AI outputs were judged by 7 expert urologists, using an assessment framework developed to assess accuracy, safety, appropriateness, actionability, and effectiveness. The AI responses' readability was assessed using established algorithms (Flesch Reading Ease score, Gunning Fog Index, Flesch-Kincaid Grade Level, The Coleman-Liau Index, and Simple Measure of Gobbledygook [SMOG] Index). A brief tool (Reference Assessment AI [REF-AI]) was developed to analyze the references provided by AI outputs, assessing for reference hallucination, relevance, and quality of references. RESULTS: The PEMAT-AI understandability score was very good (mean 79.44%, SD 10.44%), the DISCERN-AI rating was scored as "good" quality (mean 13.88, SD 0.93), and the Global Quality Score was high (mean 4.46/5, SD 0.50). Natural Language Assessment Tool for AI had pooled mean accuracy of 3.96 (SD 0.91), safety of 4.32 (SD 0.86), appropriateness of 4.45 (SD 0.81), actionability of 4.05 (SD 1.15), and effectiveness of 4.09 (SD 0.98). The readability algorithm consensus was "difficult to read" (Flesch Reading Ease score mean 45.97, SD 8.69; Gunning Fog Index mean 14.55, SD 4.79), averaging an 11th-grade reading level, equivalent to 15- to 17-year-olds (Flesch-Kincaid Grade Level mean 12.12, SD 4.34; The Coleman-Liau Index mean 12.75, SD 1.98; SMOG Index mean 11.06, SD 3.20). REF-AI identified 2 reference hallucinations, while the majority (28/30, 93%) of references appropriately supplemented the text. Most references (26/30, 86%) were from reputable government organizations, while a handful were direct citations from scientific literature. CONCLUSIONS: Our analysis found that ChatGPT-4 provides generally good responses to common prostate cancer queries, making it a potentially valuable tool for patient education in prostate cancer care. Objective quality assessment tools indicated that the natural language processing outputs were generally reliable and appropriate, but there is room for improvement.
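The readability indices cited above are published formulas over simple text statistics (words per sentence, syllables per word). As a rough illustration, here is a minimal sketch of two of them with a crude vowel-group syllable counter; production tools use better syllable estimation, so exact scores will differ.

```python
import re

def naive_syllables(word: str) -> int:
    # Crude heuristic: count groups of consecutive vowels. Real tools
    # use dictionaries or finer rules, so scores differ slightly.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(naive_syllables(w) for w in words)
    w, s = len(words), sentences
    return {
        # Flesch Reading Ease: higher means easier to read.
        "flesch_reading_ease": 206.835 - 1.015 * (w / s) - 84.6 * (syllables / w),
        # Flesch-Kincaid Grade Level: approximate US school grade.
        "flesch_kincaid_grade": 0.39 * (w / s) + 11.8 * (syllables / w) - 15.59,
    }

sample = ("Prostate-specific antigen screening should be discussed with your "
          "physician, taking individual risk factors into consideration.")
print(readability(sample))
```

On the Flesch Reading Ease scale, scores below roughly 50 are conventionally read as "difficult", which is where the study's mean of 45.97 falls.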


Subjects
Patient Education as Topic, Prostatic Neoplasms, Humans, Male, Patient Education as Topic/methods, Artificial Intelligence
13.
BMC Med Inform Decis Mak; 24(1): 211, 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39075513

ABSTRACT

BACKGROUND: To evaluate the accuracy, reliability, quality, and readability of responses generated by ChatGPT-3.5, ChatGPT-4, Gemini, and Copilot in relation to orthodontic clear aligners. METHODS: Frequently asked questions by patients/laypersons about clear aligners on websites were identified using the Google search tool, and these questions were posed to the ChatGPT-3.5, ChatGPT-4, Gemini, and Copilot AI models. Responses were assessed using a five-point Likert scale for accuracy, the modified DISCERN scale for reliability, the Global Quality Scale (GQS) for quality, and the Flesch Reading Ease Score (FRES) for readability. RESULTS: ChatGPT-4 responses had the highest mean Likert score (4.5 ± 0.61), followed by Copilot (4.35 ± 0.81), ChatGPT-3.5 (4.15 ± 0.75), and Gemini (4.1 ± 0.72). The difference between the Likert scores of the chatbot models was not statistically significant (p > 0.05). Copilot had significantly higher modified DISCERN and GQS scores compared with Gemini, ChatGPT-4, and ChatGPT-3.5 (p < 0.05). Gemini's modified DISCERN and GQS scores were statistically higher than those of ChatGPT-3.5 (p < 0.05). Gemini also had a significantly higher FRES than ChatGPT-4, Copilot, and ChatGPT-3.5 (p < 0.05). The mean FRES was 38.39 ± 11.56 for ChatGPT-3.5, 43.88 ± 10.13 for ChatGPT-4, and 41.72 ± 10.74 for Copilot, indicating that these responses were difficult to read. The mean FRES for Gemini was 54.12 ± 10.27, indicating that Gemini's responses were more readable than those of the other chatbots. CONCLUSIONS: All chatbot models provided generally accurate, moderately reliable, and moderate- to good-quality answers to questions about clear aligners. However, the responses were difficult to read. ChatGPT, Gemini, and Copilot have significant potential as patient information tools in orthodontics; however, to be fully effective, they need to be supplemented with more evidence-based information and improved readability.


Subjects
Artificial Intelligence, Orthodontics, Humans, Orthodontics/standards, Patient Education as Topic/methods, Patient Education as Topic/standards, Reproducibility of Results
14.
Telemed J E Health; 30(3): 722-730, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37756224

ABSTRACT

Background: Artificial intelligence-based chatbots (AI chatbots) can potentially improve mental health care, yet factors predicting their adoption and continued use are unclear. Methods: We conducted an online survey with a sample of U.S. adults with symptoms of depression and anxiety (N = 393) in 2021 before the release of ChatGPT. We explored factors predicting the adoption and continued use of AI chatbots, including factors of the unified theory of acceptance and use of technology model, stigma, privacy concerns, and AI hesitancy. Results: Results from the regression indicated that for nonusers, performance expectancy, price value, descriptive norm, and psychological distress are positively related to the intention of adopting AI chatbots, while AI hesitancy and effort expectancy are negatively associated with adopting AI chatbots. For those with experience in using AI chatbots for mental health, performance expectancy, price value, descriptive norm, and injunctive norm are positively related to the intention of continuing to use AI chatbots. Conclusions: Understanding the adoption and continued use of AI chatbots among adults with symptoms of depression and anxiety is essential given that there is a widening gap in the supply and demand of care. AI chatbots provide new opportunities for quality care by supporting accessible, affordable, efficient, and personalized care. This study provides insights for developing and deploying AI chatbots such as ChatGPT in the context of mental health care. Findings could be used to design innovative interventions that encourage the adoption and continued use of AI chatbots among people with symptoms of depression and anxiety and who have difficulty accessing care.


Subjects
Intention, Mental Health, Adult, Humans, Artificial Intelligence, Privacy, Social Stigma
15.
Int Orthop; 48(8): 1963-1969, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38619565

ABSTRACT

PURPOSE: This study analyses the performance and proficiency of three artificial intelligence (AI) generative chatbots (ChatGPT-3.5, ChatGPT-4.0, and Bard Google AI®) in answering the multiple choice questions (MCQs) of postgraduate (PG)-level orthopaedic qualifying examinations. METHODS: A series of 120 mock 'Single Best Answer' (SBA) MCQs, each with four possible options (A, B, C, and D), covering various musculoskeletal (MSK) conditions across the Trauma and Orthopaedics curriculum was compiled. A standardised text prompt was used to feed the questions to the ChatGPT (versions 3.5 and 4.0) and Google Bard programs, and the responses were statistically analysed. RESULTS: Significant differences were found between the responses of ChatGPT 3.5 and ChatGPT 4.0 (Chi square = 27.2, P < 0.001), and on comparing both ChatGPT 3.5 (Chi square = 63.852, P < 0.001) and ChatGPT 4.0 (Chi square = 44.246, P < 0.001) with Bard Google AI®. Bard Google AI® had 100% efficiency and was significantly more efficient than both ChatGPT 3.5 and ChatGPT 4.0 (p < 0.0001). CONCLUSION: The results demonstrate the variable potential of the different AI generative chatbots (ChatGPT 3.5, ChatGPT 4.0, and Google Bard) in answering the MCQs of PG-level orthopaedic qualifying examinations. Bard Google AI® showed superior performance to both ChatGPT versions, underlining the potential of such large language models in processing and applying orthopaedic subspecialty knowledge at a PG level.
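The pairwise comparisons reported above are chi-square tests on counts of correct versus incorrect answers. A minimal sketch with SciPy follows; the 2x2 table uses invented counts (the abstract does not give per-model totals out of the 120 MCQs), so it illustrates the test rather than reproducing the study's results.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: rows = chatbots,
# columns = [correct, incorrect] out of 120 MCQs.
table = [
    [84, 36],   # placeholder counts for ChatGPT 3.5
    [102, 18],  # placeholder counts for ChatGPT 4.0
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
```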


Subjects
Artificial Intelligence, Graduate Medical Education, Educational Measurement, Orthopedics, Humans, Orthopedics/education, Educational Measurement/methods, Graduate Medical Education/methods, Clinical Competence, Curriculum
16.
Dent Traumatol; 2024 May 14.
Article in English | MEDLINE | ID: mdl-38742754

ABSTRACT

BACKGROUND: This study assessed the consistency and accuracy of responses provided by two artificial intelligence (AI) applications, ChatGPT and Google Bard (Gemini), to questions related to dental trauma. MATERIALS AND METHODS: Based on the International Association of Dental Traumatology guidelines, 25 dichotomous (yes/no) questions were posed to ChatGPT and Google Bard over 10 days. The responses were recorded and compared with the correct answers. Statistical analyses, including Fleiss kappa, were conducted to determine the agreement and consistency of the responses. RESULTS: Analysis of 4500 responses revealed that both applications provided correct answers to 57.5% of the questions. Google Bard demonstrated a moderate level of agreement, with varying rates of incorrect answers and referrals to physicians. CONCLUSIONS: Although ChatGPT and Google Bard are potential knowledge resources, their consistency and accuracy in responding to dental trauma queries remain limited. Further research involving specially trained AI models in endodontics is warranted to assess their suitability for clinical use.
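Fleiss' kappa, mentioned in the methods, quantifies agreement across repeated categorical ratings beyond what chance alone would produce. Below is a minimal sketch under the assumption that responses are tabulated as an items x categories count matrix (here, hypothetical yes/no tallies for each question across the 10 repeat administrations); statsmodels ships an equivalent as statsmodels.stats.inter_rater.fleiss_kappa.

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """counts[i, j] = number of ratings assigning item i to category j.
    Assumes every item received the same number of ratings n."""
    n = counts.sum(axis=1)[0]                        # ratings per item
    p_j = counts.sum(axis=0) / counts.sum()          # category proportions
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))  # per-item agreement
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical: 5 yes/no questions, each answered on 10 separate days.
# counts[i] = [times answered "yes", times answered "no"] for question i.
counts = np.array([[10, 0], [8, 2], [7, 3], [10, 0], [6, 4]])
print(f"Fleiss' kappa = {fleiss_kappa(counts):.2f}")
```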

17.
J Med Syst; 48(1): 26, 2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38411833

ABSTRACT

INTRODUCTION: ChatGPT, a recently released chatbot from OpenAI, has found applications in various aspects of life, including academic research. This study investigated the knowledge, perceptions, and attitudes of researchers towards using ChatGPT and other chatbots in academic research. METHODS: A pre-designed, self-administered survey using Google Forms was employed to conduct the study. The questionnaire assessed participants' knowledge of ChatGPT and other chatbots, their awareness of current chatbot and artificial intelligence (AI) applications, and their attitudes towards ChatGPT and its potential research uses. RESULTS: Two hundred researchers participated in the survey. A majority were female (57.5%), and over two-thirds belonged to the medical field (68%). While 67% had heard of ChatGPT, only 11.5% had employed it in their research, primarily for rephrasing paragraphs and finding references. Interestingly, over one-third supported the notion of listing ChatGPT as an author in scientific publications. Concerns emerged regarding AI's potential to automate researcher tasks, particularly in language editing, statistics, and data analysis. Additionally, roughly half expressed ethical concerns about using AI applications in scientific research. CONCLUSION: The increasing use of chatbots in academic research necessitates thoughtful regulation that balances potential benefits with inherent limitations and potential risks. Chatbots should not be considered authors of scientific publications but rather assistants to researchers during manuscript preparation and review. Researchers should be equipped with proper training to utilize chatbots and other AI tools effectively and ethically.


Subjects
Artificial Intelligence, Data Analysis, Humans, Female, Male, Knowledge, Language, Software
18.
Eur J Dent Educ; 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38586899

ABSTRACT

INTRODUCTION: Interest is growing in the potential of artificial intelligence (AI) chatbots and large language models like OpenAI's ChatGPT and Google's Gemini, particularly in dental education. This study aimed to explore dental educators' perceptions of AI chatbots and large language models, specifically their potential benefits and challenges for dental education. MATERIALS AND METHODS: A global cross-sectional survey was conducted in May-June 2023 using a 31-item online questionnaire to assess dental educators' perceptions of AI chatbots like ChatGPT and their influence on dental education. Dental educators, representing diverse backgrounds, were asked about their use of AI, its perceived impact, barriers to using chatbots, and the future role of AI in this field. RESULTS: A total of 428 dental educators (survey views = 1516; response rate = 28%) with a median [25th/75th percentile] age of 45 [37, 56] years and 16 [8, 25] years of experience participated, with the majority from the Americas (54%), followed by Europe (26%) and Asia (10%). Thirty-one percent of respondents already use AI tools, and 64% recognised their potential in dental education. Perceptions of AI's potential impact on dental education varied by region, with Africa (4 [4-5]), Asia (4 [4-5]), and the Americas (4 [3-5]) perceiving more potential than Europe (3 [3-4]). Educators stated that AI chatbots could enhance knowledge acquisition (74.3%), research (68.5%), and clinical decision-making (63.6%) but expressed concern about AI's potential to reduce human interaction (53.9%). Dental educators' chief concerns centred on the absence of clear guidelines and training for using AI chatbots. CONCLUSION: A positive yet cautious view towards AI chatbot integration in dental curricula is prevalent, underscoring the need for clear implementation guidelines.

19.
Am J Obstet Gynecol; 228(6): 696-705, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36924907

ABSTRACT

Natural language processing, the branch of artificial intelligence concerned with the interaction between computers and human language, has advanced markedly in recent years with the introduction of sophisticated deep-learning models. Improved performance in natural language processing tasks, such as text and speech processing, has fueled impressive demonstrations of these models' capabilities. Perhaps no demonstration has been more impactful to date than the introduction of the publicly available online chatbot ChatGPT in November 2022 by OpenAI, which is based on a natural language processing model known as a Generative Pretrained Transformer. Through a series of questions about obstetrics and gynecology posed to ChatGPT as prompts, we evaluated the model's ability to handle clinically related queries. Its answers demonstrated that, in its current form, ChatGPT can be valuable for users who want preliminary information about virtually any topic in the field. Because its educational role is still being defined, we must recognize its limitations. Although answers were generally eloquent and informed, largely free of significant mistakes or misinformation, we also observed evidence of its weaknesses. A significant drawback is that the data on which the model has been trained are apparently not readily updated. The specific model assessed here seems not to source data from after 2021 reliably, if at all. Users of ChatGPT who expect more up-to-date information need to be aware of this drawback. An inability to cite sources or to truly understand what the user is asking suggests that it has the capability to mislead. Responsible use of models like ChatGPT will be important for ensuring that they work to help, not harm, users seeking information on obstetrics and gynecology.


Subjects
Gynecology, Obstetrics, Female, Pregnancy, Humans, Artificial Intelligence, Awareness, Educational Status
20.
Curr HIV/AIDS Rep; 20(6): 481-486, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38010467

ABSTRACT

PURPOSE OF REVIEW: To explore the intersection of chatbots and HIV prevention and care. Current applications of chatbots in HIV services, the challenges faced, recent advancements, and future research directions are presented and discussed. RECENT FINDINGS: Chatbots facilitate sensitive discussions about HIV, thereby promoting prevention and care strategies. Trustworthiness and accuracy of information were identified as the primary factors influencing user engagement with chatbots. Additionally, the integration of AI-driven models that process and generate human-like text into chatbots poses both breakthroughs and challenges in terms of privacy, bias, resources, and ethics. Chatbots in HIV prevention and care show potential; however, significant work remains in addressing the associated ethical and practical concerns. The integration of large language models into chatbots is a promising direction for their effective deployment in HIV services. Encouraging future research, collaboration among stakeholders, and bold innovative thinking will be pivotal in harnessing the full potential of chatbot interventions.


Subjects
HIV Infections, Humans, HIV Infections/prevention & control, Privacy