Results 1 - 20 of 96
1.
Liver Int ; 44(9): 2114-2124, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38819632

ABSTRACT

Large Language Models (LLMs) are transformer-based neural networks with billions of parameters trained on very large text corpora from diverse sources. LLMs have the potential to improve healthcare due to their capability to parse complex concepts and generate context-based responses. The interest in LLMs has not spared digestive disease academics, who have mainly investigated foundational LLM accuracy; reported figures range from 25% to 90%, a spread influenced by the lack of standardized rules for reporting methodologies and results in LLM-oriented research. A further critical issue is the absence of a universally accepted definition of accuracy, which varies from binary to scalar interpretations and is often tied to grader expertise without reference to clinical guidelines. We address strategies and challenges to increase accuracy. In particular, LLMs can be infused with domain knowledge using Retrieval Augmented Generation (RAG) or Supervised Fine-Tuning (SFT) with reinforcement learning from human feedback (RLHF). RAG faces challenges with context-window limits and accurate information retrieval from the provided context. SFT, a deeper adaptation method, is computationally demanding and requires specialized knowledge. LLMs may increase the quality of patient care across the field of digestive diseases, where physicians are often engaged in screening, treatment, and surveillance for a broad range of pathologies for which in-context learning or SFT with RLHF could improve clinical decision-making and patient outcomes. However, despite their potential, the safe deployment of LLMs in healthcare still needs to overcome hurdles in accuracy, suggesting a need for strategies that integrate human feedback with advanced model training.
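To make the RAG strategy mentioned above concrete, here is a minimal sketch of the retrieve-then-prompt pattern: embed a small corpus of domain passages, retrieve the most similar ones by cosine similarity, and stuff them into the prompt. The corpus sentences are invented for illustration, and `call_llm` is a hypothetical stand-in for whatever chat-completion client is available; only the sentence-transformers encoder is a real dependency.

```python
# Minimal RAG prompt-assembly sketch (illustrative corpus, hypothetical LLM client).
import numpy as np
from sentence_transformers import SentenceTransformer

corpus = [
    "Surveillance colonoscopy intervals depend on polyp number and histology.",
    "HBV screening is recommended before immunosuppressive therapy.",
    "Elastography can stage liver fibrosis non-invasively.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = encoder.encode(corpus, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the question (cosine similarity)."""
    q = encoder.encode([question], normalize_embeddings=True)
    scores = corpus_emb @ q.T  # normalized vectors: dot product = cosine
    top = np.argsort(scores.ravel())[::-1][:k]
    return [corpus[i] for i in top]

question = "How can liver fibrosis be staged without a biopsy?"
context = "\n".join(retrieve(question))
prompt = (
    f"Answer using ONLY the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
# answer = call_llm(prompt)  # hypothetical chat-completion client
print(prompt)
```

Grounding the prompt in retrieved text is what lets RAG sidestep SFT's compute cost, at the price of the context-window and retrieval-quality limits the abstract notes.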


Subject(s)
Digestive System Diseases; Neural Networks, Computer; Humans; Digestive System Diseases/therapy; Natural Language Processing
2.
Regul Toxicol Pharmacol ; 149: 105613, 2024 May.
Article in English | MEDLINE | ID: mdl-38570021

ABSTRACT

Regulatory agencies consistently deal with extensive document reviews, ranging from product submissions to both internal and external communications. Large Language Models (LLMs) like ChatGPT can be invaluable tools for these tasks; however, they present several challenges, particularly the handling of proprietary information, the need to combine customized functions with specific review requirements, and the transparency and explainability of the model's output. Hence, a localized and customized solution is imperative. To tackle these challenges, we formulated a framework named askFDALabel for FDA drug labeling documents, a crucial resource in the FDA drug review process. AskFDALabel operates within a secure IT environment and comprises two key modules: a semantic search module and a Q&A/text-generation module. Module S is built on word embeddings to enable comprehensive semantic queries within labeling documents. Module T uses a tuned LLM to generate responses based on references from Module S. As a result, our framework enabled small LLMs to perform comparably to ChatGPT as a computationally inexpensive solution for regulatory applications. To conclude, through AskFDALabel, we have showcased a pathway that harnesses LLMs to support agency operations within a secure environment, offering functions tailored to the needs of regulatory research.


Subject(s)
Drug Labeling; United States Food and Drug Administration; Drug Labeling/standards; Drug Labeling/legislation & jurisprudence; United States Food and Drug Administration/standards; United States; Humans
3.
J Cardiothorac Vasc Anesth ; 38(5): 1251-1259, 2024 May.
Article in English | MEDLINE | ID: mdl-38423884

ABSTRACT

New artificial intelligence tools have been developed that have implications for medical usage. Large language models (LLMs), such as the widely used ChatGPT developed by OpenAI, have not been explored in the context of anesthesiology education. Understanding the reliability of various publicly available LLMs for medical specialties could offer insight into their grasp of the physiology, pharmacology, and practical applications of anesthesiology. An exploratory prospective review was conducted using 3 commercially available LLMs (OpenAI's ChatGPT GPT-3.5 version [GPT-3.5], OpenAI's ChatGPT GPT-4 [GPT-4], and Google's Bard) on questions from a widely used anesthesia board examination review book. Of the 884 eligible questions, the overall correct answer rates were 47.9% for GPT-3.5, 69.4% for GPT-4, and 45.2% for Bard. GPT-4 exhibited significantly higher performance than both GPT-3.5 and Bard (p = 0.001 and p < 0.001, respectively). None of the LLMs met the criteria required to secure American Board of Anesthesiology certification, according to the 70% passing-score approximation. GPT-4 significantly outperformed GPT-3.5 and Bard in overall performance but lacked consistency in providing explanations that aligned with scientific and medical consensus. Although GPT-4 shows promise, current LLMs are not sufficiently advanced to answer anesthesiology board examination questions with passing success. Further iterations and domain-specific training may enhance their utility in medical education.
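The model comparison reported above can be reproduced in outline with a two-proportion z-test on the correct-answer counts. A minimal sketch follows, with counts derived from the reported percentages (so they are approximate); note that since both models answered the same 884 questions, a paired test such as McNemar's would be the stricter choice if per-question results were available.

```python
# Two-proportion z-test sketch; counts reconstructed from reported percentages.
from statsmodels.stats.proportion import proportions_ztest

n_questions = 884
correct = {"GPT-4": round(0.694 * n_questions), "GPT-3.5": round(0.479 * n_questions)}

stat, p_value = proportions_ztest(
    count=[correct["GPT-4"], correct["GPT-3.5"]],
    nobs=[n_questions, n_questions],
)
print(f"z = {stat:.2f}, p = {p_value:.3g}")
```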


Subject(s)
Anesthesiology; Humans; Artificial Intelligence; Prospective Studies; Reproducibility of Results; Language
4.
J Med Internet Res ; 26: e60807, 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39052324

ABSTRACT

BACKGROUND: Over the past 2 years, researchers have used various medical licensing examinations to test whether ChatGPT (OpenAI) possesses accurate medical knowledge. The performance of each version of ChatGPT on the medical licensing examination in multiple environments showed remarkable differences. At this stage, there is still a lack of a comprehensive understanding of the variability in ChatGPT's performance on different medical licensing examinations. OBJECTIVE: In this study, we reviewed all studies on ChatGPT performance in medical licensing examinations up to March 2024. This review aims to contribute to the evolving discourse on artificial intelligence (AI) in medical education by providing a comprehensive analysis of the performance of ChatGPT in various environments. The insights gained from this systematic review will guide educators, policymakers, and technical experts to effectively and judiciously use AI in medical education. METHODS: We searched the literature published between January 1, 2022, and March 29, 2024, using query strings in Web of Science, PubMed, and Scopus. Two authors screened the literature according to the inclusion and exclusion criteria, extracted data, and independently assessed the quality of the literature using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. We conducted both qualitative and quantitative analyses. RESULTS: A total of 45 studies on the performance of different versions of ChatGPT in medical licensing examinations were included in this study. GPT-4 achieved an overall accuracy rate of 81% (95% CI 78-84; P<.01), significantly surpassing the 58% (95% CI 53-63; P<.01) accuracy rate of GPT-3.5. GPT-4 passed the medical examinations in 26 of 29 cases, outperforming the average scores of medical students in 13 of 17 cases. Translating the examination questions into English improved GPT-3.5's performance but had no effect on GPT-4's. GPT-3.5 showed no difference in performance between examinations from English-speaking and non-English-speaking countries (P=.72), but GPT-4 performed significantly better on examinations from English-speaking countries (P=.02). Any type of prompt could significantly improve GPT-3.5's (P=.03) and GPT-4's (P<.01) performance. GPT-3.5 performed better on short-text questions than on long-text questions. The difficulty of the questions affected the performance of GPT-3.5 and GPT-4. In image-based multiple-choice questions (MCQs), ChatGPT's accuracy rate ranged from 13.1% to 100%. ChatGPT performed significantly worse on open-ended questions than on MCQs. CONCLUSIONS: GPT-4 demonstrates considerable potential for future use in medical education. However, due to its insufficient accuracy, inconsistent performance, and the challenges posed by differing medical policies and knowledge across countries, GPT-4 is not yet suitable for use in medical education. TRIAL REGISTRATION: PROSPERO CRD42024506687; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=506687.
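The "81% (95% CI 78-84)" style of figure above comes from interval estimation of a proportion. A minimal sketch of the mechanics is below, using a Wilson interval on hypothetical pooled tallies; an actual meta-analysis like this review would pool study-level estimates (e.g., with a random-effects model) rather than raw counts.

```python
# Wilson confidence interval for an accuracy proportion (hypothetical counts).
from statsmodels.stats.proportion import proportion_confint

correct, total = 810, 1000  # illustrative pooled tallies, not the review's data
low, high = proportion_confint(correct, total, alpha=0.05, method="wilson")
print(f"accuracy = {correct/total:.1%}, 95% CI [{low:.1%}, {high:.1%}]")
```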


Subject(s)
Educational Measurement; Licensure, Medical; Humans; Licensure, Medical/standards; Licensure, Medical/statistics & numerical data; Educational Measurement/methods; Educational Measurement/standards; Educational Measurement/statistics & numerical data; Clinical Competence/statistics & numerical data; Clinical Competence/standards; Artificial Intelligence; Education, Medical/standards
5.
J Med Internet Res ; 26: e52399, 2024 05 13.
Article in English | MEDLINE | ID: mdl-38739445

ABSTRACT

BACKGROUND: A large language model (LLM) is a machine learning model inferred from text data that captures subtle patterns of language use in context. Modern LLMs are based on neural network architectures that incorporate transformer methods. These architectures allow the model to relate words to one another by attending to multiple words in a text sequence. LLMs have been shown to be highly effective for a range of tasks in natural language processing (NLP), including classification and information extraction tasks and generative applications. OBJECTIVE: The aim of this adapted Delphi study was to collect researchers' opinions on how LLMs might influence health care and on the strengths, weaknesses, opportunities, and threats of LLM use in health care. METHODS: We invited researchers in the fields of health informatics, nursing informatics, and medical NLP to share their opinions on LLM use in health care. We started the first round with open questions based on our strengths, weaknesses, opportunities, and threats framework. In the second and third rounds, the participants scored these items. RESULTS: The first, second, and third rounds had 28, 23, and 21 participants, respectively. Almost all participants (26/28, 93% in round 1 and 20/21, 95% in round 3) were affiliated with academic institutions. Agreement was reached on 103 items related to use cases, benefits, risks, reliability, adoption aspects, and the future of LLMs in health care. Participants offered several use cases, including supporting clinical tasks, documentation tasks, and medical research and education, and agreed that LLM-based systems will act as health assistants for patient education. The agreed-upon benefits included increased efficiency in data handling and extraction, improved automation of processes, improved quality of health care services and overall health outcomes, provision of personalized care, accelerated diagnosis and treatment processes, and improved interaction between patients and health care professionals. In total, 5 risks to health care in general were identified: cybersecurity breaches, the potential for patient misinformation, ethical concerns, the likelihood of biased decision-making, and the risk associated with inaccurate communication. Overconfidence in LLM-based systems was recognized as a risk to the medical profession. The 6 agreed-upon privacy risks included the use of unregulated cloud services that compromise data security, exposure of sensitive patient data, breaches of confidentiality, fraudulent use of information, vulnerabilities in data storage and communication, and inappropriate access or use of patient data. CONCLUSIONS: Future research related to LLMs should not only focus on testing their possibilities for NLP-related tasks but also consider the workflows the models could contribute to and the requirements regarding quality, integration, and regulations needed for successful implementation in practice.
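The attention mechanism the background paragraph refers to can be stated in a few lines: each position in a sequence mixes information from every other position, weighted by similarity. The toy NumPy sketch below illustrates scaled dot-product self-attention on random vectors standing in for word representations.

```python
# Scaled dot-product self-attention on a toy 4-"word" sequence, in plain NumPy.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V: each position mixes information from
    every position, weighted by query-key similarity."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d = 4, 8                 # 4 "words", 8-dimensional representations
X = rng.normal(size=(seq_len, d))
out, w = attention(X, X, X)       # self-attention: Q, K, V all from X
print(w.round(2))                 # each row sums to 1: attention over the sequence
```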


Subject(s)
Delphi Technique; Natural Language Processing; Humans; Machine Learning; Delivery of Health Care/methods; Medical Informatics/methods
6.
J Med Internet Res ; 26: e60083, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38971715

ABSTRACT

This viewpoint article first explores the ethical challenges associated with the future application of large language models (LLMs) in the context of medical education. These challenges include not only ethical concerns related to the development of LLMs, such as artificial intelligence (AI) hallucinations, information bias, privacy and data risks, and deficiencies in transparency and interpretability, but also issues concerning the application of LLMs, including deficiencies in emotional intelligence, educational inequities, problems with academic integrity, and questions of responsibility and copyright ownership. This paper then analyzes existing AI-related legal and ethical frameworks and highlights their limitations with regard to the application of LLMs in the context of medical education. To ensure that LLMs are integrated in a responsible and safe manner, the authors recommend the development of a unified ethical framework that is specifically tailored for LLMs in this field. This framework should be based on 8 fundamental principles: quality control and supervision mechanisms; privacy and data protection; transparency and interpretability; fairness and equal treatment; academic integrity and moral norms; accountability and traceability; protection and respect for intellectual property; and the promotion of educational research and innovation. The authors further discuss specific measures that can be taken to implement these principles, thereby laying a solid foundation for the development of a comprehensive and actionable ethical framework. Such a unified ethical framework based on these 8 fundamental principles can provide clear guidance and support for the application of LLMs in the context of medical education. This approach can help establish a balance between technological advancement and ethical safeguards, ensuring that medical education progresses without compromising fairness, justice, or patient safety, and fostering a more equitable, safer, and more efficient environment for medical education.


Subject(s)
Artificial Intelligence; Education, Medical; Education, Medical/ethics; Humans; Artificial Intelligence/ethics; Language; Privacy
7.
J Med Internet Res ; 26: e56764, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38662419

ABSTRACT

As the health care industry increasingly embraces large language models (LLMs), understanding the consequence of this integration becomes crucial for maximizing benefits while mitigating potential pitfalls. This paper explores the evolving relationship among clinician trust in LLMs, the transition of data sources from predominantly human-generated to artificial intelligence (AI)-generated content, and the subsequent impact on the performance of LLMs and clinician competence. One of the primary concerns identified in this paper is the LLMs' self-referential learning loops, where AI-generated content feeds into the learning algorithms, threatening the diversity of the data pool, potentially entrenching biases, and reducing the efficacy of LLMs. While theoretical at this stage, this feedback loop poses a significant challenge as the integration of LLMs in health care deepens, emphasizing the need for proactive dialogue and strategic measures to ensure the safe and effective use of LLM technology. Another key takeaway from our investigation is the role of user expertise and the necessity for a discerning approach to trusting and validating LLM outputs. The paper highlights how expert users, particularly clinicians, can leverage LLMs to enhance productivity by off-loading routine tasks while maintaining critical oversight to identify and correct potential inaccuracies in AI-generated content. This balance of trust and skepticism is vital for ensuring that LLMs augment rather than undermine the quality of patient care. We also discuss the risks associated with the deskilling of health care professionals. Frequent reliance on LLMs for critical tasks could result in a decline in health care providers' diagnostic and thinking skills, particularly affecting the training and development of future professionals. The legal and ethical considerations surrounding the deployment of LLMs in health care are also examined. We discuss the medicolegal challenges, including liability in cases of erroneous diagnoses or treatment advice generated by LLMs. The paper references recent legislative efforts, such as The Algorithmic Accountability Act of 2023, as crucial steps toward establishing a framework for the ethical and responsible use of AI-based technologies in health care. In conclusion, this paper advocates for a strategic approach to integrating LLMs into health care. By emphasizing the importance of maintaining clinician expertise, fostering critical engagement with LLM outputs, and navigating the legal and ethical landscape, we can ensure that LLMs serve as valuable tools in enhancing patient care and supporting health care professionals. This approach addresses the immediate challenges posed by integrating LLMs and sets a foundation for their sustainable and responsible use in the future.


Subject(s)
Artificial Intelligence; Health Personnel; Trust; Humans; Health Personnel/psychology; Language; Learning
8.
J Med Internet Res ; 26: e52935, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38578685

ABSTRACT

BACKGROUND: Large language models (LLMs) have gained prominence since the release of ChatGPT in late 2022. OBJECTIVE: The aim of this study was to assess the accuracy of citations and references generated by ChatGPT (GPT-3.5) in two distinct academic domains: the natural sciences and humanities. METHODS: Two researchers independently prompted ChatGPT to write an introduction section for a manuscript and include citations; they then evaluated the accuracy of the citations and Digital Object Identifiers (DOIs). Results were compared between the two disciplines. RESULTS: Ten topics were included: 5 in the natural sciences and 5 in the humanities. A total of 102 citations were generated, with 55 in the natural sciences and 47 in the humanities. Among these, 40 citations (72.7%) in the natural sciences and 36 citations (76.6%) in the humanities were confirmed to exist (P=.42). Significant disparities were found in DOI presence between the natural sciences (39/55, 70.9%) and the humanities (18/47, 38.3%), along with significant differences in accuracy between the two disciplines (18/55, 32.7% vs 4/47, 8.5%). DOI hallucination was more prevalent in the humanities (42/47, 89.4%). The Levenshtein distance was significantly higher in the humanities than in the natural sciences, reflecting the lower DOI accuracy. CONCLUSIONS: ChatGPT's performance in generating citations and references varies across disciplines. Differences in DOI standards and disciplinary nuances contribute to performance variations. Researchers should consider the strengths and limitations of artificial intelligence writing tools with respect to citation accuracy. The use of domain-specific models may enhance accuracy.
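The Levenshtein distance used above to quantify how far a generated DOI strays from the true one is the minimum number of single-character insertions, deletions, and substitutions. A standard dynamic-programming sketch (the example DOIs are invented):

```python
# Levenshtein (edit) distance via dynamic programming, row by row.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))          # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("10.1000/xyz123", "10.1000/xyz321"))  # -> 2
```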


Subject(s)
Artificial Intelligence; Language; Humans; Reproducibility of Results; Research Personnel; Writing
9.
Sensors (Basel) ; 24(11)2024 May 30.
Article in English | MEDLINE | ID: mdl-38894321

ABSTRACT

As modern technologies, particularly home assistant devices and sensors, become more integrated into our daily lives, they are also making their way into the domain of energy management within our homes. Homeowners, now acting as prosumers, have access to detailed information at 15-minute or even 5-minute intervals, including weather forecasts, the output of renewable energy source (RES)-based systems, appliance schedules, and the current energy balance, which details any deficits or surpluses and their quantities, along with predicted prices on the local energy market (LEM). The goal for these prosumers is to reduce costs while maintaining their home's comfort levels. However, given the complexity and the rapid decision-making required to manage this information, the need for a supportive system is evident. This is particularly true given the routine nature of these decisions, highlighting the potential for a system that provides personalized recommendations for optimizing energy consumption, whether by adjusting the load or by transacting with the LEM. In this context, we propose a recommendation system powered by large language models (LLMs), Scikit-llm, and zero-shot classifiers, designed to evaluate specific scenarios and offer tailored advice for prosumers based on the data available at any given moment. Two scenarios for a 5.9 kW prosumer are assessed using candidate labels such as Decrease, Increase, Sell, and Buy. A comparison with a content-based filtering system is provided, considering performance metrics relevant to prosumers.
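The zero-shot step can be sketched as follows: a textual scenario is scored against the candidate labels from the abstract, and the top-scoring label becomes the recommendation. The paper uses Scikit-llm; here a Hugging Face NLI-based zero-shot pipeline stands in as an assumption, and the scenario text is invented for illustration.

```python
# Zero-shot scenario classification with the abstract's candidate labels.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

scenario = (
    "Surplus of 1.2 kWh forecast for the next 15-minute interval; "
    "the local energy market price is above its weekly average."
)
labels = ["Decrease", "Increase", "Sell", "Buy"]

result = classifier(scenario, candidate_labels=labels)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")  # recommendation = highest-scoring label
```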

10.
J Med Syst ; 48(1): 45, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38652327

ABSTRACT

In medical and biomedical education, traditional teaching methods often struggle to engage students and promote critical thinking. The use of AI language models has the potential to transform teaching and learning practices by offering an innovative, active learning approach that promotes intellectual curiosity and deeper understanding. To effectively integrate AI language models into biomedical education, it is essential for educators to understand the benefits and limitations of these tools and how they can be employed to achieve high-level learning outcomes. This article explores the use of AI language models in biomedical education, focusing on their application in both classroom teaching and learning assignments. Using the SOLO taxonomy as a framework, I discuss strategies for designing questions that challenge students to exercise critical thinking and problem-solving skills, even when assisted by AI models. Additionally, I propose a scoring rubric for evaluating student performance when collaborating with AI language models, ensuring a comprehensive assessment of their learning outcomes. AI language models offer a promising opportunity to enhance student engagement and promote active learning in the biomedical field. Understanding the potential use of these technologies allows educators to create learning experiences suited to their students' needs, encouraging intellectual curiosity and a deeper understanding of complex subjects. The application of these tools will be fundamental to providing more effective and engaging learning experiences for students in the future.


Subject(s)
Artificial Intelligence; Humans; Problem-Based Learning/methods; Education, Medical/methods; Educational Measurement/methods
11.
Entropy (Basel) ; 26(4)2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38667881

ABSTRACT

Detecting the underlying human values within arguments is essential across various domains, from the social sciences to recent computational approaches. Identifying these values remains a significant challenge due to their vast number and implicit usage in discourse. This study explores the potential of emotion analysis as a key feature for improving the detection of human values and information extraction in this field. It aims to gain insights into human behavior by applying intensive analyses at different levels of human values. Additionally, we conduct experiments that integrate extracted emotion features to improve human value detection. This approach has the potential to provide fresh insights into the complex interactions between emotions and values within discussions, offering a deeper understanding of human behavior and decision making. Uncovering these emotions is crucial for comprehending the characteristics that underlie various values through data-driven analyses. Our experimental results show improved performance on human value detection across many categories.

12.
Medicina (Kaunas) ; 60(3)2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38541171

ABSTRACT

The integration of large language models (LLMs) into healthcare, particularly in nephrology, represents a significant advancement in applying advanced technology to patient care, medical research, and education. These advanced models have progressed from simple text processors to tools capable of deep language understanding, offering innovative ways to handle health-related data, thus improving the efficiency and effectiveness of medical practice. A significant challenge in medical applications of LLMs is their imperfect accuracy and/or tendency to produce hallucinations: outputs that are factually incorrect or irrelevant. This issue is particularly critical in healthcare, where precision is essential, as inaccuracies can undermine the reliability of these models in crucial decision-making processes. To overcome these challenges, various strategies have been developed. One such strategy is prompt engineering, such as the chain-of-thought approach, which directs LLMs toward more accurate responses by breaking down the problem into intermediate steps or reasoning sequences. Another is retrieval-augmented generation (RAG), which helps address hallucinations by integrating external data, enhancing output accuracy and relevance. Hence, RAG is favored for tasks requiring up-to-date, comprehensive information, such as clinical decision making or educational applications. In this article, we showcase the creation of a specialized ChatGPT model integrated with a RAG system, tailored to align with the KDIGO 2023 guidelines for chronic kidney disease. This example demonstrates its potential to provide specialized, accurate medical advice, marking a step towards more reliable and efficient nephrology practices.


Subject(s)
Nephrology; Humans; Reproducibility of Results; Educational Status; Hallucinations; Language
13.
Medicina (Kaunas) ; 60(1)2024 Jan 13.
Article in English | MEDLINE | ID: mdl-38256408

ABSTRACT

Chain-of-thought prompting significantly enhances the abilities of large language models (LLMs). It not only makes these models more specific and context-aware but also impacts the wider field of artificial intelligence (AI). This approach broadens the usability of AI, increases its efficiency, and aligns it more closely with human thinking and decision-making processes. As we improve this method, it is set to become a key element in the future of AI, adding more purpose, precision, and ethical consideration to these technologies. In medicine, chain-of-thought prompting is especially beneficial. Its capacity to handle complex information, its logical and sequential reasoning, and its suitability for ethically and context-sensitive situations make it an invaluable tool for healthcare professionals. Its role in enhancing medical care and research is expected to grow as we further develop and use this technique. Chain-of-thought prompting bridges the gap between AI's traditionally obscure decision-making process and the clear, accountable standards required in healthcare. It does this by emulating a reasoning style familiar to medical professionals, fitting well into their existing practices and ethical codes. While AI transparency remains a complex challenge, the chain-of-thought approach is a significant step toward making AI more comprehensible and trustworthy in medicine. This review focuses on understanding the workings of LLMs, particularly how chain-of-thought prompting can be adapted to nephrology's unique requirements. It also aims to thoroughly examine the ethical aspects, clarity, and future possibilities, offering an in-depth view of the exciting convergence of these areas.
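A minimal illustration of the technique: a chain-of-thought prompt asks the model to lay out intermediate reasoning steps before answering. The clinical question and step wording below are invented for illustration, and `call_llm` is a hypothetical stand-in for any chat-completion client.

```python
# Chain-of-thought prompt construction (illustrative question and steps).
question = (
    "A patient with CKD stage 4 (eGFR 22 mL/min/1.73 m2) is prescribed a "
    "renally cleared drug at the standard dose. What should be considered?"
)

cot_prompt = f"""You are assisting a nephrologist.
Question: {question}

Think step by step before answering:
1. Identify the relevant kidney-function parameters.
2. Explain how they affect drug clearance.
3. State the dosing consideration and what to monitor.

Show each step, then give a one-sentence conclusion."""

# answer = call_llm(cot_prompt)  # hypothetical chat-completion client
print(cot_prompt)
```

Making the intermediate steps explicit is also what gives the transparency benefit the abstract describes: the reasoning trail can be inspected against clinical guidance.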


Subject(s)
Nephrology; Humans; Artificial Intelligence; Awareness; Health Personnel; Language
14.
Cas Lek Cesk ; 162(7-8): 294-297, 2024.
Article in English | MEDLINE | ID: mdl-38981715

ABSTRACT

The advent of large language models (LLMs) based on neural networks marks a significant shift in academic writing, particularly in medical sciences. These models, including OpenAI's GPT-4, Google's Bard, and Anthropic's Claude, enable more efficient text processing through transformer architecture and attention mechanisms. LLMs can generate coherent texts that are indistinguishable from human-written content. In medicine, they can contribute to the automation of literature reviews, data extraction, and hypothesis formulation. However, ethical concerns arise regarding the quality and integrity of scientific publications and the risk of generating misleading content. This article provides an overview of how LLMs are changing medical writing, the ethical dilemmas they bring, and the possibilities for detecting AI-generated text. It concludes with a focus on the potential future of LLMs in academic publishing and their impact on the medical community.


Subject(s)
Neural Networks, Computer; Humans; Natural Language Processing; Language; Publishing/ethics
15.
Rheumatology (Oxford) ; 62(10): 3256-3260, 2023 10 03.
Article in English | MEDLINE | ID: mdl-37307079

ABSTRACT

Natural language processing (NLP), a subfield of artificial intelligence; large language models (LLMs); and their latest applications, such as Generative Pre-trained Transformers (GPT), ChatGPT, and LLaMA, have recently become among the most discussed topics. To date, artificial intelligence and NLP have impacted several areas, such as finance, economics, and diagnostic/scoring systems in healthcare. Another area that artificial intelligence has affected, and will continue to affect increasingly, is academic life. This narrative review defines NLP, LLMs, and their applications; discusses the opportunities and challenges that components of academic society will experience in rheumatology; and discusses the impact of NLP and LLMs on rheumatology healthcare.


Subject(s)
Rheumatologists; Rheumatology; Humans; Artificial Intelligence; Natural Language Processing
16.
J Med Internet Res ; 25: e48659, 2023 08 22.
Article in English | MEDLINE | ID: mdl-37606976

ABSTRACT

BACKGROUND: Large language model (LLM)-based artificial intelligence chatbots direct the power of large training data sets toward successive, related tasks as opposed to single-ask tasks, for which artificial intelligence already achieves impressive performance. The capacity of LLMs to assist in the full scope of iterative clinical reasoning via successive prompting, in effect acting as artificial physicians, has not yet been evaluated. OBJECTIVE: This study aimed to evaluate ChatGPT's capacity for ongoing clinical decision support via its performance on standardized clinical vignettes. METHODS: We inputted all 36 published clinical vignettes from the Merck Sharp & Dohme (MSD) Clinical Manual into ChatGPT and compared its accuracy on differential diagnoses, diagnostic testing, final diagnosis, and management based on patient age, gender, and case acuity. Accuracy was measured by the proportion of correct responses to the questions posed within the clinical vignettes tested, as calculated by human scorers. We further conducted linear regression to assess the contributing factors toward ChatGPT's performance on clinical tasks. RESULTS: ChatGPT achieved an overall accuracy of 71.7% (95% CI 69.3%-74.1%) across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis with an accuracy of 76.9% (95% CI 67.8%-86.1%) and the lowest performance in generating an initial differential diagnosis with an accuracy of 60.3% (95% CI 54.2%-66.6%). Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis (β=-15.8%; P<.001) and clinical management (β=-7.4%; P=.02) question types. CONCLUSIONS: ChatGPT achieves impressive accuracy in clinical decision-making, with increasing strength as it gains more clinical information at its disposal. In particular, ChatGPT demonstrates the greatest accuracy in tasks of final diagnosis as compared to initial diagnosis. Limitations include possible model hallucinations and the unclear composition of ChatGPT's training data set.
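The regression idea behind the reported betas can be sketched as a linear probability model: per-question correctness regressed on question type, where each coefficient estimates the accuracy gap relative to a reference category. The toy data below are invented; the study's actual covariates also included patient age, gender, and case acuity.

```python
# Linear probability model on hypothetical per-question results.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "correct": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1],  # 1 = scored correct
    "qtype":   ["general", "differential", "management", "general",
                "differential", "final", "management", "general",
                "final", "differential", "general", "final"],
})

# Coefficients estimate accuracy differences vs. the reference category,
# analogous to the betas reported for question types in the abstract.
model = smf.ols("correct ~ C(qtype)", data=df).fit()
print(model.params)
```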


Subject(s)
Artificial Intelligence; Humans; Clinical Decision-Making; Organizations; Workflow; User-Centered Design
17.
J Med Internet Res ; 25: e50638, 2023 10 04.
Article in English | MEDLINE | ID: mdl-37792434

ABSTRACT

Prompt engineering is a relatively new field of research that refers to the practice of designing, refining, and implementing prompts or instructions that guide the output of large language models (LLMs) to help with various tasks. With the emergence of LLMs (the most popular being ChatGPT, which attracted over 100 million users in only 2 months), artificial intelligence (AI), especially generative AI, has become accessible to the masses. This is an unprecedented paradigm shift, not only because the use of AI is becoming more widespread but also because of the possible implications of LLMs in health care. As more patients and medical professionals use AI-based tools, with LLMs being the most popular representatives of that group, addressing the challenge of improving this skill seems inevitable. This paper summarizes the current state of research on prompt engineering and, at the same time, aims to provide practical recommendations for a wide range of health care professionals to improve their interactions with LLMs.
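As a concrete illustration of the kind of refinement such recommendations point toward, the sketch below contrasts a naive prompt with an engineered one that fixes role, task, constraints, and output format. The wording is invented for illustration, not taken from the paper.

```python
# Naive vs. engineered prompt for the same clinical question (illustrative).
naive_prompt = "Tell me about statins."

engineered_prompt = """Role: You are a clinical pharmacist advising a general practitioner.
Task: Summarize when to consider statin therapy for primary prevention.
Constraints: Note where evidence is uncertain; do not invent dosages.
Format: 3 bullet points, then one sentence on what to discuss with the patient."""

for name, prompt in [("naive", naive_prompt), ("engineered", engineered_prompt)]:
    print(f"--- {name} ---\n{prompt}\n")
# The engineered version constrains scope, audience, and structure, which
# typically yields a more specific, more usable answer from the same model.
```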


Subject(s)
Artificial Intelligence; Engineering; Humans; Health Personnel; Language
18.
Radiol Med ; 128(7): 808-812, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37248403

ABSTRACT

Structured reporting may improve the radiological workflow and communication among physicians. Artificial intelligence applications in medicine are growing fast. Large language models (LLMs) are gaining importance as valuable tools in radiology and are currently being tested for the critical task of structured reporting. We compared four LLMs in terms of their knowledge of structured reporting and the templates they propose. LLMs hold great potential for generating structured reports in radiology, but additional formal validation is needed on this topic.


Subject(s)
Artificial Intelligence; Radiology; Humans; Radiography; Language; Communication
19.
Educ Inf Technol (Dordr) ; 26(5): 6097-6108, 2021.
Article in English | MEDLINE | ID: mdl-34104069

ABSTRACT

The aim of this study was to analyze whether self-evaluation of online teaching effectiveness (SEOTE) and literacy of the learning management system (LLMS) had significant effects on the level of self-directed learning readiness (SDLR). A further aim was to examine whether LLMS had a significant mediating effect between SEOTE and SDLR. Pearson correlation, multiple linear regression, and mediated regression analyses were conducted for this study. The study included 210 online college students in Korea who responded to three web-survey questionnaires (SEOTE, LLMS, and SDLR). The bivariate (Pearson) correlation analysis showed that SEOTE was significantly associated with SDLR (r=.524, p<.01) and that LLMS was positively associated with SDLR (r=.487, p<.01). The multiple linear regression revealed that SEOTE and LLMS significantly predicted SDLR, explaining 31.4% of the variance in SDLR. The mediated regression analysis revealed that LLMS had a significant indirect (mediating) effect between SEOTE and SDLR.
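The mediated-regression logic (Baron and Kenny style) can be sketched in a few lines: estimate the total effect, the path from predictor to mediator, and the direct model, then take the product of the mediator paths as the indirect effect. The data below are simulated; only the sample size of 210 matches the study.

```python
# Mediation sketch on simulated data: SEOTE -> LLMS -> SDLR plus a direct path.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 210  # matches the study's sample size; the data themselves are simulated
seote = rng.normal(size=n)
llms = 0.5 * seote + rng.normal(scale=0.8, size=n)    # path a
sdlr = 0.4 * seote + 0.3 * llms + rng.normal(size=n)  # paths c' and b
df = pd.DataFrame({"SEOTE": seote, "LLMS": llms, "SDLR": sdlr})

total    = smf.ols("SDLR ~ SEOTE", data=df).fit()         # total effect c
mediator = smf.ols("LLMS ~ SEOTE", data=df).fit()         # path a
direct   = smf.ols("SDLR ~ SEOTE + LLMS", data=df).fit()  # paths c' and b
indirect = mediator.params["SEOTE"] * direct.params["LLMS"]
print(f"total = {total.params['SEOTE']:.3f}, "
      f"direct = {direct.params['SEOTE']:.3f}, indirect (a*b) = {indirect:.3f}")
```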
