Results 1 - 20 of 213
1.
Proc Natl Acad Sci U S A ; 121(35): e2404328121, 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39163339

ABSTRACT

How good a research scientist is ChatGPT? We systematically probed the capabilities of GPT-3.5 and GPT-4 across four central components of the scientific process: as a Research Librarian, Research Ethicist, Data Generator, and Novel Data Predictor, using psychological science as a testing field. In Study 1 (Research Librarian), unlike human researchers, GPT-3.5 and GPT-4 hallucinated, authoritatively generating fictional references 36.0% and 5.4% of the time, respectively, although GPT-4 exhibited an evolving capacity to acknowledge its fictions. In Study 2 (Research Ethicist), GPT-4 (though not GPT-3.5) proved capable of detecting violations like p-hacking in fictional research protocols, correcting 88.6% of blatantly presented issues and 72.6% of subtly presented issues. In Study 3 (Data Generator), both models consistently replicated patterns of cultural bias previously discovered in large language corpora, indicating that ChatGPT can simulate known results, an antecedent to usefulness for both data generation and skills like hypothesis generation. By contrast, in Study 4 (Novel Data Predictor), neither model was successful at predicting new results absent from their training data, and neither appeared to leverage substantially new information when predicting more vs. less novel outcomes. Together, these results suggest that GPT is a flawed but rapidly improving librarian, already a decent research ethicist, capable of data generation in simple domains with known characteristics, but poor at predicting novel patterns of empirical data to aid future experimentation.


Subjects
Librarians, Humans, Ethicists, Researchers, Research Ethics
2.
Proc Natl Acad Sci U S A ; 121(21): e2314021121, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38722813

ABSTRACT

Generative AI that can produce realistic text, images, and other human-like outputs is currently transforming many different industries. Yet little is known about how such tools might influence social science research. I argue that Generative AI has the potential to improve survey research, online experiments, automated content analyses, agent-based models, and other techniques commonly used to study human behavior. In the second section of this article, I discuss the many limitations of Generative AI. I examine how bias in the data used to train these tools can negatively impact social science research, as well as a range of other challenges related to ethics, replication, environmental impact, and the proliferation of low-quality research. I conclude by arguing that social scientists can address many of these limitations by creating open-source infrastructure for research on human behavior. Such infrastructure is necessary not only to ensure broad access to high-quality research tools, I argue, but also because the progress of AI will require a deeper understanding of the social forces that guide human behavior.


Subjects
Artificial Intelligence, Social Sciences, Humans
3.
Proc Natl Acad Sci U S A ; 121(18): e2307304121, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38640257

ABSTRACT

Over the past few years, machine learning models have significantly increased in size and complexity, especially in the area of generative AI such as large language models. These models require massive amounts of data and compute capacity to train, to the extent that concerns over the training data (such as protected or private content) cannot be practically addressed by retraining the model "from scratch" with the questionable data removed or altered. Furthermore, despite significant efforts and controls dedicated to ensuring that training corpora are properly curated and composed, the sheer volume required makes it infeasible to manually inspect each datum in a training corpus. One potential approach to addressing training-corpus data defects is model disgorgement, by which we broadly mean the elimination or reduction of not only any improperly used data, but also the effects of improperly used data on any component of an ML model. Model disgorgement techniques can be used to address a wide range of issues, such as reducing bias or toxicity, increasing fidelity, and ensuring responsible use of intellectual property. In this paper, we survey the landscape of model disgorgement methods and introduce a taxonomy of disgorgement techniques applicable to modern ML systems. In particular, we investigate the various meanings of "removing the effects" of data on the trained model in a way that does not require retraining from scratch.


Subjects
Language, Machine Learning
4.
Proc Natl Acad Sci U S A ; 120(41): e2311627120, 2023 Oct 10.
Article in English | MEDLINE | ID: mdl-37788311

ABSTRACT

Political discourse is the soul of democracy, but misunderstanding and conflict can fester in divisive conversations. The widespread shift to online discourse exacerbates many of these problems and corrodes the capacity of diverse societies to cooperate in solving social problems. Scholars and civil society groups promote interventions that make conversations less divisive or more productive, but scaling these efforts to online discourse is challenging. We conduct a large-scale experiment that demonstrates how online conversations about divisive topics can be improved with AI tools. Specifically, we employ a large language model to make real-time, evidence-based recommendations intended to improve participants' perception of feeling understood. These interventions improve reported conversation quality, promote democratic reciprocity, and improve the tone, without systematically changing the content of the conversation or moving people's policy attitudes.


Subjects
Language, Policies, Humans
5.
Brief Bioinform ; 25(1), 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38168838

ABSTRACT

ChatGPT has drawn considerable attention from both the general public and domain experts with its remarkable text generation capabilities. This has subsequently led to the emergence of diverse applications in the field of biomedicine and health. In this work, we examine the diverse applications of large language models (LLMs), such as ChatGPT, in biomedicine and health. Specifically, we explore the areas of biomedical information retrieval, question answering, medical text summarization, information extraction, and medical education, and investigate whether LLMs possess the transformative power to revolutionize these tasks or whether the distinct complexities of the biomedical domain present unique challenges. Following an extensive literature survey, we find that significant advances have been made in the field of text generation tasks, surpassing the previous state-of-the-art methods. For other applications, the advances have been modest. Overall, LLMs have not yet revolutionized biomedicine, but recent rapid progress indicates that such methods hold great potential to provide valuable means for accelerating discovery and improving health. We also find that the use of LLMs, like ChatGPT, in the fields of biomedicine and health entails various risks and challenges, including fabricated information in their generated responses, as well as legal and privacy concerns associated with sensitive patient data. We believe this survey can provide a comprehensive and timely overview to biomedical researchers and healthcare practitioners on the opportunities and challenges associated with using ChatGPT and other LLMs for transforming biomedicine and health.


Subjects
Information Storage and Retrieval, Language, Humans, Privacy, Researchers
6.
Neuroimage ; 296: 120663, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38843963

ABSTRACT

INTRODUCTION: Timely diagnosis and prognostication of Alzheimer's disease (AD) and mild cognitive impairment (MCI) are pivotal for effective intervention. Artificial intelligence (AI) in neuroradiology may aid in such appropriate diagnosis and prognostication. This study aimed to evaluate the potential of novel diffusion model-based AI for enhancing AD and MCI diagnosis through superresolution (SR) of brain magnetic resonance (MR) images. METHODS: 1.5T brain MR scans of patients with AD or MCI and healthy controls (NC) from the Alzheimer's Disease Neuroimaging Initiative 1 (ADNI1) were superresolved to 3T using a novel diffusion model-based generative AI (d3T*) and a convolutional neural network-based model (c3T*). Comparisons of image quality to actual 1.5T and 3T MRI were conducted based on signal-to-noise ratio (SNR), the naturalness image quality evaluator (NIQE), and the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE). Voxel-based volumetric analysis was then conducted to study whether 3T* images offered more accurate volumetry than 1.5T images. Binary and multiclass classifications of AD, MCI, and NC were conducted to evaluate whether 3T* images offered superior AD classification performance compared to actual 1.5T MRI. Moreover, CNN-based classifiers were used to predict conversion of MCI to AD, to evaluate the prognostication performance of 3T* images. Classification performance was evaluated using accuracy, sensitivity, specificity, F1 score, Matthews correlation coefficient (MCC), and area under the receiver-operating characteristic curve (AUROC). RESULTS: Analysis of variance (ANOVA) detected significant differences in image quality among the 1.5T, c3T*, d3T*, and 3T groups across all metrics. Both c3T* and d3T* showed statistically significantly superior image quality compared to 1.5T MRI in NIQE and BRISQUE. While the hippocampal volumes measured in 3T* and 3T images were not significantly different, the hippocampal volume measured in 1.5T images differed significantly. 3T*-based AD classification showed superior performance across all performance metrics compared to 1.5T-based AD classification. Classification performance between d3T* and actual 3T was not significantly different. 3T* images offered superior accuracy in predicting the conversion of MCI to AD than 1.5T images did. CONCLUSIONS: Diffusion model-based MRI SR enhances the resolution of brain MR images, significantly improving diagnostic and prognostic accuracy for AD and MCI. Superresolved 3T* images closely matched actual 3T MRI in quality and volumetric accuracy, and notably improved the prediction of conversion from MCI to AD.
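
As a worked illustration of the evaluation described above, here is a minimal Python sketch of how the reported classification metrics (accuracy, sensitivity, specificity, F1, MCC, AUROC) can be computed with scikit-learn; the labels and scores are hypothetical, not the study's data.

```python
# Illustrative sketch (not the authors' code): computing the evaluation
# metrics named in the abstract for a binary AD-vs-NC classifier,
# assuming y_true, y_pred, and y_score come from such a model.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             matthews_corrcoef, recall_score, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])            # hypothetical ground truth
y_score = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3])  # model scores
y_pred = (y_score >= 0.5).astype(int)                  # threshold at 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "sensitivity": recall_score(y_true, y_pred),       # TP / (TP + FN)
    "specificity": tn / (tn + fp),                     # TN / (TN + FP)
    "F1": f1_score(y_true, y_pred),
    "MCC": matthews_corrcoef(y_true, y_pred),
    "AUROC": roc_auc_score(y_true, y_score),
}
print(metrics)
```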


Subjects
Alzheimer Disease, Cognitive Dysfunction, Humans, Alzheimer Disease/diagnostic imaging, Alzheimer Disease/classification, Cognitive Dysfunction/diagnostic imaging, Cognitive Dysfunction/classification, Aged, Female, Male, Prognosis, Aged 80 and over, Artificial Intelligence, Magnetic Resonance Imaging/methods, Computer-Assisted Image Interpretation/methods, Brain/diagnostic imaging, Brain/pathology, Middle Aged, Diffusion Magnetic Resonance Imaging/methods, Neuroimaging/methods, Neuroimaging/standards
7.
Ecol Lett ; 27(3): e14397, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38430051

ABSTRACT

Generative artificial intelligence (AI) models will have broad impacts on society, including the scientific enterprise; ecology and environmental science will be no exception. Here, we discuss the potential opportunities and risks of advanced generative AI for visual material (images and video) for the science of ecology and for the environment itself. There are clear opportunities for positive impacts related to improved communication, for example; we also see possibilities for ecological research to benefit from generative AI (e.g., image gap filling, biodiversity surveys, and improved citizen science). However, there are also risks that threaten to undermine the credibility of our science, mostly related to the actions of bad actors, for example, spreading fake information or committing fraud. Risks need to be mitigated through government regulatory measures, but we also highlight what can be done right now, including discussing issues with the next generation of ecologists and transforming towards radically open science workflows.


Subjects
Artificial Intelligence, Biodiversity
8.
Lab Invest ; 104(8): 102095, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38925488

ABSTRACT

In our rapidly expanding landscape of artificial intelligence, synthetic data have become a topic of great promise and also some concern. This review aimed to provide pathologists and laboratory professionals with a primer on the general concept of synthetic data and its potential to transform our field. Using synthetic data presents many advantages but also introduces new obstacles and limitations. By leveraging synthetic data, we can help accelerate the development of machine learning models and enhance medical education and research/quality study needs. This review explored the methods for generating synthetic data, including rule-based, machine learning model-based, and hybrid approaches, as they apply to applications within pathology and laboratory medicine. We also discussed the limitations and challenges associated with such synthetic data, including data quality, malicious use, and ethical bias and concerns. By understanding the potential benefits (ie, medical education, training artificial intelligence programs, and proficiency testing) and limitations of this new data realm, we can harness its power to improve patient outcomes, advance research, and enhance the practice of pathology while remaining readily aware of its intrinsic limitations.
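
To illustrate the rule-based generation approach mentioned in this review, here is a minimal Python sketch; the reference ranges and field names are illustrative assumptions, not values from the review.

```python
# Illustrative sketch of a rule-based synthetic-data generator: sample
# plausible complete blood count (CBC) values from hypothetical reference
# ranges. The ranges and field names are illustrative assumptions only.
import random

REFERENCE_RANGES = {
    "wbc_10e9_per_L": (4.0, 11.0),        # white blood cells
    "hemoglobin_g_per_dL": (12.0, 17.5),
    "platelets_10e9_per_L": (150, 450),
}

def synthetic_cbc_record(rng: random.Random) -> dict:
    """Draw one synthetic CBC record uniformly from the reference ranges."""
    return {name: round(rng.uniform(lo, hi), 1)
            for name, (lo, hi) in REFERENCE_RANGES.items()}

rng = random.Random(42)  # seeded for reproducibility
cohort = [synthetic_cbc_record(rng) for _ in range(5)]
for record in cohort:
    print(record)
```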


Subjects
Machine Learning, Humans, Pathology, Artificial Intelligence
9.
Ann Surg Oncol ; 31(10): 6387-6393, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38909113

ABSTRACT

BACKGROUND: Few studies have examined the performance of artificial intelligence (AI) content detection in scientific writing. This study evaluates the performance of publicly available AI content detectors when applied to both human-written and AI-generated scientific articles. METHODS: Articles published in Annals of Surgical Oncology (ASO) during 2022, as well as AI-generated articles produced with OpenAI's ChatGPT, were analyzed by three AI content detectors to assess the probability of AI-generated content. Full manuscripts and their individual sections were evaluated. Group comparisons and trend analyses were conducted using ANOVA and linear regression. Classification performance was determined using area under the curve (AUC). RESULTS: A total of 449 original articles met inclusion criteria and were evaluated to determine the likelihood of being generated by AI. Each detector also evaluated 47 AI-generated articles created using titles from ASO articles. Human-written articles had an average probability of being AI-generated of 9.4%, with significant differences between the detectors. Only two (0.4%) human-written manuscripts were detected as having a 0% probability of being AI-generated by all three detectors. Completely AI-generated articles were evaluated to have a higher average probability of being AI-generated (43.5%), with a range from 12.0% to 99.9%. CONCLUSIONS: This study demonstrates differences in the performance of various AI content detectors, with the potential to label human-written articles as AI-generated. Any effort toward implementing AI detectors must include a strategy for continuous evaluation and validation as AI models and detectors rapidly evolve.
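
As an illustration of the classification-performance analysis described above, here is a minimal Python sketch computing a detector's AUC with scikit-learn; the labels and detector scores are hypothetical, not the study's data.

```python
# Illustrative sketch (not the study's code): measuring how well an AI-content
# detector separates human-written from AI-generated manuscripts via AUC.
# Labels and scores below are hypothetical detector outputs.
from sklearn.metrics import roc_auc_score

# 1 = AI-generated, 0 = human-written
labels = [0, 0, 0, 0, 1, 1, 1, 1]
# detector's reported probability that each manuscript is AI-generated
detector_scores = [0.02, 0.10, 0.35, 0.08, 0.12, 0.55, 0.90, 0.99]

auc = roc_auc_score(labels, detector_scores)
print(f"Detector AUC: {auc:.2f}")  # 1.0 = perfect separation, 0.5 = chance
```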


Subjects
Artificial Intelligence, Humans, Writing, Surgical Oncology
10.
Br J Clin Pharmacol ; 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39191671

ABSTRACT

AIMS: The aim of this study was to assess the ChatGPT-4 (ChatGPT) large language model (LLM) on tasks relevant to community pharmacy. METHODS: ChatGPT was assessed with community pharmacy-relevant test cases involving drug information retrieval, identifying labeling errors, prescription interpretation, decision-making under uncertainty, and multidisciplinary consults. Drug information on rituximab, warfarin, and St. John's wort was queried. The decision-support scenarios consisted of a subject with swollen eyelids and a subject with a maculopapular rash while on lisinopril and ferrous sulfate. The multidisciplinary scenarios required the integration of medication management with recommendations for healthy eating and physical activity/exercise. RESULTS: The responses from ChatGPT for rituximab, warfarin, and St. John's wort were satisfactory and cited drug databases and drug-specific monographs. ChatGPT identified labeling errors related to incorrect medication strength, form, route of administration, unit conversion, and directions. For the patient with swollen eyelids, the course of action developed by ChatGPT was comparable to the pharmacist's approach. For the patient with the maculopapular rash, both the pharmacist and ChatGPT placed a drug reaction to either lisinopril or ferrous sulfate at the top of the differential. ChatGPT provided customized vaccination requirements for travel to Brazil, guidance on the management of drug allergies, and advice on recovery from a knee injury. ChatGPT provided satisfactory medication management and wellness information for a patient with diabetes on metformin and semaglutide. CONCLUSIONS: LLMs have the potential to become a powerful tool in community pharmacy. However, rigorous validation studies across diverse pharmacist queries, drug classes, and populations, and engineering to secure patient privacy, will be needed to enhance LLM utility.

11.
Article in English | MEDLINE | ID: mdl-39243338

ABSTRACT

PURPOSE OF REVIEW: The integration of digital technology into medical practice is often thrust upon clinicians, with standards and routines developed long after initiation. Clinicians should endeavor towards a basic understanding even of emerging technologies so that they can direct their use. The intent of this review is to describe the current state of rapidly evolving generative artificial intelligence (GAI) and to explore both how pediatric gastroenterology practice may benefit and the challenges that will be faced. RECENT FINDINGS: Although little research has been published demonstrating the acceptance, practice, and outcomes associated with GAI in pediatric gastroenterology, there are relevant data adjacent to the specialty and overwhelming potential professed in the media. Best practice guidelines are being widely developed in academic publishing, and resources to initiate and improve practical user skills are prevalent. Initial published evidence supports broad acceptance of the technology as part of medical practice by clinicians and patients, describes methods with which higher-quality GAI can be developed, and identifies the potential for bias and disparities resulting from its use. GAI is broadly available as a digital tool for incorporation into medical practice and holds promise for improved quality and efficiency of care, but investigation into how GAI can best be used remains at an early stage despite rapid evolution of the technology.

12.
J Biomed Inform ; 156: 104662, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38880236

ABSTRACT

BACKGROUND: Malnutrition is a prevalent issue in residential aged care facilities (RACFs), leading to adverse health outcomes. The ability to efficiently extract key clinical information from the large volume of data in electronic health records (EHRs) can improve understanding of the extent of the problem and support the development of effective interventions. This research aimed to test the efficacy of zero-shot prompt engineering applied to generative artificial intelligence (AI) models, on their own and in combination with retrieval-augmented generation (RAG), for automating the tasks of summarizing both structured and unstructured data in EHRs and extracting important malnutrition information. METHODOLOGY: We utilized the Llama 2 13B model with zero-shot prompting. The dataset comprises unstructured and structured EHRs related to malnutrition management in 40 Australian RACFs. We first applied zero-shot learning to the model alone, then combined it with RAG, to accomplish two tasks: generating structured summaries of a client's nutritional status and extracting key information about malnutrition risk factors. We utilized 25 notes in the first task and 1,399 in the second. We evaluated the model's output on each task manually against a gold-standard dataset. RESULTS: The evaluation outcomes indicated that zero-shot learning applied to a generative AI model is highly effective in summarizing and extracting information about the nutritional status of RACF clients. The generated summaries provided a concise and accurate representation of the original data, with an overall accuracy of 93.25%. The addition of RAG improved the summarization process, leading to a 6% increase and an accuracy of 99.25%. The model also proved capable of extracting risk factors, with an accuracy of 90%; however, adding RAG did not further improve accuracy on this task. Overall, the model showed robust performance when information was explicitly stated in the notes but could encounter hallucination limitations, particularly when details were not explicitly provided. CONCLUSION: This study demonstrates the high performance, and the limitations, of applying zero-shot learning to generative AI models for the automatic generation of structured summaries of EHR data and the extraction of key clinical information. The inclusion of the RAG approach improved model performance and mitigated the hallucination problem.
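
Below is a minimal Python sketch of the zero-shot-plus-RAG pattern this study describes; the toy retriever, prompt wording, and guideline snippets are assumptions for illustration, not the authors' pipeline.

```python
# Illustrative sketch of zero-shot prompting combined with retrieval-augmented
# generation (RAG) for note summarization. The prompt text, retriever, and
# guideline snippets are assumptions, not the study's actual pipeline.
def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(documents, key=lambda d: -len(q & set(d.lower().split())))[:k]

def build_prompt(note: str, context: list[str]) -> str:
    """Zero-shot prompt: instructions only, no worked examples."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Summarize the client's nutritional status from the nursing note.\n"
        f"Relevant reference material:\n{context_block}\n"
        f"Nursing note:\n{note}\nStructured summary:"
    )

guideline_snippets = [
    "Unplanned weight loss over 5% in one month indicates malnutrition risk.",
    "Poor dentition and swallowing difficulty are malnutrition risk factors.",
    "BMI below 18.5 suggests undernutrition in older adults.",
]
note = "Client has lost 4 kg since last month and reports difficulty swallowing."
prompt = build_prompt(note, retrieve(note, guideline_snippets))
print(prompt)  # this prompt would then be sent to, e.g., a Llama 2 13B model
```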


Subjects
Artificial Intelligence, Electronic Health Records, Humans, Information Storage and Retrieval/methods, Malnutrition, Algorithms, Australia
13.
J Biomed Inform ; 153: 104640, 2024 May.
Article in English | MEDLINE | ID: mdl-38608915

ABSTRACT

Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.


Subjects
Artificial Intelligence, Evidence-Based Medicine, Humans, Trust, Natural Language Processing
14.
J Biomed Inform ; 157: 104702, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39084480

ABSTRACT

Although rare diseases individually have a low prevalence, they collectively affect nearly 400 million individuals around the world. On average, it takes five years to reach an accurate rare disease diagnosis, and many patients remain undiagnosed or misdiagnosed. As machine learning technologies have been used to aid diagnostics in the past, this study aims to test ChatGPT's suitability for rare disease diagnostic support when enhanced with Retrieval-Augmented Generation (RAG). RareDxGPT, our enhanced ChatGPT model, supplies ChatGPT with information about 717 rare diseases from an external knowledge resource, the RareDis Corpus, through RAG. In RareDxGPT, when a query is entered, the three documents in the RareDis Corpus most relevant to the query are retrieved and passed to ChatGPT along with the query to produce a diagnosis. Additionally, phenotypes for thirty different diseases were extracted from free text in PubMed Case Reports. Each was entered with three different prompt types: "prompt", "prompt + explanation", and "prompt + role play". The accuracy of ChatGPT and RareDxGPT with each prompt type was then measured. With "prompt", RareDxGPT had 40% accuracy, while ChatGPT 3.5 got 37% of the cases correct. With "prompt + explanation", RareDxGPT had 43% accuracy, while ChatGPT 3.5 got 23% correct. With "prompt + role play", RareDxGPT had 40% accuracy, while ChatGPT 3.5 got 23% correct. To conclude, ChatGPT, especially when supplied with extra domain-specific knowledge, demonstrates early potential for rare disease diagnosis with adjustments.
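
Below is a minimal Python sketch of the top-3 retrieval step described above; TF-IDF cosine similarity is an assumed scoring choice, and the corpus snippets are illustrative stand-ins, not the RareDis Corpus.

```python
# Illustrative sketch of the retrieval step described above: pick the three
# corpus documents most similar to a phenotype query. TF-IDF cosine similarity
# is an assumed scoring choice, not necessarily what RareDxGPT uses.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Marfan syndrome: tall stature, arachnodactyly, lens dislocation.",
    "Fabry disease: angiokeratomas, acroparesthesia, corneal opacity.",
    "Gaucher disease: hepatosplenomegaly, bone pain, cytopenia.",
    # ... remaining rare-disease documents
]
query = "patient with tall stature, long fingers and dislocated lenses"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)
query_vector = vectorizer.transform([query])

scores = cosine_similarity(query_vector, doc_vectors).ravel()
top3 = [corpus[i] for i in scores.argsort()[::-1][:3]]
# `top3` would be appended to the query and sent to ChatGPT for a diagnosis.
print(top3)
```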


Subjects
Machine Learning, Rare Diseases, Rare Diseases/diagnosis, Humans, Information Storage and Retrieval/methods, Data Mining/methods, Factual Databases, Algorithms, Computer-Assisted Diagnosis/methods
15.
Blood Purif ; : 1-13, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39217985

ABSTRACT

BACKGROUND: Generative artificial intelligence (AI) is rapidly transforming various aspects of healthcare, including critical care nephrology. Large language models (LLMs), a key technology in generative AI, show promise in enhancing patient care, streamlining workflows, and advancing research in this field. SUMMARY: This review analyzes the current applications and future prospects of generative AI in critical care nephrology, focusing on clinical decision support, patient education, research, and medical education. Recent studies demonstrate the capabilities of LLMs in diagnostic accuracy, clinical reasoning, and continuous renal replacement therapy (CRRT) alarm troubleshooting. As we enter an era of multiagent models and automation, the integration of generative AI into critical care nephrology holds promise for improving patient care, optimizing clinical processes, and accelerating research. However, careful consideration of ethical implications and continued refinement of these technologies are essential for their responsible implementation in clinical practice. We also examine the challenges and limitations of AI implementation, such as privacy concerns, potential bias, and the necessity for human oversight. KEY MESSAGES: (i) LLMs have shown potential in enhancing diagnostic accuracy, clinical reasoning, and CRRT alarm troubleshooting in critical care nephrology. (ii) Generative AI offers promising applications in patient education, literature review, and academic writing within the field of nephrology. (iii) The integration of AI into electronic health records and clinical workflows presents both opportunities and challenges for improving patient care and research. (iv) Addressing ethical concerns, ensuring data privacy, and maintaining human oversight are crucial for the responsible implementation of AI in critical care nephrology.

16.
Am J Bioeth ; 24(7): 13-26, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38226965

ABSTRACT

When making substituted judgments for incapacitated patients, surrogates often struggle to guess what the patient would want if they had capacity. Surrogates may also agonize over having the (sole) responsibility of making such a determination. To address such concerns, a Patient Preference Predictor (PPP) has been proposed that would use an algorithm to infer the treatment preferences of individual patients from population-level data about the known preferences of people with similar demographic characteristics. However, critics have suggested that even if such a PPP were more accurate, on average, than human surrogates in identifying patient preferences, the proposed algorithm would nevertheless fail to respect the patient's (former) autonomy since it draws on the 'wrong' kind of data: namely, data that are not specific to the individual patient and which therefore may not reflect their actual values, or their reasons for having the preferences they do. Taking such criticisms on board, we here propose a new approach: the Personalized Patient Preference Predictor (P4). The P4 is based on recent advances in machine learning, which allow technologies including large language models to be more cheaply and efficiently 'fine-tuned' on person-specific data. The P4, unlike the PPP, would be able to infer an individual patient's preferences from material (e.g., prior treatment decisions) that is in fact specific to them. Thus, we argue, in addition to being potentially more accurate at the individual level than the previously proposed PPP, the predictions of a P4 would also more directly reflect each patient's own reasons and values. In this article, we review recent discoveries in artificial intelligence research that suggest a P4 is technically feasible, and argue that, if it is developed and appropriately deployed, it should assuage some of the main autonomy-based concerns of critics of the original PPP. We then consider various objections to our proposal and offer some tentative replies.
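
A purely hypothetical Python sketch of one step a P4 would require: assembling person-specific records into fine-tuning data. The record format, field names, example content, and file name are assumptions for illustration, not a proposal from the article.

```python
# Purely hypothetical sketch: assembling person-specific records into a
# prompt/completion fine-tuning file, the kind of step a P4 would need.
# Field names, example content, and the JSONL format are assumptions.
import json

prior_decisions = [
    {"situation": "Offered elective surgery with long recovery at age 70",
     "choice": "Declined; prioritized remaining independent at home"},
    {"situation": "Advance-care planning conversation in 2019",
     "choice": "Stated a preference against prolonged mechanical ventilation"},
]

with open("p4_finetune.jsonl", "w") as f:
    for record in prior_decisions:
        f.write(json.dumps({
            "prompt": f"Situation: {record['situation']}\nPatient's decision:",
            "completion": " " + record["choice"],
        }) + "\n")
# A language model fine-tuned on such records could then be queried with a
# new clinical scenario to estimate what this patient would likely choose.
```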


Subjects
Judgment, Patient Preference, Humans, Personal Autonomy, Algorithms, Machine Learning/ethics, Decision Making/ethics
17.
J Pharmacokinet Pharmacodyn ; 51(3): 187-197, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38656706

ABSTRACT

We assessed the ChatGPT 4.0 (ChatGPT) and Gemini Ultra 1.0 (Gemini) large language models on NONMEM coding tasks relevant to pharmacometrics and clinical pharmacology, using tasks mimicking real-world applications of NONMEM. The tasks ranged from providing a curriculum for learning NONMEM and an overview of NONMEM code structure to generating code. We investigated lay-language prompts to elicit NONMEM code for a linear pharmacokinetic (PK) model with oral administration and for a more complex model with two parallel first-order absorption mechanisms. Reproducibility and the impact of the "temperature" hyperparameter setting were assessed, and the code was reviewed by two NONMEM experts. ChatGPT and Gemini provided NONMEM curriculum structures combining foundational knowledge with advanced concepts (e.g., covariate modeling and Bayesian approaches) and practical skills, including NONMEM code structure and syntax. ChatGPT provided an informative summary of the NONMEM control stream structure and outlined the key NONMEM Translator (NM-TRAN) records needed. Both models were able to generate code blocks for the NONMEM control stream from the lay-language prompts for the two coding tasks, but the control streams contained focal structural and syntax errors that required revision before they could be executed without errors and warnings. The code output from ChatGPT and Gemini was not reproducible, and varying the temperature hyperparameter did not substantively reduce the errors and omissions. Large language models may be useful in pharmacometrics for efficiently generating an initial coding template for modeling projects, but the output can contain errors and omissions that require correction.
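
Below is a minimal Python sketch of the kind of prompt-to-code experiment described above, using the OpenAI chat-completions client as an assumed interface; the prompt wording and model name are illustrative, not the study's protocol.

```python
# Illustrative sketch (not the study's protocol): asking a chat model for a
# NONMEM control stream and varying the `temperature` hyperparameter to probe
# reproducibility. Assumes the OpenAI Python client and an API key are set up.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write a NONMEM control stream for a one-compartment pharmacokinetic "
    "model with first-order oral absorption and first-order elimination."
)

for temperature in (0.0, 0.7):  # lower values should reduce run-to-run variation
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    code = response.choices[0].message.content
    print(f"--- temperature={temperature} ---\n{code}\n")
# As the abstract notes, generated control streams still need expert review
# for structural and syntax errors before execution.
```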


Subjects
Bayes Theorem, Humans, Pharmacokinetics, Biological Models, Reproducibility of Results, Software, Clinical Pharmacology/methods, Nonlinear Dynamics, Computer Simulation
18.
J Med Internet Res ; 26: e59505, 2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39321458

ABSTRACT

In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types, including medical images (eg, MRI and CT scans), time-series data (eg, sensor data from wearable devices and electronic health records), audio recordings (eg, heart and respiratory sounds and patient interviews), text (eg, clinical notes and research articles), videos (eg, surgical procedures), and omics data (eg, genomics and proteomics). While advancements in large language models (LLMs) have enabled new applications for knowledge retrieval and processing in the medical field, most LLMs remain limited to processing unimodal data, typically text-based content, and often overlook the importance of integrating the diverse data modalities encountered in clinical practice. This paper aims to present a detailed, practical, and solution-oriented perspective on the use of multimodal LLMs (M-LLMs) in the medical field. Our investigation spanned M-LLM foundational principles, current and potential applications, technical and ethical challenges, and future research directions. By connecting these elements, we aimed to provide a comprehensive framework that links diverse aspects of M-LLMs, offering a unified vision for their future in health care. This approach aims to guide both future research and practical implementations of M-LLMs in health care, positioning them as a paradigm shift toward integrated, multimodal data-driven medical practice. We anticipate that this work will spark further discussion and inspire the development of innovative approaches in the next generation of medical M-LLM systems.


Subjects
Delivery of Health Care, Humans, Delivery of Health Care/trends, Natural Language Processing, Electronic Health Records
19.
J Med Internet Res ; 26: e56655, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38630520

ABSTRACT

BACKGROUND: Although patients have easy access to their electronic health records and laboratory test result data through patient portals, laboratory test results are often confusing and hard to understand. Many patients turn to web-based forums or question-and-answer (Q&A) sites to seek advice from their peers, but the quality of answers from social Q&A sites on health-related questions varies significantly, and not all responses are accurate or reliable. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to have their questions answered. OBJECTIVE: We aimed to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to laboratory test-related questions asked by patients and to identify potential issues that can be mitigated using augmentation approaches. METHODS: We collected laboratory test result-related Q&A data from Yahoo! Answers and selected 53 Q&A pairs for this study. Using the LangChain framework and the ChatGPT web portal, we generated responses to the 53 questions from 5 LLMs: GPT-4, GPT-3.5, LLaMA 2, MedAlpaca, and ORCA_mini. We assessed the similarity of their answers using standard Q&A similarity-based evaluation metrics: Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation With Explicit Ordering (METEOR), and Bidirectional Encoder Representations from Transformers Score (BERTScore). We used an LLM-based evaluator to judge whether a target model produced higher-quality responses than a baseline model in terms of relevance, correctness, helpfulness, and safety. We also performed a manual evaluation with medical experts on the same 4 aspects for all responses to 7 selected questions. RESULTS: Regarding similarity, with the GPT-4 output used as the reference answer, the responses from GPT-3.5 were the most similar, followed by those from LLaMA 2, ORCA_mini, and MedAlpaca. Human answers from the Yahoo data scored the lowest and were thus the least similar to GPT-4-generated answers. The win-rate and medical expert evaluations both showed that GPT-4's responses achieved better scores than all the other LLM responses and the human responses on all 4 aspects (relevance, correctness, helpfulness, and safety). LLM responses occasionally also suffered from a lack of interpretation within the patient's medical context, incorrect statements, and a lack of references. CONCLUSIONS: By evaluating LLMs' responses to patients' laboratory test result-related questions, we found that, compared with the other 4 LLMs and human answers from a Q&A website, GPT-4's responses were more accurate, helpful, relevant, and safe. Nevertheless, there were cases in which GPT-4's responses were inaccurate or not individualized. We identified a number of ways to improve the quality of LLM responses, including prompt engineering, prompt augmentation, retrieval-augmented generation, and response evaluation.
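
Below is a minimal Python sketch of the reference-based similarity scoring described above, using the rouge_score and NLTK packages as assumed tooling; the texts are toy examples, not study data.

```python
# Illustrative sketch of reference-based similarity scoring as described
# above: treat one model's answer as the reference and score another
# against it with ROUGE and BLEU. Texts below are toy examples.
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "Your TSH is slightly elevated, which can indicate hypothyroidism."
candidate = "A slightly elevated TSH may indicate an underactive thyroid."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)
print({k: round(v.fmeasure, 3) for k, v in rouge.items()})

smooth = SmoothingFunction().method1  # avoids zero scores on short texts
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=smooth)
print(f"BLEU: {bleu:.3f}")
```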


Subjects
Artificial Intelligence, Electronic Health Records, Humans, Language
20.
J Med Internet Res ; 26: e53008, 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38457208

ABSTRACT

As advances in artificial intelligence (AI) continue to transform and revolutionize the field of medicine, understanding the potential uses of generative AI in health care becomes increasingly important. Generative AI, including models such as generative adversarial networks and large language models, shows promise in transforming medical diagnostics, research, treatment planning, and patient care. However, these data-intensive systems pose new threats to protected health information. This Viewpoint paper aims to explore various categories of generative AI in health care, including medical diagnostics, drug discovery, virtual health assistants, medical research, and clinical decision support, while identifying security and privacy threats within each phase of the life cycle of such systems (ie, data collection, model development, and implementation phases). The objectives of this study were to analyze the current state of generative AI in health care, identify opportunities and privacy and security challenges posed by integrating these technologies into existing health care infrastructure, and propose strategies for mitigating security and privacy risks. This study highlights the importance of addressing the security and privacy threats associated with generative AI in health care to ensure the safe and effective use of these systems. The findings of this study can inform the development of future generative AI systems in health care and help health care organizations better understand the potential benefits and risks associated with these systems. By examining the use cases and benefits of generative AI across diverse domains within health care, this paper contributes to theoretical discussions surrounding AI ethics, security vulnerabilities, and data privacy regulations. In addition, this study provides practical insights for stakeholders looking to adopt generative AI solutions within their organizations.


Subjects
Artificial Intelligence, Biomedical Research, Humans, Privacy, Data Collection, Language