Results 1 - 20 of 40
1.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39177261

ABSTRACT

Large language models (LLMs) are sophisticated AI-driven models trained on vast sources of natural language data. They are adept at generating responses that closely mimic human conversational patterns. One of the most notable examples is OpenAI's ChatGPT, which has been extensively used across diverse sectors. Despite their flexibility, a significant challenge arises because most users must transmit their data to the servers of the companies operating these models. Using ChatGPT or similar models online may inadvertently expose sensitive information to the risk of data breaches. Therefore, implementing open-source, smaller-scale LLMs within a secure local network becomes a crucial step for organizations where data privacy and protection have the highest priority, such as regulatory agencies. As a feasibility evaluation, we implemented a series of open-source LLMs within a regulatory agency's local network and assessed their performance on specific tasks involving extracting relevant clinical pharmacology information from regulatory drug labels. Our research shows that some models work well in few- or zero-shot settings, achieving performance comparable to, or even better than, that of neural network models that required thousands of training samples. One of the models was selected to address a real-world problem: finding intrinsic factors that affect drugs' clinical exposure, without any training or fine-tuning. On a dataset of over 700,000 sentences, the model achieved 78.5% accuracy. Our work points to the possibility of implementing open-source LLMs within a secure local network and using these models to perform various natural language processing tasks when large numbers of training examples are unavailable.


Subjects
Natural Language Processing, Humans, Neural Networks (Computer), Machine Learning
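A few-shot extraction setup like the one evaluated in this entry can be sketched as a simple prompt-assembly step. The instruction wording, example sentences, and answer labels below are hypothetical illustrations, not the study's actual prompts:

```python
# Hypothetical few-shot examples pairing a drug-label sentence with the
# intrinsic factor it mentions (renal/hepatic impairment, age, ...).
FEW_SHOT_EXAMPLES = [
    ("Reduce the dose in patients with severe renal impairment.",
     "renal impairment"),
    ("No dose adjustment is required based on age.",
     "age"),
]

def build_extraction_prompt(sentence, examples=FEW_SHOT_EXAMPLES):
    """Assemble a few-shot prompt asking a locally hosted LLM to name the
    intrinsic factor affecting drug exposure, or answer 'none'."""
    lines = ["Identify the intrinsic factor affecting the drug's clinical "
             "exposure in the sentence, or answer 'none'.", ""]
    for text, label in examples:
        lines += [f"Sentence: {text}", f"Answer: {label}", ""]
    lines += [f"Sentence: {sentence}", "Answer:"]
    return "\n".join(lines)
```

The assembled string would then be sent to the local model's completion endpoint; passing an empty example list turns the same function into a zero-shot prompt builder.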
2.
J Med Internet Res ; 26: e52758, 2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39151163

ABSTRACT

BACKGROUND: The screening process for systematic reviews is resource-intensive. Although previous machine learning solutions have reported reductions in workload, they risked excluding relevant papers. OBJECTIVE: We evaluated the performance of a 3-layer screening method using GPT-3.5 and GPT-4 to streamline the title and abstract screening process for systematic reviews. Our goal was to develop a screening method that maximizes sensitivity for identifying relevant records. METHODS: We conducted screenings on 2 of our previous systematic reviews related to the treatment of bipolar disorder, with 1381 records from the first review and 3146 from the second. Screenings were conducted using GPT-3.5 (gpt-3.5-turbo-0125) and GPT-4 (gpt-4-0125-preview) across three layers: (1) research design, (2) target patients, and (3) interventions and controls. The 3-layer screening was conducted using prompts tailored to each study. During this process, information extraction according to each study's inclusion criteria and optimization for screening were carried out using a GPT-4-based flow without manual adjustments. Records were evaluated at each layer, and those meeting the inclusion criteria at all layers were judged as included. RESULTS: At each layer, both GPT-3.5 and GPT-4 were able to process about 110 records per minute, and the total time required for screening the first and second studies was approximately 1 hour and 2 hours, respectively. In the first study, the sensitivities/specificities of GPT-3.5 and GPT-4 were 0.900/0.709 and 0.806/0.996, respectively. Screenings by both GPT-3.5 and GPT-4 judged all 6 records used for the meta-analysis as included. In the second study, the sensitivities/specificities of GPT-3.5 and GPT-4 were 0.958/0.116 and 0.875/0.855, respectively. The sensitivities for the relevant records align with those of human evaluators: 0.867-1.000 for the first study and 0.776-0.979 for the second study. Screenings by both GPT-3.5 and GPT-4 judged all 9 records used for the meta-analysis as included. After accounting for records justifiably excluded by GPT-4, the sensitivities/specificities of the GPT-4 screening were 0.962/0.996 in the first study and 0.943/0.855 in the second study. Further investigation indicated that the cases incorrectly excluded by GPT-3.5 were due to a lack of domain knowledge, while the cases incorrectly excluded by GPT-4 were due to misinterpretations of the inclusion criteria. CONCLUSIONS: Our 3-layer screening method with GPT-4 demonstrated an acceptable level of sensitivity and specificity that supports its practical application in systematic review screenings. Future research should aim to generalize this approach and explore its effectiveness in diverse settings, both medical and nonmedical, to fully establish its use and operational feasibility.


Subjects
Systematic Reviews as Topic, Humans, Language
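The record-level logic of such a 3-layer screen reduces to requiring inclusion at every layer. In this sketch, keyword checks stand in for the GPT-based judgments at the design, patient, and intervention layers; the judges and records are purely hypothetical:

```python
def three_layer_screen(records, judges):
    """Include a record only if every layer judges it eligible; `all`
    short-circuits on the first exclusion, which also saves model calls."""
    return [rec for rec in records if all(judge(rec) for judge in judges)]

# Hypothetical stand-ins for the three GPT-judged layers:
# (1) research design, (2) target patients, (3) interventions and controls.
judges = [
    lambda r: "randomized" in r.lower(),
    lambda r: "bipolar" in r.lower(),
    lambda r: "lithium" in r.lower() or "placebo" in r.lower(),
]

records = [
    "Randomized trial of lithium in bipolar disorder",
    "Randomized trial of aspirin for cardiovascular prevention",
    "Case series: lithium levels in bipolar disorder",
]
included = three_layer_screen(records, judges)  # only the first record passes
```

Because each layer is an independent judgment, layer-wise error analysis (as the study performed) follows naturally from logging which judge excluded each record.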
3.
J Med Internet Res ; 26: e55388, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38648104

ABSTRACT

In this cross-sectional study, we evaluated the completeness, readability, and syntactic complexity of cardiovascular disease prevention information produced by GPT-4 in response to 4 kinds of prompts.


Subjects
Cardiovascular Diseases, Cross-Sectional Studies, Humans, Language
4.
Sensors (Basel) ; 24(12)2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38931605

ABSTRACT

Rapid advances in sensor technologies and deep learning have significantly advanced the field of image captioning, especially for complex scenes. Traditional image captioning methods are often unable to handle the intricacies and detailed relationships within complex scenes. To overcome these limitations, this paper introduces Explicit Image Caption Reasoning (ECR), a novel approach that generates accurate and informative captions for complex scenes captured by advanced sensors. ECR employs an enhanced inference chain to analyze sensor-derived images, examining object relationships and interactions to achieve deeper semantic understanding. We implement ECR using the optimized ICICD dataset, a subset of the sensor-oriented Flickr30K-EE dataset containing comprehensive inference chain information. This dataset enhances training efficiency and caption quality by leveraging rich sensor data. We create the Explicit Image Caption Reasoning Multimodal Model (ECRMM) by fine-tuning TinyLLaVA on the ICICD dataset. Experiments demonstrate ECR's effectiveness and robustness in processing sensor data, outperforming traditional methods.

5.
Sensors (Basel) ; 24(12)2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38931689

ABSTRACT

Traffic flow prediction can provide important reference data for managers to maintain traffic order and can also support optimal route selection for personal travel planning. Owing to the development of sensors and data-collection technology, large-scale road network historical data can be used effectively, but their high non-linearity makes it worthwhile to establish effective prediction models. In this regard, this paper proposes a dual-stream cross AGFormer-GPT network with prompt engineering for traffic flow prediction, which integrates traffic occupancy and speed as two prompts into traffic flow in the form of cross-attention, and uniquely mines spatial and temporal correlation information through the dual-stream cross structure, effectively combining the advantages of an adaptive graph neural network and a large language model to improve prediction accuracy. Experimental results on two PeMS road network datasets verified that the model improves traffic prediction accuracy by about 1.2% across different road networks.

6.
Eur J Orthod ; 46(2)2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38452222

ABSTRACT

OBJECTIVES: The rapid advancement of Large Language Models (LLMs) has prompted an exploration of their efficacy in generating PICO-based (Patient, Intervention, Comparison, Outcome) queries, especially in the field of orthodontics. This study aimed to assess the usability of LLMs in aiding systematic review processes, with a specific focus on comparing the performance of ChatGPT 3.5 and ChatGPT 4 using a specialized prompt tailored for orthodontics. MATERIALS/METHODS: Five databases were searched to curate a sample of 77 systematic reviews and meta-analyses published between 2016 and 2021. Utilizing prompt engineering techniques, the LLMs were directed to formulate PICO questions, Boolean queries, and relevant keywords. The outputs were subsequently evaluated for accuracy and consistency by independent researchers using three-point and six-point Likert scales. Furthermore, the PICO records of 41 studies, which were compatible with the PROSPERO records, were compared with the responses provided by the models. RESULTS: ChatGPT 3.5 and 4 showed a consistent ability to craft PICO-based queries. Statistically significant differences in accuracy were observed in specific categories, with GPT-4 often outperforming GPT-3.5. LIMITATIONS: The study's test set might not encapsulate the full range of LLM application scenarios. The emphasis on specific question types may also not reflect the complete capabilities of the models. CONCLUSIONS/IMPLICATIONS: Both ChatGPT 3.5 and 4 can be pivotal tools for generating PICO-driven queries in orthodontics when optimally configured. However, the precision required in medical research necessitates a judicious and critical evaluation of LLM-generated outputs, advocating for a circumspect integration into scientific investigations.


Subjects
Dental Care, Systematic Reviews as Topic, Humans
7.
Entropy (Basel) ; 26(4)2024 Apr 06.
Article in English | MEDLINE | ID: mdl-38667875

ABSTRACT

In underground industries, practitioners frequently employ argots to communicate discreetly and evade surveillance by investigative agencies. Proposing an innovative approach using word vectors and large language models, we aim to decipher and understand the myriad of argots in these industries, providing crucial technical support for law enforcement to detect and combat illicit activities. Specifically, positional differences in semantic space distinguish argots, and pre-trained language models' corpora are crucial for interpreting them. Expanding on these concepts, the article assesses the semantic coherence of word vectors in the semantic space based on the concept of information entropy. Simultaneously, we devised a labeled argot dataset, MNGG, and developed an argot recognition framework named CSRMECT, along with an argot interpretation framework called LLMResolve. These frameworks leverage the MECT model, the large language model, prompt engineering, and the DBSCAN clustering algorithm. Experimental results demonstrate that the CSRMECT framework outperforms the current optimal model by 10% in terms of the F1 value for argot recognition on the MNGG dataset, while the LLMResolve framework achieves 4% higher interpretation accuracy than the current optimal model. The related experiments also indicate a potential correlation between vector information entropy and model performance.

8.
Am J Pharm Educ ; : 101266, 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-39153573

ABSTRACT

OBJECTIVE: This study aimed to develop a prompt engineering procedure for test question mapping and then determine the effectiveness of test question mapping using ChatGPT compared to human faculty mapping. METHODS: We conducted a cross-sectional study to compare ChatGPT and human mapping using a sample of 139 test questions from modules within an integrated pharmacotherapeutics course series. The test questions were mapped by three faculty members to both module objectives and the Accreditation Council for Pharmacy Education Standards 2016 (Standards 2016) to create the "correct answer". Prompt engineering procedures were created to facilitate mapping with ChatGPT, and ChatGPT mapping results were compared with human mapping. RESULTS: ChatGPT mapped test questions directly to the "correct answer" based on human consensus in 68.0% of cases, and the program matched with at least one individual human response in another 20.1% of cases for a total of 88.1% agreement with human mappers. When humans fully agreed with the mapping decision, ChatGPT was more likely to map correctly. CONCLUSION: This study presents a practical use case with prompt engineering tailored for college assessment or curriculum committees to facilitate efficient test question and educational outcomes mapping.

9.
Article in English | MEDLINE | ID: mdl-39046716

ABSTRACT

BACKGROUND: ChatGPT and other chatbots have emerged as tools for interacting with information in a manner resembling natural human speech. Consequently, the technology is used across various disciplines, including business, education, and even the biomedical sciences. There is a need to better understand how ChatGPT can be used to advance gerontology research. Therefore, we evaluated ChatGPT responses to questions on specific topics in gerontology research and brainstormed recommendations for its use in the field. METHODS: We conducted semi-structured brainstorming sessions to identify uses of ChatGPT in gerontology research. We divided a team of multidisciplinary researchers into four topical groups: a) gero-clinical science, b) basic geroscience, c) informatics as it relates to electronic health records (EHR), and d) gero-technology. Each group prompted ChatGPT on a theory-, methods-, and interpretation-based question and rated responses for accuracy and completeness based on standardized scales. RESULTS: ChatGPT responses were rated by all groups as generally accurate. However, the completeness of responses was rated lower, except by members of the informatics group, who rated responses as highly comprehensive. CONCLUSIONS: ChatGPT accurately depicts some major concepts in gerontological research. However, researchers have an important role in critically appraising the completeness of its responses. Having a single generalized resource like ChatGPT may help summarize the preponderance of evidence in the field to identify gaps in knowledge and promote cross-disciplinary collaboration.

10.
GMS J Med Educ ; 41(2): Doc20, 2024.
Article in English | MEDLINE | ID: mdl-38779693

ABSTRACT

As medical educators grapple with the consistent demand for high-quality assessments, the integration of artificial intelligence presents a novel solution. This how-to article delves into the mechanics of employing ChatGPT for generating Multiple Choice Questions (MCQs) within the medical curriculum. Focusing on the intricacies of prompt engineering, we elucidate the steps and considerations imperative for achieving targeted, high-fidelity results. The article presents varying outcomes based on different prompt structures, highlighting the AI's adaptability in producing questions of distinct complexities. While emphasizing the transformative potential of ChatGPT, we also spotlight challenges, including the AI's occasional "hallucination", underscoring the importance of rigorous review. This guide aims to furnish educators with the know-how to integrate AI into their assessment creation process, heralding a new era in medical education tools.


Subjects
Artificial Intelligence, Curriculum, Medical Education, Educational Measurement, Humans, Medical Education/methods, Educational Measurement/methods
11.
Int J Rheum Dis ; 27(5): e15157, 2024 May.
Article in English | MEDLINE | ID: mdl-38720410

ABSTRACT

Large language models (LLMs) like GPT-4 and Claude are catalyzing transformation across medical research including rheumatology. This review examines their applications, highlighting the pivotal role of prompt engineering in effectively guiding LLMs. Key aspects explored include literature synthesis, data analysis, manuscript drafting, coding assistance, privacy considerations, and generative artificial intelligence integrations. While LLMs accelerate workflows, reliance without apt prompting jeopardizes accuracy. By methodically constructing prompts and gauging model outputs, researchers can maximize relevance and utility. Locally run open-source models also offer data privacy protections. As LLMs permeate rheumatology research, developing expertise in strategic prompting and assessing model limitations is critical. With proper oversight, LLMs markedly boost scholarly productivity.


Subjects
Biomedical Research, Rheumatology, Humans, Artificial Intelligence
12.
Stud Health Technol Inform ; 316: 587-588, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176810

ABSTRACT

This study investigated whether large language models (LLMs) have sufficient domain knowledge to reason about critical medical events such as extubation. Specifically, we tested whether the LLM accurately comprehends given tabular data and variable importance, and whether it can be used to complement existing ML models such as XGBoost.


Subjects
Airway Extubation, Humans, Natural Language Processing, Clinical Decision Support Systems
13.
Stud Health Technol Inform ; 316: 666-670, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176830

ABSTRACT

Named Entity Recognition (NER) models based on Transformers have gained prominence for their impressive performance in various languages and domains. This work delves into the often-overlooked aspect of entity-level metrics and exposes significant discrepancies between token and entity-level evaluations. The study utilizes a corpus of synthetic French oncological reports annotated with entities representing oncological morphologies. Four different French BERT-based models are fine-tuned for token classification, and their performance is rigorously assessed at both token and entity-level. In addition to fine-tuning, we evaluate ChatGPT's ability to perform NER through prompt engineering techniques. The findings reveal a notable disparity in model effectiveness when transitioning from token to entity-level metrics, highlighting the importance of comprehensive evaluation methodologies in NER tasks. Furthermore, in comparison to BERT, ChatGPT remains limited when it comes to detecting advanced entities in French.


Subjects
Natural Language Processing, France, Humans, Electronic Health Records, Language, Neoplasms, Controlled Vocabulary
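The token- versus entity-level gap this entry highlights is easy to reproduce: one boundary error leaves token accuracy high while strict entity-level F1 collapses. A minimal sketch with BIO tags and exact span-and-type matching (the tag scheme here is an assumption, not the paper's exact setup):

```python
def spans_from_bio(tags):
    """Collect (start, end, type) entity spans from a BIO tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        if start is not None and (tag == "O" or tag.startswith("B-")):
            spans.append((start, i - 1, tags[start][2:]))
            start = None
        if tag.startswith("B-"):
            start = i
    return spans

def token_accuracy(gold, pred):
    """Fraction of positions where the predicted tag matches the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def entity_f1(gold, pred):
    """Strict entity-level F1: a predicted span counts only on an exact
    span-and-type match with a gold span."""
    g, p = set(spans_from_bio(gold)), set(spans_from_bio(pred))
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# One boundary error: the morphology entity loses its second token.
gold = ["B-MORPH", "I-MORPH", "O", "O"]
pred = ["B-MORPH", "O", "O", "O"]
```

Here token accuracy is 0.75 while strict entity-level F1 is 0.0, which is exactly the kind of disparity that emerges when moving from token to entity-level evaluation.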
14.
Stud Health Technol Inform ; 316: 998-1002, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176959

ABSTRACT

Generative AI models, such as ChatGPT, have significantly impacted healthcare through the strategic use of prompts to enhance precision, relevance, and ethical standards. This perspective explores the application of prompt engineering to tailor outputs specifically for healthcare stakeholders: patients, providers, policymakers, and researchers. A nine-stage process for prompt engineering in healthcare is proposed, encompassing identifying applications, understanding stakeholder needs, designing tailored prompts, iterative testing and refinement, ethical considerations, collaborative feedback, documentation, training, and continuous updates. A literature review focused on "Generative AI" or "ChatGPT," prompts, and healthcare informed this study, identifying key prompts through qualitative analysis and expert input. This systematic approach ensures that AI-generated prompts align with stakeholder requirements, offering valuable insights into symptoms, treatments, and prevention, thereby supporting informed decision-making among patients.


Subjects
Artificial Intelligence, Humans, Delivery of Health Care
15.
Heliyon ; 10(11): e31397, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38947449

ABSTRACT

Recent advancements in Artificial Intelligence (AI), particularly in generative language models and algorithms, have led to significant impacts across diverse domains. AI's capability to address prompts is growing beyond human capability, and we expect AI to perform well as a prompt engineer too. Additionally, AI can serve as a guardian for ethical, security, and other predefined issues related to generated content. We postulate that enforcing dialogues among AI-as-prompt-engineer, AI-as-prompt-responder, and AI-as-compliance-guardian can lead to high-quality and responsible solutions. This paper introduces a novel AI collaboration paradigm emphasizing responsible autonomy, with implications for addressing real-world challenges. The paradigm of responsible AI-AI conversation establishes structured interaction patterns, guaranteeing decision-making autonomy. Key implications include an enhanced understanding of AI dialogue flow, compliance with rules and regulations, and decision-making scenarios exemplifying responsible autonomy. Real-world applications envision AI systems autonomously addressing complex challenges. We performed preliminary testing of such a paradigm, with instances of ChatGPT autonomously playing various roles in a set of experimental AI-AI conversations, and observed evident added value of such a framework.

16.
Front Comput Neurosci ; 18: 1389475, 2024.
Article in English | MEDLINE | ID: mdl-39015745

ABSTRACT

Introduction: Constructing an accurate and comprehensive knowledge graph of specific diseases is critical for practical clinical disease diagnosis and treatment, reasoning and decision support, rehabilitation, and health management. For knowledge graph construction tasks (such as named entity recognition and relation extraction), classical BERT-based methods require a large amount of training data to ensure model performance. However, real-world medical annotation data, especially disease-specific annotation samples, are very limited. In addition, existing models do not perform well in recognizing out-of-distribution entities and relations that are not seen in the training phase. Method: In this study, we present a novel and practical pipeline for constructing a heart failure knowledge graph using large language models and medical expert refinement. We apply prompt engineering to the three phases of knowledge graph construction: schema design, information extraction, and knowledge completion. The best performance is achieved by designing task-specific prompt templates combined with the TwoStepChat approach. Results: Experiments on two datasets show that the TwoStepChat method outperforms the vanilla prompt and the fine-tuned BERT-based baselines. Moreover, our method saves 65% of the time compared to manual annotation and is better suited to extracting out-of-distribution information in the real world.

17.
Stud Health Technol Inform ; 316: 685-689, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176835

ABSTRACT

With cancer being a leading cause of death globally, epidemiological and clinical cancer registration is paramount for enhancing oncological care and facilitating scientific research. However, the heterogeneous landscape of medical data presents significant challenges to the current manual process of tumor documentation. This paper explores the potential of Large Language Models (LLMs) for transforming unstructured medical reports into the structured format mandated by the German Basic Oncology Dataset. Our findings indicate that integrating LLMs into existing hospital data management systems or cancer registries can significantly enhance the quality and completeness of cancer data collection - a vital component for diagnosing and treating cancer and improving the effectiveness and benefits of therapies. This work contributes to the broader discussion on the potential of artificial intelligence or LLMs to revolutionize medical data processing and reporting in general and cancer care in particular.


Assuntos
Registros Eletrônicos de Saúde , Processamento de Linguagem Natural , Neoplasias , Alemanha , Humanos , Neoplasias/terapia , Sistema de Registros , Inteligência Artificial , Oncologia , Confiabilidade dos Dados
18.
JAMIA Open ; 7(3): ooae080, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39166170

ABSTRACT

Background: Large language models (LLMs) can assist providers in drafting responses to patient inquiries. We examined a prompt engineering strategy to draft responses for providers in the electronic health record. The aim was to evaluate the change in usability after prompt engineering. Materials and Methods: A pre-post study over 8 months was conducted across 27 providers. The primary outcome was the provider use of LLM-generated messages from Generative Pre-Trained Transformer 4 (GPT-4) in a mixed-effects model, and the secondary outcome was provider sentiment analysis. Results: Of the 7605 messages generated, 17.5% (n = 1327) were used. There was a reduction in negative sentiment with an odds ratio of 0.43 (95% CI, 0.36-0.52), but message use decreased (P < .01). The addition of nurses after the study period led to an increase in message use to 35.8% (P < .01). Discussion: The improvement in sentiment with prompt engineering suggests better content quality, but the initial decrease in usage highlights the need for integration with human factors design. Conclusion: Future studies should explore strategies for optimizing the integration of LLMs into the provider workflow to maximize both usability and effectiveness.

19.
Article in English | MEDLINE | ID: mdl-38894708

ABSTRACT

The Segment Anything Model (SAM) is a recently developed all-range foundation model for image segmentation. It can use sparse manual prompts such as bounding boxes to generate pixel-level segmentation in natural images but struggles in medical images such as low-contrast, noisy ultrasound images. We propose a refined test-phase prompt augmentation technique designed to improve SAM's performance in medical image segmentation. The method couples multi-box prompt augmentation and an aleatoric uncertainty-based false-negative (FN) and false-positive (FP) correction (FNPC) strategy. We evaluate the method on two ultrasound datasets and show improvement in SAM's performance and robustness to inaccurate prompts, without the necessity for further training or tuning. Moreover, we present the Single-Slice-to-Volume (SS2V) method, enabling 3D pixel-level segmentation using only the bounding box annotation from a single 2D slice. Our results allow efficient use of SAM in even noisy, low-contrast medical images. The source code has been released at: https://github.com/MedICL-VU/FNPC-SAM.
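The multi-box prompt augmentation step can be pictured as jittering a single bounding-box prompt into several variants, whose per-variant segmentations are then aggregated and whose disagreement serves as an uncertainty signal. The jitter scheme below is an assumed sketch, not the released FNPC-SAM code:

```python
import random

def augment_box(box, n=8, jitter=5, seed=0):
    """Produce n jittered copies of an (x0, y0, x1, y1) bounding-box prompt.
    Each corner coordinate is shifted by up to `jitter` pixels; running the
    segmenter once per copy and aggregating the masks yields a pixel-wise
    variance map usable for false-negative/false-positive correction."""
    rng = random.Random(seed)  # seeded for reproducibility
    x0, y0, x1, y1 = box
    return [
        (x0 + rng.randint(-jitter, jitter), y0 + rng.randint(-jitter, jitter),
         x1 + rng.randint(-jitter, jitter), y1 + rng.randint(-jitter, jitter))
        for _ in range(n)
    ]

boxes = augment_box((10, 10, 50, 50))
```

Averaging the masks produced from these box variants makes the final segmentation less sensitive to any single inaccurate prompt, which is the robustness property the abstract reports.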

20.
AMIA Jt Summits Transl Sci Proc ; 2024: 391-400, 2024.
Article in English | MEDLINE | ID: mdl-38827097

ABSTRACT

Relation Extraction (RE) is a natural language processing (NLP) task for extracting semantic relations between biomedical entities. Recent developments in pre-trained large language models (LLMs) have motivated NLP researchers to use them for various NLP tasks. We investigated GPT-3.5-turbo and GPT-4 on extracting relations from three standard datasets: EU-ADR, the Gene Associations Database (GAD), and ChemProt. Unlike existing approaches that use datasets with masked entities, we used three versions of each dataset in our experiments: a version with masked entities, a second version with the original entities (unmasked), and a third version with abbreviations replaced by the original terms. We developed prompts for the various versions and used the chat completion model from the GPT API. Our approach achieved F1-scores of 0.498 to 0.809 for GPT-3.5-turbo and a highest F1-score of 0.84 for GPT-4. For certain experiments, the performance of GPT, BioBERT, and PubMedBERT is almost the same.
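Producing the masked dataset version described here amounts to replacing entity mentions with typed placeholders; the @TYPE$ placeholder format below follows common biomedical-RE practice but is an assumption about these datasets' exact convention:

```python
def mask_entities(sentence, spans):
    """Replace each (start, end, type) entity mention with a @TYPE$
    placeholder, applying spans right to left so earlier character
    offsets remain valid."""
    out = sentence
    for start, end, etype in sorted(spans, key=lambda s: s[0], reverse=True):
        out = out[:start] + f"@{etype}$" + out[end:]
    return out

masked = mask_entities("BRCA1 mutations cause breast cancer.",
                       [(0, 5, "GENE"), (22, 35, "DISEASE")])
```

The unmasked version simply keeps the original sentence, letting the model see the entity surface forms that the masked version hides.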
