Results 1 - 11 of 11
1.
AMIA Jt Summits Transl Sci Proc ; 2024: 642-651, 2024.
Article in English | MEDLINE | ID: mdl-38827077

ABSTRACT

The results of clinical trials are a valuable source of evidence for researchers, policy makers, and healthcare professionals. However, online trial registries do not always contain links to the publications that report on their results, instead requiring a time-consuming manual search. Here, we explored the application of pre-trained transformer-based language models to automatically identify result-reporting publications of cancer clinical trials by computing dense vectors and performing semantic search. Models were fine-tuned on text data from trial registry fields and article metadata using a contrastive learning approach. The best performing model was PubMedBERT, which achieved a mean average precision of 0.592 and ranked 70.3% of a trial's publications in the top 5 results when tested on the holdout test trials. Our results suggest that semantic search using embeddings from transformer models may be an effective approach to the task of linking trials to their publications.
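The ranking step described above can be sketched as cosine similarity over dense vectors; the 3-dimensional embeddings and identifiers below are toy stand-ins for PubMedBERT outputs, not the study's data:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_publications(trial_vec, pub_vecs):
    """Rank candidate publications by embedding similarity to the trial."""
    scored = [(pid, cosine(trial_vec, vec)) for pid, vec in pub_vecs.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Toy vectors standing in for transformer embeddings.
trial = [0.9, 0.1, 0.2]
pubs = {"pmid_A": [0.8, 0.2, 0.1], "pmid_B": [0.1, 0.9, 0.3]}
print(rank_publications(trial, pubs)[0][0])  # pmid_A ranks first
```

In the paper this search runs over fine-tuned transformer embeddings; the sketch only shows the retrieval mechanics.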

2.
Res Sq ; 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38978609

ABSTRACT

The performance of deep learning-based natural language processing systems depends on large amounts of labeled training data, which, in the clinical domain, are not easily available or affordable. Weak supervision and in-context learning offer partial solutions to this issue, particularly using large language models (LLMs), but their performance still trails that of traditional supervised methods trained on moderate amounts of gold-standard data. Moreover, inference with LLMs is computationally expensive. We propose an approach that leverages fine-tuned LLMs and weak supervision with virtually no domain knowledge and still achieves consistently dominant performance. Using a prompt-based approach, the LLM generates weakly labeled data for training a downstream BERT model. The weakly supervised model is then further fine-tuned on small amounts of gold-standard data. We evaluate this approach using Llama2 on three different n2c2 datasets. With no more than 10 gold-standard notes, our final BERT models, weakly supervised by fine-tuned Llama2-13B, consistently outperformed out-of-the-box PubMedBERT by 4.7-47.9% in F1 score. With only 50 gold-standard notes, our models approached the performance of fully fine-tuned systems.
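The data flow of the pipeline above can be sketched in a few lines; a keyword heuristic stands in for the fine-tuned Llama2 labeler, and all notes and labels are invented for illustration:

```python
def weak_label(note):
    """Stand-in for the fine-tuned LLM labeler: a noisy keyword heuristic
    that assigns a (possibly wrong) label to an unlabeled clinical note."""
    return 1 if "smoker" in note.lower() else 0

def build_training_set(unlabeled_notes, gold_notes):
    """Combine LLM-generated weak labels with a small gold-standard set;
    the downstream BERT model would be trained on this combined set."""
    weak = [(note, weak_label(note)) for note in unlabeled_notes]
    return weak + gold_notes

unlabeled = ["Patient is a current smoker.", "No tobacco history reported."]
gold = [("Former smoker, quit 2010.", 1)]
print(len(build_training_set(unlabeled, gold)))  # 3 examples: 2 weak + 1 gold
```

The real system replaces the heuristic with prompt-based Llama2 generation and the final step with BERT fine-tuning on the gold notes.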

3.
Digit Health ; 10: 20552076241228430, 2024.
Article in English | MEDLINE | ID: mdl-38357587

ABSTRACT

Background: Risky health behaviors take an enormous toll on public health systems. While relapse prevention support is integrated with most behavior modification programs, the results are suboptimal. Recent advances in artificial intelligence (AI) applications provide us with unique opportunities to develop just-in-time adaptive behavior change solutions. Methods: In this study, we present an innovative framework, grounded in behavioral theory and enhanced with social media sequencing and a communications scenario builder, to architect a conversational agent (CA) specialized in the prevention of relapses in the context of tobacco cessation. We modeled peer interaction data (n = 1000) using the taxonomy of behavior change techniques (BCTs) and speech act (SA) theory to uncover the socio-behavioral and linguistic context embedded within the online social discourse. Further, we uncovered the sequential patterns of BCTs and SAs from social conversations (n = 339,067). We utilized grounded theory-based techniques to extract the scenarios that best describe individuals' needs and mapped them into the architecture of the virtual CA. Results: The most frequently occurring sequential patterns for BCTs were comparison of behavior and feedback and monitoring; for SAs, they were directive and assertion. Five cravings-related scenarios describing users' needs as they deal with nicotine cravings were identified, along with the kinds of behavior change constructs elicited within those scenarios. Conclusions: AI-led virtual CAs focusing on behavior change need to employ data-driven and theory-linked approaches to address issues related to engagement, sustainability, and acceptance. The sequential patterns of theory and intent manifestations need to be considered when developing effective behavior change CAs.
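The sequential-pattern mining of coded BCTs described above can be illustrated with simple adjacent-pair counting; the conversation codes below are invented examples, and the study's actual mining method may be more elaborate:

```python
from collections import Counter

def bigram_patterns(sequences):
    """Count adjacent (code_i, code_i+1) pairs across coded conversations."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts

# Toy BCT-coded conversations.
convos = [
    ["comparison_of_behavior", "feedback_and_monitoring", "social_support"],
    ["comparison_of_behavior", "feedback_and_monitoring"],
]
top_pair, freq = bigram_patterns(convos).most_common(1)[0]
print(top_pair, freq)  # ('comparison_of_behavior', 'feedback_and_monitoring') 2
```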

4.
J Allergy Clin Immunol Glob ; 3(2): 100224, 2024 May.
Article in English | MEDLINE | ID: mdl-38439946

ABSTRACT

Background: There are now approximately 450 discrete inborn errors of immunity (IEI) described; however, diagnostic rates remain suboptimal. Use of structured health record data has proven useful for patient detection but may be augmented by natural language processing (NLP). Here we present a machine learning model that can distinguish patients from controls significantly in advance of ultimate diagnosis date. Objective: We sought to create an NLP machine learning algorithm that could identify IEI patients early during the disease course and shorten the diagnostic odyssey. Methods: Our approach involved extracting a large corpus of IEI patient clinical-note text from a major referral center's electronic health record (EHR) system and a matched control corpus for comparison. We built text classifiers with simple machine learning methods and trained them on progressively longer time epochs before date of diagnosis. Results: The top performing NLP algorithm effectively distinguished cases from controls robustly 36 months before ultimate clinical diagnosis (area under precision recall curve > 0.95). Corpus analysis demonstrated that statistically enriched, IEI-relevant terms were evident 24+ months before diagnosis, validating that clinical notes can provide a signal for early prediction of IEI. Conclusion: Mining EHR notes with NLP holds promise for improving early IEI patient detection.
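The term-enrichment analysis mentioned above (IEI-relevant terms statistically enriched in case notes) can be sketched with a smoothed log-odds ratio; the abstract does not name the exact statistic used, and the counts below are toy values for illustration:

```python
from math import log
from collections import Counter

def log_odds(term, case_counts, ctrl_counts, smoothing=0.5):
    """Smoothed log-odds ratio of a term in case vs control note corpora;
    positive values indicate enrichment in the case corpus."""
    a = case_counts[term] + smoothing
    b = sum(case_counts.values()) - case_counts[term] + smoothing
    c = ctrl_counts[term] + smoothing
    d = sum(ctrl_counts.values()) - ctrl_counts[term] + smoothing
    return log((a / b) / (c / d))

cases = Counter({"immunodeficiency": 9, "visit": 50})
ctrls = Counter({"immunodeficiency": 1, "visit": 55})
print(log_odds("immunodeficiency", cases, ctrls) > 0)  # True: enriched in cases
```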

5.
J Healthc Inform Res ; 8(2): 313-352, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38681755

ABSTRACT

Clinical information retrieval (IR) plays a vital role in modern healthcare by facilitating efficient access and analysis of medical literature for clinicians and researchers. This scoping review aims to offer a comprehensive overview of the current state of clinical IR research and identify gaps and potential opportunities for future studies in this field. The main objective was to assess and analyze the existing literature on clinical IR, focusing on the methods, techniques, and tools employed for effective retrieval and analysis of medical information. Adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we conducted an extensive search across databases such as Ovid Embase, Ovid Medline, Scopus, ACM Digital Library, IEEE Xplore, and Web of Science, covering publications from January 1, 2010, to January 4, 2023. The rigorous screening process led to the inclusion of 184 papers in our review. Our findings provide a detailed analysis of the clinical IR research landscape, covering aspects like publication trends, data sources, methodologies, evaluation metrics, and applications. The review identifies key research gaps in clinical IR methods such as indexing, ranking, and query expansion, offering insights and opportunities for future studies in clinical IR, thus serving as a guiding framework for upcoming research efforts in this rapidly evolving field. The study also underscores an imperative for innovative research on advanced clinical IR systems capable of fast semantic vector search and adoption of neural IR techniques for effective retrieval of information from unstructured electronic health records (EHRs). Supplementary Information: The online version contains supplementary material available at 10.1007/s41666-024-00159-4.

6.
J Am Med Inform Assoc ; 31(9): 1812-1820, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38281112

ABSTRACT

IMPORTANCE: The study highlights the potential of large language models, specifically GPT-3.5 and GPT-4, in processing complex clinical data and extracting meaningful information with minimal training data. By developing and refining prompt-based strategies, we can significantly enhance the models' performance, making them viable tools for clinical NER tasks and possibly reducing the reliance on extensive annotated datasets. OBJECTIVES: This study quantifies the capabilities of GPT-3.5 and GPT-4 for clinical named entity recognition (NER) tasks and proposes task-specific prompts to improve their performance. MATERIALS AND METHODS: We evaluated these models on 2 clinical NER tasks: (1) to extract medical problems, treatments, and tests from clinical notes in the MTSamples corpus, following the 2010 i2b2 concept extraction shared task, and (2) to identify nervous system disorder-related adverse events from safety reports in the vaccine adverse event reporting system (VAERS). To improve the GPT models' performance, we developed a clinical task-specific prompt framework that includes (1) baseline prompts with task description and format specification, (2) annotation guideline-based prompts, (3) error analysis-based instructions, and (4) annotated samples for few-shot learning. We assessed each prompt's effectiveness and compared the models to BioClinicalBERT. RESULTS: Using baseline prompts, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.634 and 0.804 for MTSamples and 0.301 and 0.593 for VAERS. Additional prompt components consistently improved model performance. When all 4 components were used, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.794 and 0.861 for MTSamples and 0.676 and 0.736 for VAERS, demonstrating the effectiveness of our prompt framework. Although these results trail BioClinicalBERT (F1 of 0.901 for the MTSamples dataset and 0.802 for VAERS), they are promising given that few training samples are needed.
DISCUSSION: The study's findings suggest a promising direction in leveraging LLMs for clinical NER tasks. However, while the performance of GPT models improved with task-specific prompts, there is a need for further development and refinement. LLMs like GPT-4 show potential to approach the performance of state-of-the-art models like BioClinicalBERT, but they still require careful prompt engineering and an understanding of task-specific knowledge. The study also underscores the importance of evaluation schemas that accurately reflect the capabilities and performance of LLMs in clinical settings. CONCLUSION: While direct application of GPT models to clinical NER tasks falls short of optimal performance, our task-specific prompt framework, incorporating medical knowledge and training samples, significantly enhances the GPT models' feasibility for potential clinical applications.
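The 4-component prompt framework described above is essentially layered string assembly; a minimal sketch follows, in which the task description, guideline text, and few-shot example are hypothetical placeholders, not the paper's actual prompts:

```python
def build_ner_prompt(task_desc, guidelines=None, error_notes=None, examples=None):
    """Assemble the layered clinical NER prompt: (1) baseline task
    description, (2) annotation guidelines, (3) error-analysis
    instructions, (4) annotated few-shot examples."""
    parts = [task_desc]
    if guidelines:
        parts.append("Annotation guidelines:\n" + guidelines)
    if error_notes:
        parts.append("Common errors to avoid:\n" + error_notes)
    for text, entities in (examples or []):
        parts.append(f"Input: {text}\nEntities: {entities}")
    return "\n\n".join(parts)

prompt = build_ner_prompt(
    "Extract medical problems, treatments, and tests. Return JSON.",
    guidelines="Include modifiers such as laterality and severity.",
    examples=[("Chest X-ray showed pneumonia.",
               '[{"type": "test", "text": "Chest X-ray"}, '
               '{"type": "problem", "text": "pneumonia"}]')],
)
print(prompt.count("\n\n"))  # 2 separators: 3 prompt sections
```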


Subjects
Electronic Health Records, Natural Language Processing, Humans, Data Mining/methods
7.
J Am Med Inform Assoc ; 31(9): 1904-1911, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38520725

ABSTRACT

OBJECTIVES: The rapid expansion of biomedical literature necessitates automated techniques to discern relationships between biomedical concepts from extensive free text. Such techniques facilitate the development of detailed knowledge bases and highlight research deficiencies. The LitCoin Natural Language Processing (NLP) challenge, organized by the National Center for Advancing Translational Science, aims to evaluate such potential and provides a manually annotated corpus for methodology development and benchmarking. MATERIALS AND METHODS: For the named entity recognition (NER) task, we utilized ensemble learning to merge predictions from three domain-specific models, namely BioBERT, PubMedBERT, and BioM-ELECTRA; devised a rule-driven detection method for cell line and taxonomy names; and annotated 70 additional abstracts as a supplementary corpus. We further fine-tuned the T0pp model, with 11 billion parameters, to boost the performance on relation extraction and leveraged entities' location information (eg, title, background) to enhance novelty prediction performance in relation extraction (RE). RESULTS: Our NLP system designed for this challenge secured first place in Phase I (NER) and second place in Phase II (relation extraction and novelty prediction), outpacing over 200 teams. We tested OpenAI ChatGPT 3.5 and ChatGPT 4 in a zero-shot setting using the same test set, revealing that our fine-tuned model considerably surpasses these broad-spectrum large language models. DISCUSSION AND CONCLUSION: Our outcomes depict a robust NLP system excelling in NER and RE across various biomedical entities, emphasizing that task-specific models remain superior to generic large ones. Such insights are valuable for endeavors like knowledge graph development and hypothesis formulation in biomedical research.
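One common way to merge per-token predictions from several NER models is majority voting; the abstract does not specify the exact ensembling scheme, so the sketch below is illustrative, with toy BIO tags:

```python
from collections import Counter

def majority_vote(tag_sequences):
    """Merge per-token BIO tags from several NER models by majority vote;
    when no tag wins a majority, fall back to the first model's prediction."""
    merged = []
    for token_tags in zip(*tag_sequences):
        top, n = Counter(token_tags).most_common(1)[0]
        merged.append(top if n > 1 else token_tags[0])
    return merged

# Toy predictions from three models over the same three tokens.
biobert    = ["B-Gene", "O", "B-Disease"]
pubmedbert = ["B-Gene", "O", "O"]
electra    = ["O",      "O", "B-Disease"]
print(majority_vote([biobert, pubmedbert, electra]))
# ['B-Gene', 'O', 'B-Disease']
```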


Subjects
Natural Language Processing, Data Mining/methods, Machine Learning, Humans
8.
Proc Int World Wide Web Conf ; 2023(Companion): 820-825, 2023 Apr.
Article in English | MEDLINE | ID: mdl-38327770

ABSTRACT

Model card reports provide a transparent description of machine learning models, including information about their evaluation, limitations, intended use, etc. Federal health agencies have expressed an interest in model card reports for research studies using machine learning-based AI. Previously, we developed an ontology model for model card reports to structure and formalize these reports. In this paper, we demonstrate a Java-based library (OWL API, FaCT++) that leverages our ontology to publish computable model card reports. We discuss future directions and other use cases that highlight the applicability and feasibility of ontology-driven systems in supporting FAIR challenges.

9.
AMIA Annu Symp Proc ; 2023: 1304-1313, 2023.
Article in English | MEDLINE | ID: mdl-38222417

ABSTRACT

Automatic identification of clinical trials for which a patient is eligible is complicated by the fact that trial eligibility criteria are stated in natural language. A potential solution to this problem is to employ text classification methods for common types of eligibility criteria. In this study, we focus on seven common exclusion criteria in cancer trials: prior malignancy, human immunodeficiency virus, hepatitis B, hepatitis C, psychiatric illness, drug/substance abuse, and autoimmune illness. Our dataset consists of 764 phase III cancer trials with these exclusions annotated at the trial level. We experiment with common transformer models as well as a new pre-trained clinical trial BERT model. Our results demonstrate the feasibility of automatically classifying common exclusion criteria. Additionally, we demonstrate the value of a language model pre-trained specifically on clinical trials, which yields the highest average performance across all criteria.
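The classification task above can be illustrated with a rule-based baseline over eligibility text; the keyword patterns are invented for illustration, and the paper's actual approach uses transformer models, not rules:

```python
import re

# Hypothetical keyword patterns for three of the seven exclusion criteria.
PATTERNS = {
    "hiv": r"\bhiv\b|human immunodeficiency virus",
    "hepatitis_b": r"hepatitis b\b|\bhbv\b",
    "prior_malignancy": r"prior (invasive )?malignancy|previous cancer",
}

def classify_exclusions(criteria_text):
    """Return the set of exclusion criteria detected in free-text criteria."""
    text = criteria_text.lower()
    return {crit for crit, pat in PATTERNS.items() if re.search(pat, text)}

crit = "Exclusion: known HIV infection; active hepatitis B."
print(sorted(classify_exclusions(crit)))  # ['hepatitis_b', 'hiv']
```

A transformer classifier would replace the regex lookup with per-criterion predictions from fine-tuned models, as in the study.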


Subjects
Neoplasms, Humans, Eligibility Determination/methods, Language, Natural Language Processing
10.
Nat Med ; 30(4): 942-943, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38561439

Subjects
Documentation, Language
11.
Article in English | MEDLINE | ID: mdl-38125584

ABSTRACT

The Consumer Health Vocabulary has been an important contribution to the health informatics field since its introduction in 2006. Many studies have utilized the vocabulary in scientific research to bridge the gap between consumers and health experts. Given the flat file format of the Consumer Health Vocabulary dataset, we developed a SKOS-based ontology of the dataset. As an ontology, this dataset can be semantically linked to other resources to provide consumer-level meaning. In addition to this artifact, we plan to further expand the terminology.
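Converting a flat vocabulary file into SKOS amounts to mapping each row to a skos:Concept with preferred and alternate labels; a minimal sketch follows, where the base IRI, column layout, and CUI are placeholders rather than the project's actual schema:

```python
def to_skos_turtle(rows, base="http://example.org/chv/"):
    """Serialize flat (cui, consumer_term, technical_term) rows as SKOS
    concepts in Turtle. The base IRI is a hypothetical placeholder."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    for cui, consumer, technical in rows:
        lines.append(f"<{base}{cui}> a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{technical}"@en ;')
        lines.append(f'    skos:altLabel "{consumer}"@en .')
    return "\n".join(lines)

ttl = to_skos_turtle([("C0020538", "high blood pressure", "hypertension")])
print("skos:altLabel" in ttl)  # True
```

Because the output is standard SKOS, each concept can then be linked (e.g., via skos:exactMatch) to other terminologies.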
