Search | BVS España Search Portal

Towards cross-application model-agnostic federated cohort discovery.

Dobbins, Nicholas J; Morris, Michele; Sadhu, Eugene; MacFadden, Douglas; Nazaire, Marc-Danie; Simons, William; Weber, Griffin; Murphy, Shawn; Visweswaran, Shyam.

J Am Med Inform Assoc ; 31(10): 2202-2209, 2024 Oct 01.

Article in English | MEDLINE | ID: mdl-39110920

ABSTRACT

OBJECTIVES: To demonstrate that 2 popular cohort discovery tools, Leaf and the Shared Health Research Information Network (SHRINE), are readily interoperable. Specifically, we adapted Leaf to interoperate and function as a node in a federated data network that uses SHRINE and dynamically generate queries for heterogeneous data models. MATERIALS AND METHODS: SHRINE queries are designed to run on the Informatics for Integrating Biology & the Bedside (i2b2) data model. We created functionality in Leaf to interoperate with a SHRINE data network and dynamically translate SHRINE queries to other data models. We randomly selected 500 past queries from the SHRINE-based national Evolve to Next-Gen Accrual to Clinical Trials (ENACT) network for evaluation, and an additional 100 queries to refine and debug Leaf's translation functionality. We created a script for Leaf to convert the terms in the SHRINE queries into equivalent structured query language (SQL) concepts, which were then executed on 2 other data models. RESULTS AND DISCUSSION: 91.1% of the generated queries for non-i2b2 models returned counts within 5% (or ±5 patients for counts under 100) of i2b2, with 91.3% recall. Of the 8.9% of queries that exceeded the 5% margin, 77 of 89 (86.5%) were due to errors introduced by the Python script or the extract-transform-load process, which are easily fixed in a production deployment. The remaining errors were due to Leaf's translation function, which was later fixed. CONCLUSION: Our results support that cohort discovery applications such as Leaf and SHRINE can interoperate in federated data networks with heterogeneous data models.

Subject(s)

Software , Humans , Cohort Studies , Health Information Interoperability , Biomedical Research , Information Storage and Retrieval/methods
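
The agreement criterion in the abstract above (counts within 5% of i2b2, or within ±5 patients for counts under 100) can be sketched as a small check. This is a minimal illustration, not the authors' evaluation script: the function names and example pairs are hypothetical, and it assumes that "counts under 100" refers to the i2b2 count and that both bounds are inclusive.

```python
def counts_agree(i2b2_count: int, other_count: int) -> bool:
    """Apply the tolerance described in the abstract.

    Assumption: the i2b2 count is the reference, and both bounds are inclusive.
    """
    diff = abs(other_count - i2b2_count)
    if i2b2_count < 100:
        # Small cohorts: allow an absolute difference of up to 5 patients.
        return diff <= 5
    # Larger cohorts: allow a relative difference of up to 5%.
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return diff * 100 <= 5 * i2b2_count


def agreement_rate(pairs) -> float:
    """Fraction of (i2b2_count, other_count) pairs that satisfy the tolerance."""
    pairs = list(pairs)
    return sum(counts_agree(a, b) for a, b in pairs) / len(pairs)


# Hypothetical count pairs, for illustration only.
example_pairs = [(1000, 1040), (1000, 1100), (40, 44), (40, 50)]
rate = agreement_rate(example_pairs)

# Consistency check on a figure reported in the abstract: 77 of 89 queries.
share_attributed = round(77 / 89 * 100, 1)
```

With the hypothetical pairs above, two of the four comparisons fall inside the tolerance, and the last line reproduces the 86.5% reported for 77 of 89 queries.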

Leveraging natural language processing to augment structured social determinants of health data in the electronic health record.

Lybarger, Kevin; Dobbins, Nicholas J; Long, Ritche; Singh, Angad; Wedgeworth, Patrick; Uzuner, Özlem; Yetisgen, Meliha.

J Am Med Inform Assoc ; 30(8): 1389-1397, 2023 Jul 19.

Article in English | MEDLINE | ID: mdl-37130345

ABSTRACT

OBJECTIVE: Social determinants of health (SDOH) impact health outcomes and are documented in the electronic health record (EHR) through structured data and unstructured clinical notes. However, clinical notes often contain more comprehensive SDOH information, detailing aspects such as status, severity, and temporality. This work has two primary objectives: (1) develop a natural language processing information extraction model to capture detailed SDOH information and (2) evaluate the information gain achieved by applying the SDOH extractor to clinical narratives and combining the extracted representations with existing structured data. MATERIALS AND METHODS: We developed a novel SDOH extractor using a deep learning entity and relation extraction architecture to characterize SDOH across various dimensions. In an EHR case study, we applied the SDOH extractor to a large clinical data set with 225 089 patients and 430 406 notes with social history sections and compared the extracted SDOH information with existing structured data. RESULTS: The SDOH extractor achieved 0.86 F1 on a withheld test set. In the EHR case study, we found extracted SDOH information complements existing structured data with 32% of homeless patients, 19% of current tobacco users, and 10% of drug users only having these health risk factors documented in the clinical narrative. CONCLUSIONS: Utilizing EHR data to identify SDOH health risk factors and social needs may improve patient care and outcomes. Semantic representations of text-encoded SDOH information can augment existing structured data, and this more comprehensive SDOH representation can assist health systems in identifying and addressing these social needs.

Subject(s)

Electronic Health Records , Social Determinants of Health , Humans , Natural Language Processing , Risk Factors , Information Storage and Retrieval
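
The comparison of extracted and structured SDOH data described in the abstract above can be illustrated with set arithmetic on patient identifiers. This is a sketch under stated assumptions: the function name and the toy identifiers are hypothetical, and the denominator is assumed to be all patients flagged by either source, which the abstract does not spell out.

```python
def narrative_only_share(structured_ids, extracted_ids) -> float:
    """Share of flagged patients whose risk factor appears only in notes.

    Assumption: the denominator is every patient flagged by either source.
    """
    structured_ids = set(structured_ids)
    extracted_ids = set(extracted_ids)
    all_ids = structured_ids | extracted_ids
    if not all_ids:
        return 0.0
    # Patients found by the text extractor but absent from structured data.
    narrative_only = extracted_ids - structured_ids
    return len(narrative_only) / len(all_ids)


# Toy identifiers, for illustration only.
structured = {"p1", "p2", "p3"}
extracted = {"p3", "p4"}
share = narrative_only_share(structured, extracted)
```

In this toy case one of four flagged patients is documented only in the narrative, so the share is 0.25.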

LeafAI: query generator for clinical cohort discovery rivaling a human programmer.

Dobbins, Nicholas J; Han, Bin; Zhou, Weipeng; Lan, Kristine F; Kim, H Nina; Harrington, Robert; Uzuner, Özlem; Yetisgen, Meliha.

J Am Med Inform Assoc ; 30(12): 1954-1964, 2023 Nov 17.

Article in English | MEDLINE | ID: mdl-37550244

ABSTRACT

OBJECTIVE: Identifying study-eligible patients within clinical databases is a critical step in clinical research. However, accurate query design typically requires extensive technical and biomedical expertise. We sought to create a system capable of generating data model-agnostic queries while also providing novel logical reasoning capabilities for complex clinical trial eligibility criteria. MATERIALS AND METHODS: The task of query creation from eligibility criteria requires solving several text-processing problems, including named entity recognition and relation extraction, sequence-to-sequence transformation, normalization, and reasoning. We incorporated hybrid deep learning and rule-based modules for these, as well as a knowledge base of the Unified Medical Language System (UMLS) and linked ontologies. To enable data-model agnostic query creation, we introduce a novel method for tagging database schema elements using UMLS concepts. To evaluate our system, called LeafAI, we compared the capability of LeafAI to a human database programmer to identify patients who had been enrolled in 8 clinical trials conducted at our institution. We measured performance by the number of actual enrolled patients matched by generated queries. RESULTS: LeafAI matched a mean 43% of enrolled patients with 27 225 eligible across 8 clinical trials, compared to 27% matched and 14 587 eligible in queries by a human database programmer. The human programmer spent 26 total hours crafting queries compared to several minutes by LeafAI. CONCLUSIONS: Our work contributes a state-of-the-art data model-agnostic query generation system capable of conditional reasoning using a knowledge base. We demonstrate that LeafAI can rival an experienced human programmer in finding patients eligible for clinical trials.

Subject(s)

Natural Language Processing , Unified Medical Language System , Humans , Knowledge Bases , Clinical Trials as Topic
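
The performance measure in the abstract above, the share of actually enrolled patients that a generated query returns, averaged across trials, can be sketched as follows. The function name, trial names, and patient identifiers are hypothetical, and the averaging is assumed to be an unweighted mean over trials.

```python
from statistics import mean


def matched_fraction(enrolled, query_result) -> float:
    """Fraction of enrolled patients that also appear in the query result."""
    enrolled = set(enrolled)
    if not enrolled:
        raise ValueError("a trial needs at least one enrolled patient")
    return len(enrolled & set(query_result)) / len(enrolled)


# Toy data: (enrolled patients, patients returned by the query) per trial.
trials = {
    "trial_a": ({1, 2, 3, 4}, {2, 3, 9}),
    "trial_b": ({5, 6}, {5, 6, 7}),
}

per_trial = {
    name: matched_fraction(enrolled, result)
    for name, (enrolled, result) in trials.items()
}

# Assumption: an unweighted mean across trials.
mean_matched = mean(per_trial.values())
```

Here the two toy trials score 0.5 and 1.0, giving a mean of 0.75. The size of each query result (the "eligible" count in the abstract) is a separate quantity and is not part of this fraction.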

The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria.

Dobbins, Nicholas J; Mullen, Tony; Uzuner, Özlem; Yetisgen, Meliha.

Sci Data ; 9(1): 490, 2022 Aug 11.

Article in English | MEDLINE | ID: mdl-35953524

ABSTRACT

Identifying cohorts of patients based on eligibility criteria such as medical conditions, procedures, and medication use is critical to recruitment for clinical trials. Such criteria are often most naturally described in free text, using language familiar to clinicians and researchers. To identify potential participants at scale, these criteria must first be translated into queries on clinical databases, which can be labor-intensive and error-prone. Natural language processing (NLP) methods offer a potential means of converting such criteria into database queries automatically. However, they must first be trained and evaluated using corpora that capture clinical trial criteria in sufficient detail. In this paper, we introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions using highly granular structured labels capturing a range of biomedical phenomena. We provide details of our schema, annotation process, corpus quality, and statistics. Additionally, we present baseline information extraction results on this corpus as benchmarks for future work.

Subject(s)

Clinical Trials as Topic , Natural Language Processing , Patient Selection , Clinical Trials as Topic/standards , Databases, Factual , Humans , Information Storage and Retrieval

Transferability of neural network clinical deidentification systems.

Lee, Kahyun; Dobbins, Nicholas J; McInnes, Bridget; Yetisgen, Meliha; Uzuner, Özlem.

J Am Med Inform Assoc ; 28(12): 2661-2669, 2021 Nov 25.

Article in English | MEDLINE | ID: mdl-34586386

ABSTRACT

OBJECTIVE: Neural network deidentification studies have focused on individual datasets. These studies assume the availability of a sufficient amount of human-annotated data to train models that can generalize to corresponding test data. In real-world situations, however, researchers often have limited or no in-house training data. Existing systems and external data can help jump-start deidentification on in-house data; however, the most efficient way of utilizing existing systems and external data is unclear. This article investigates the transferability of a state-of-the-art neural clinical deidentification system, NeuroNER, across a variety of datasets, when it is modified architecturally for domain generalization and when it is trained strategically for domain transfer. MATERIALS AND METHODS: We conducted a comparative study of the transferability of NeuroNER using 4 clinical note corpora with multiple note types from 2 institutions. We modified NeuroNER architecturally to integrate 2 types of domain generalization approaches. We evaluated each architecture using 3 training strategies. We measured transferability from external sources; transferability across note types; the contribution of external source data when in-domain training data are available; and transferability across institutions. RESULTS AND CONCLUSIONS: Transferability from a single external source gave inconsistent results. Using additional external sources consistently yielded an F1-score of approximately 80%. Fine-tuning emerged as a dominant transfer strategy, with or without domain generalization. We also found that external sources were useful even in cases where in-domain training data were available. Transferability across institutions differed by note type and annotation label but resulted in improved performance.

Subject(s)

Data Anonymization , Neural Networks, Computer , Humans

Leaf: an open-source, model-agnostic, data-driven web application for cohort discovery and translational biomedical research.

Dobbins, Nicholas J; Spital, Clifford H; Black, Robert A; Morrison, Jason M; de Veer, Bas; Zampino, Elizabeth; Harrington, Robert D; Britt, Bethene D; Stephens, Kari A; Wilcox, Adam B; Tarczy-Hornoch, Peter; Mooney, Sean D.

J Am Med Inform Assoc ; 27(1): 109-118, 2020 Jan 01.

Article in English | MEDLINE | ID: mdl-31592524

ABSTRACT

OBJECTIVE: Academic medical centers and health systems are increasingly challenged with supporting appropriate secondary use of clinical data. Enterprise data warehouses have emerged as central resources for these data, but often require an informatician to extract meaningful information, limiting direct access by end users. To overcome this challenge, we have developed Leaf, a lightweight self-service web application for querying clinical data from heterogeneous data models and sources. MATERIALS AND METHODS: Leaf utilizes a flexible biomedical concept system to define hierarchical concepts and ontologies. Each Leaf concept contains both textual representations and SQL query building blocks, exposed by a simple drag-and-drop user interface. Leaf generates abstract syntax trees which are compiled into dynamic SQL queries. RESULTS: Leaf is a successful production-supported tool at the University of Washington, which hosts a central Leaf instance querying an enterprise data warehouse with over 300 active users. Through the support of UW Medicine (https://uwmedicine.org), the Institute of Translational Health Sciences (https://www.iths.org), and the National Center for Data to Health (https://ctsa.ncats.nih.gov/cd2h/), Leaf source code has been released into the public domain at https://github.com/uwrit/leaf. DISCUSSION: Leaf allows the querying of single or multiple clinical databases simultaneously, even those of different data models. This enables fast installation without costly extraction or duplication. CONCLUSIONS: Leaf differs from existing cohort discovery tools because it does not specify a required data model and is designed to seamlessly leverage existing user authentication systems and clinical databases in situ. We believe Leaf to be useful for health system analytics, clinical research data warehouses, precision medicine biobanks, and clinical studies involving large patient cohorts.

Subject(s)

Data Warehousing , Information Storage and Retrieval/methods , Translational Research, Biomedical , User-Computer Interface , Vocabulary, Controlled , Databases as Topic , Humans , Internet , Unified Medical Language System
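
The abstract above describes concepts that carry SQL building blocks and are assembled into a tree that is compiled into a dynamic SQL query. The sketch below illustrates that general idea only; it is not Leaf's actual compiler or schema. The class names, the `person_id` column, and the `diagnosis` and `demographics` tables are all hypothetical, and SQLite is used solely so the example runs on its own.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Concept:
    label: str      # text shown to the user
    sql_from: str   # table to query (hypothetical name)
    sql_where: str  # filter selecting the matching rows


@dataclass
class And:
    children: List["Node"]


@dataclass
class Or:
    children: List["Node"]


Node = Union[Concept, And, Or]


def compile_sql(node: Node) -> str:
    """Recursively turn a concept tree into one SELECT of patient identifiers."""
    if isinstance(node, Concept):
        # Illustration only: a real system must guard against SQL injection.
        return f"SELECT person_id FROM {node.sql_from} WHERE {node.sql_where}"
    operator = " INTERSECT " if isinstance(node, And) else " UNION "
    # Wrapping each child in a subquery keeps nested groups evaluated together.
    parts = [f"SELECT person_id FROM ({compile_sql(c)})" for c in node.children]
    return operator.join(parts)


# Toy database standing in for a clinical data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diagnosis (person_id INTEGER, code TEXT)")
conn.execute("CREATE TABLE demographics (person_id INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO diagnosis VALUES (?, ?)",
    [(1, "E11"), (2, "E11"), (2, "I10"), (3, "I10"), (4, "E11")],
)
conn.executemany(
    "INSERT INTO demographics VALUES (?, ?)",
    [(1, 70), (2, 40), (3, 80), (4, 30)],
)

diabetes = Concept("Type 2 diabetes", "diagnosis", "code = 'E11'")
hypertension = Concept("Hypertension", "diagnosis", "code = 'I10'")
older = Concept("Age 65 or older", "demographics", "age >= 65")

# Diabetes AND (age 65 or older OR hypertension)
tree = And([diabetes, Or([older, hypertension])])
sql = compile_sql(tree)
cohort = sorted(row[0] for row in conn.execute(sql))
```

Because each concept only names a table and a filter, the same tree could be pointed at a different data model by swapping the `sql_from` and `sql_where` strings, which is the property the abstract refers to as not requiring a specific data model.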
