Search | BVS - MINISTÉRIO DA SAÚDE

1.

Centralized Interactive Phenomics Resource: an integrated online phenomics knowledgebase for health data users.

Honerlaw, Jacqueline; Ho, Yuk-Lam; Fontin, Francesca; Murray, Michael; Galloway, Ashley; Heise, David; Connatser, Keith; Davies, Laura; Gosian, Jeffrey; Maripuri, Monika; Russo, John; Sangar, Rahul; Tanukonda, Vidisha; Zielinski, Edward; Dubreuil, Maureen; Zimolzak, Andrew J; Panickan, Vidul A; Cheng, Su-Chun; Whitbourne, Stacey B; Gagnon, David R; Cai, Tianxi; Liao, Katherine P; Ramoni, Rachel B; Gaziano, J Michael; Muralidhar, Sumitra; Cho, Kelly.

J Am Med Inform Assoc ; 31(5): 1126-1134, 2024 Apr 19.

Article in English | MEDLINE | ID: mdl-38481028

ABSTRACT

OBJECTIVE: Development of clinical phenotypes from electronic health records (EHRs) can be resource intensive. Several phenotype libraries have been created to facilitate reuse of definitions. However, these platforms vary in target audience and utility. We describe the development of the Centralized Interactive Phenomics Resource (CIPHER) knowledgebase, a comprehensive public-facing phenotype library, which aims to facilitate clinical and health services research. MATERIALS AND METHODS: The platform was designed to collect and catalog EHR-based computable phenotype algorithms from any healthcare system, scale metadata management, facilitate phenotype discovery, and allow for integration of tools and user workflows. Phenomics experts were engaged in the development and testing of the site. RESULTS: The knowledgebase stores phenotype metadata using the CIPHER standard, and definitions are accessible through complex searching. Phenotypes are contributed to the knowledgebase via webform, allowing metadata validation. Data visualization tools linking to the knowledgebase enhance user interaction with content and accelerate phenotype development. DISCUSSION: The CIPHER knowledgebase was developed in the largest healthcare system in the United States and piloted with external partners. The design of the CIPHER website supports a variety of front-end tools and features to facilitate phenotype development and reuse. Health data users are encouraged to contribute their algorithms to the knowledgebase for wider dissemination to the research community, and to use the platform as a springboard for phenotyping. CONCLUSION: CIPHER is a public resource for all health data users available at https://phenomics.va.ornl.gov/ which facilitates phenotype reuse, development, and dissemination of phenotyping knowledge.

Subjects

Electronic Health Records, Phenomics, Phenotype, Knowledge Bases, Algorithms

2.

LATTE: Label-efficient incident phenotyping from longitudinal electronic health records.

Wen, Jun; Hou, Jue; Bonzel, Clara-Lea; Zhao, Yihan; Castro, Victor M; Gainer, Vivian S; Weisenfeld, Dana; Cai, Tianrun; Ho, Yuk-Lam; Panickan, Vidul A; Costa, Lauren; Hong, Chuan; Gaziano, J Michael; Liao, Katherine P; Lu, Junwei; Cho, Kelly; Cai, Tianxi.

Patterns (N Y) ; 5(1): 100906, 2024 Jan 12.

Article in English | MEDLINE | ID: mdl-38264714

ABSTRACT

Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE models the sequential dependency between the target event and visit embeddings to derive the timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients to perform semi-supervised training. LATTE is evaluated on the onset of type 2 diabetes, heart failure, and relapses of multiple sclerosis. LATTE consistently achieves substantial improvements over benchmark methods while providing high prediction interpretability. The event timings are shown to help discover risk factors of heart failure among patients with rheumatoid arthritis.

3.

ARCH: Large-scale Knowledge Graph via Aggregated Narrative Codified Health Records Analysis.

Gan, Ziming; Zhou, Doudou; Rush, Everett; Panickan, Vidul A; Ho, Yuk-Lam; Ostrouchov, George; Xu, Zhiwei; Shen, Shuting; Xiong, Xin; Greco, Kimberly F; Hong, Chuan; Bonzel, Clara-Lea; Wen, Jun; Costa, Lauren; Cai, Tianrun; Begoli, Edmon; Xia, Zongqi; Gaziano, J Michael; Liao, Katherine P; Cho, Kelly; Cai, Tianxi; Lu, Junwei.

medRxiv ; 2023 May 21.

Article in English | MEDLINE | ID: mdl-37293026

ABSTRACT

Objective: Electronic health record (EHR) systems contain a wealth of clinical data stored as both codified data and free-text narrative notes, covering hundreds of thousands of clinical concepts available for research and clinical care. The complex, massive, heterogeneous, and noisy nature of EHR data imposes significant challenges for feature representation, information extraction, and uncertainty quantification. To address these challenges, we proposed an efficient Aggregated naRrative Codified Health (ARCH) records analysis to generate a large-scale knowledge graph (KG) for a comprehensive set of EHR codified and narrative features. Methods: The ARCH algorithm first derives embedding vectors from a co-occurrence matrix of all EHR concepts and then generates cosine similarities along with associated p-values to measure the strength of relatedness between clinical features with statistical certainty quantification. In the final step, ARCH performs a sparse embedding regression to remove indirect linkage between entity pairs. We validated the clinical utility of the ARCH knowledge graph, generated from 12.5 million patients in the Veterans Affairs (VA) healthcare system, through downstream tasks including detecting known relationships between entity pairs, predicting drug side effects, disease phenotyping, as well as sub-typing Alzheimer's disease patients. Results: ARCH produces high-quality clinical embeddings and KG for over 60,000 EHR concepts, as visualized in the R-shiny powered web-API (https://celehs.hms.harvard.edu/ARCH/). The ARCH embeddings attained an average area under the ROC curve (AUC) of 0.926 and 0.861 for detecting pairs of similar EHR concepts when the concepts are mapped to codified data and to NLP data; and 0.810 (codified) and 0.843 (NLP) for detecting related pairs. Based on the p-values computed by ARCH, the sensitivity of detecting similar and related entity pairs are 0.906 and 0.888 under false discovery rate (FDR) control of 5%. 
For detecting drug side effects, the cosine similarity based on the ARCH semantic representations achieved an AUC of 0.723 while the AUC improved to 0.826 after few-shot training via minimizing the loss function on the training data set. Incorporating NLP data substantially improved the ability to detect side effects in the EHR. For example, based on unsupervised ARCH embeddings, the power of detecting drug-side effects pairs when using codified data only was 0.15, much lower than the power of 0.51 when using both codified and NLP concepts. Compared to existing large-scale representation learning methods including PubmedBERT, BioBERT and SAPBERT, ARCH attains the most robust performance and substantially higher accuracy in detecting these relationships. Incorporating ARCH selected features in weakly supervised phenotyping algorithms can improve the robustness of algorithm performance, especially for diseases that benefit from NLP features as supporting evidence. For example, the phenotyping algorithm for depression attained an AUC of 0.927 when using ARCH selected features but only 0.857 when using codified features selected via the KESER network[1]. In addition, embeddings and knowledge graphs generated from the ARCH network were able to cluster AD patients into two subgroups, where the fast progression subgroup had a much higher mortality rate. Conclusions: The proposed ARCH algorithm generates large-scale high-quality semantic representations and knowledge graph for both codified and NLP EHR features, useful for a wide range of predictive modeling tasks.
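The cosine-similarity and FDR steps mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the ARCH implementation: the three-dimensional embeddings and the p-values are made-up toy values, and the Benjamini-Hochberg procedure is assumed here as one standard way to control the FDR, since the abstract does not name the procedure used.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its threshold.
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical toy embeddings (real clinical embeddings have many more dimensions).
embeddings = {
    "type 2 diabetes": [0.9, 0.1, 0.0],
    "metformin": [0.8, 0.2, 0.1],
    "hip fracture": [0.0, 0.1, 0.9],
}
sim_related = cosine_similarity(embeddings["type 2 diabetes"], embeddings["metformin"])
sim_unrelated = cosine_similarity(embeddings["type 2 diabetes"], embeddings["hip fracture"])

# Hypothetical p-values, one per candidate concept pair.
p_values = [0.001, 0.008, 0.035, 0.039, 0.60]
significant = benjamini_hochberg(p_values, alpha=0.05)

print(round(sim_related, 3), round(sim_unrelated, 3), significant)
```

In this toy input the third p-value (0.035) misses its own threshold but is still retained, because the step-up rule keeps every pair ranked at or below the largest rank that passes.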

4.

Multimodal representation learning for predicting molecule-disease relations.

Wen, Jun; Zhang, Xiang; Rush, Everett; Panickan, Vidul A; Li, Xingyu; Cai, Tianrun; Zhou, Doudou; Ho, Yuk-Lam; Costa, Lauren; Begoli, Edmon; Hong, Chuan; Gaziano, J Michael; Cho, Kelly; Lu, Junwei; Liao, Katherine P; Zitnik, Marinka; Cai, Tianxi.

Bioinformatics ; 39(2), 2023 Feb 03.

Article in English | MEDLINE | ID: mdl-36805623

ABSTRACT

MOTIVATION: Predicting molecule-disease indications and side effects is important for drug development and pharmacovigilance. Comprehensively mining molecule-molecule, molecule-disease and disease-disease semantic dependencies can potentially improve prediction performance. METHODS: We introduce a Multi-Modal REpresentation Mapping Approach to Predicting molecular-disease relations (M2REMAP) by incorporating clinical semantics learned from electronic health records (EHR) of 12.6 million patients. Specifically, M2REMAP first learns a multimodal molecule representation that synthesizes chemical property and clinical semantic information by mapping molecule chemicals via a deep neural network onto the clinical semantic embedding space shared by drugs, diseases and other common clinical concepts. To infer molecule-disease relations, M2REMAP combines multimodal molecule representation and disease semantic embedding to jointly infer indications and side effects. RESULTS: We extensively evaluate M2REMAP on molecule indications, side effects and interactions. Results show that incorporating EHR embeddings improves performance significantly, for example, attaining an improvement over the baseline models by 23.6% in PRC-AUC on indications and 23.9% on side effects. Further, M2REMAP overcomes the limitation of existing methods and effectively predicts drugs for novel diseases and emerging pathogens. AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/celehs/M2REMAP, and prediction results are provided at https://shiny.parse-health.org/drugs-diseases-dev/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Drug-Related Side Effects and Adverse Reactions, Humans, Drug Development, Electronic Health Records, Computer Neural Networks, Pharmacovigilance

5.

Multiview Incomplete Knowledge Graph Integration with application to cross-institutional EHR data harmonization.

Zhou, Doudou; Gan, Ziming; Shi, Xu; Patwari, Alina; Rush, Everett; Bonzel, Clara-Lea; Panickan, Vidul A; Hong, Chuan; Ho, Yuk-Lam; Cai, Tianrun; Costa, Lauren; Li, Xiaoou; Castro, Victor M; Murphy, Shawn N; Brat, Gabriel; Weber, Griffin; Avillach, Paul; Gaziano, J Michael; Cho, Kelly; Liao, Katherine P; Lu, Junwei; Cai, Tianxi.

J Biomed Inform ; 133: 104147, 2022 Sep.

Article in English | MEDLINE | ID: mdl-35872266

ABSTRACT

OBJECTIVE: The growing availability of electronic health records (EHR) data opens opportunities for integrative analysis of multi-institutional EHR to produce generalizable knowledge. A key barrier to such integrative analyses is the lack of semantic interoperability across different institutions due to coding differences. We propose a Multiview Incomplete Knowledge Graph Integration (MIKGI) algorithm to integrate information from multiple sources with partially overlapping EHR concept codes to enable translations between healthcare systems. METHODS: The MIKGI algorithm combines knowledge graph information from (i) embeddings trained from the co-occurrence patterns of medical codes within each EHR system and (ii) semantic embeddings of the textual strings of all medical codes obtained from the Self-Aligning Pretrained BERT (SAPBERT) algorithm. Due to the heterogeneity in the coding across healthcare systems, each EHR source provides partial coverage of the available codes. MIKGI synthesizes the incomplete knowledge graphs derived from these multi-source embeddings by minimizing a spherical loss function that combines the pairwise directional similarities of embeddings computed from all available sources. MIKGI outputs harmonized semantic embedding vectors for all EHR codes, which improves the quality of the embeddings and enables direct assessment of both similarity and relatedness between any pair of codes from multiple healthcare systems. RESULTS: With EHR co-occurrence data from Veterans Affairs (VA) healthcare and Mass General Brigham (MGB), the MIKGI algorithm produces high-quality embeddings for a variety of downstream tasks including detecting known similar or related entity pairs and mapping VA local codes to the relevant EHR codes used at MGB. Based on the cosine similarity of the MIKGI trained embeddings, the AUC was 0.918 for detecting similar entity pairs and 0.809 for detecting related pairs.
For cross-institutional medical code mapping, the top 1 and top 5 accuracy were 91.0% and 97.5% when mapping medication codes at VA to RxNorm medication codes at MGB; 59.1% and 75.8% when mapping VA local laboratory codes to LOINC hierarchy. When trained with 500 labels, the lab code mapping attained top 1 and 5 accuracy at 77.7% and 87.9%. MIKGI also attained best performance in selecting VA local lab codes for desired laboratory tests and COVID-19 related features for COVID EHR studies. Compared to existing methods, MIKGI attained the most robust performance with accuracy the highest or near the highest across all tasks. CONCLUSIONS: The proposed MIKGI algorithm can effectively integrate incomplete summary data from biomedical text and EHR data to generate harmonized embeddings for EHR codes for knowledge graph modeling and cross-institutional translation of EHR codes.
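The top 1 and top 5 accuracy figures reported above can be read as follows: for each source code, rank all candidate target codes by cosine similarity of their embeddings and check whether the correct target falls within the first k. The Python sketch below illustrates that evaluation only; the code names, vectors, and gold mapping are hypothetical toy values, and it does not reproduce the MIKGI training step.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_accuracy(source_vecs, target_vecs, gold, k):
    """Fraction of source codes whose gold target is among the k most similar targets."""
    hits = 0
    for src, vec in source_vecs.items():
        ranked = sorted(
            target_vecs,
            key=lambda tgt: cosine(vec, target_vecs[tgt]),
            reverse=True,
        )
        if gold[src] in ranked[:k]:
            hits += 1
    return hits / len(source_vecs)


# Hypothetical harmonized embeddings for local (source) and standard (target) codes.
source_vecs = {
    "LOCAL_GLU": [0.9, 0.1, 0.0],
    "LOCAL_CREA": [0.1, 0.8, 0.1],
    "LOCAL_HGB": [0.0, 0.6, 0.5],
}
target_vecs = {
    "STD_GLUCOSE": [1.0, 0.0, 0.0],
    "STD_CREATININE": [0.0, 1.0, 0.0],
    "STD_HEMOGLOBIN": [0.0, 0.0, 1.0],
}
gold = {
    "LOCAL_GLU": "STD_GLUCOSE",
    "LOCAL_CREA": "STD_CREATININE",
    "LOCAL_HGB": "STD_HEMOGLOBIN",
}

top1 = top_k_accuracy(source_vecs, target_vecs, gold, k=1)
top2 = top_k_accuracy(source_vecs, target_vecs, gold, k=2)
print(top1, top2)
```

Here the third local code is ranked closest to the wrong target, so it counts as a miss at k=1 and as a hit at k=2, which is why top-k accuracy can only stay the same or rise as k grows.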

Subjects

COVID-19, Electronic Health Records, Algorithms, Humans, Logical Observation Identifiers Names and Codes, Automated Pattern Recognition

6.

Clinical knowledge extraction via sparse embedding regression (KESER) with multi-center large scale electronic health record data.

Hong, Chuan; Rush, Everett; Liu, Molei; Zhou, Doudou; Sun, Jiehuan; Sonabend, Aaron; Castro, Victor M; Schubert, Petra; Panickan, Vidul A; Cai, Tianrun; Costa, Lauren; He, Zeling; Link, Nicholas; Hauser, Ronald; Gaziano, J Michael; Murphy, Shawn N; Ostrouchov, George; Ho, Yuk-Lam; Begoli, Edmon; Lu, Junwei; Cho, Kelly; Liao, Katherine P; Cai, Tianxi.

NPJ Digit Med ; 4(1): 151, 2021 Oct 27.

Article in English | MEDLINE | ID: mdl-34707226

ABSTRACT

The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. However, it is difficult to know all the relevant codes related to a phenotype due to the large number of codes available. Traditional data mining approaches often require the use of patient-level data, which hinders the ability to share data across institutions. In this project, we demonstrate that multi-center large-scale code embeddings can be used to efficiently identify relevant features related to a disease of interest. We constructed large-scale code embeddings for a wide range of codified concepts from EHRs from two large medical centers. We developed knowledge extraction via sparse embedding regression (KESER) for feature selection and integrative network analysis. We evaluated the quality of the code embeddings and assessed the performance of KESER in feature selection for eight diseases. Besides, we developed an integrated clinical knowledge map combining embedding data from both institutions. The features selected by KESER were comprehensive compared to lists of codified data generated by domain experts. Features identified via KESER resulted in comparable performance to those built upon features selected manually or with patient-level data. The knowledge map created using an integrative analysis identified disease-disease and disease-drug pairs more accurately compared to those identified using single institution data. Analysis of code embeddings via KESER can effectively reveal clinical knowledge and infer relatedness among codified concepts. KESER bypasses the need for patient-level data in individual analyses providing a significant advance in enabling multi-center studies using EHR data.
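The idea of feature selection by sparse embedding regression can be sketched as an L1-penalized regression of a target concept's embedding on the embeddings of candidate concepts, keeping the candidates with nonzero coefficients. The Python sketch below is an illustration under assumptions, not the KESER implementation: it assumes a plain lasso solved by coordinate descent, and the concept names, four-dimensional embeddings, and penalty value are hypothetical.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become exactly zero."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Hypothetical candidate concepts; each column of X is one candidate's embedding.
candidates = ["metformin", "hba1c test", "hip fracture"]
X = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
# Hypothetical embedding of the target concept (for example, type 2 diabetes).
y = [0.7, 0.3, 0.02, 0.0]

beta = lasso_coordinate_descent(X, y, lam=0.02)
selected = [name for name, b in zip(candidates, beta) if b != 0.0]
print(selected)
```

Because the regression uses only embedding vectors, which are summary-level data, a selection step of this kind needs no patient-level records, which is the property the abstract emphasizes.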

7.

A high-throughput phenotyping algorithm is portable from adult to pediatric populations.

Geva, Alon; Liu, Molei; Panickan, Vidul A; Avillach, Paul; Cai, Tianxi; Mandl, Kenneth D.

J Am Med Inform Assoc ; 28(6): 1265-1269, 2021 Jun 12.

Article in English | MEDLINE | ID: mdl-33594412

ABSTRACT

OBJECTIVE: Multimodal automated phenotyping (MAP) is a scalable, high-throughput phenotyping method, developed using electronic health record (EHR) data from an adult population. We tested transportability of MAP to a pediatric population. MATERIALS AND METHODS: Without additional feature engineering or supervised training, we applied MAP to a pediatric population enrolled in a biobank and evaluated performance against physician-reviewed medical records. We also compared performance of MAP at the pediatric institution and the original adult institution where MAP was developed, including for 6 phenotypes validated at both institutions against physician-reviewed medical records. RESULTS: MAP performed equally well in the pediatric setting (average AUC 0.98) as it did at the general adult hospital system (average AUC 0.96). MAP's performance in the pediatric sample was similar across the 6 specific phenotypes also validated against gold-standard labels in the adult biobank. CONCLUSIONS: MAP is highly transportable across diverse populations and has potential for wide-scale use.

Subjects

Algorithms, Electronic Health Records, Humans, Phenotype
