Results 1 - 7 of 7
1.
BMC Med Inform Decis Mak ; 24(1): 68, 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38459459

ABSTRACT

BACKGROUND: To discover pharmacotherapy prescription patterns and their statistical associations with outcomes through a clinical pathway inference framework applied to real-world data. METHODS: We applied the machine learning steps of our framework to a 2006 to 2020 cohort of veterans with major depressive disorder (MDD). Outpatient antidepressant pharmacy fills, dispensed inpatient antidepressant medications, emergency department visits, self-harm, and all-cause mortality data were extracted from the Department of Veterans Affairs Corporate Data Warehouse. RESULTS: Our MDD cohort consisted of 252,179 individuals. During the study period there were 98,417 emergency department visits, 1,016 cases of self-harm, and 1,507 deaths from all causes. The top ten prescription patterns accounted for 69.3% of the data for individuals starting antidepressants at the fluoxetine equivalent of 20-39 mg. Additionally, we found associations between outcomes and dosage changes. CONCLUSIONS: For the 252,179 veterans who served in Iraq and Afghanistan with subsequent MDD noted in their electronic medical records, we documented and described the major pharmacotherapy prescription patterns implemented by Veterans Health Administration providers. Ten patterns accounted for almost 70% of the data. Associations between antidepressant usage and outcomes in observational data may be confounded. The low numbers of adverse events, especially those associated with all-cause mortality, make our calculations imprecise. Furthermore, our outcomes are also indications for both disease and treatment. Despite these limitations, we demonstrate the usefulness of our framework in providing operational insight into clinical practice, and our results underscore the need for increased monitoring during critical points of treatment.
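To make the pattern-mining idea concrete, here is a minimal, purely illustrative sketch (not the study's actual pipeline): it tallies each patient's ordered antidepressant (drug, dose) sequence and reports how much of the cohort the most frequent patterns cover. The record layout, toy values, and the prescription_pattern helper are assumptions for the example.

    from collections import Counter

    # Hypothetical records: (patient_id, fill_date, drug, fluoxetine_equivalent_mg)
    fills = [
        ("p1", "2006-01-03", "sertraline", 25),
        ("p1", "2006-02-01", "sertraline", 50),
        ("p2", "2006-03-10", "fluoxetine", 20),
        ("p2", "2006-04-12", "bupropion", 30),
        ("p3", "2007-05-01", "fluoxetine", 20),
    ]

    def prescription_pattern(patient_fills):
        """Ordered (drug, dose) steps for one patient, collapsing consecutive repeats."""
        steps = []
        for _, _, drug, dose in sorted(patient_fills, key=lambda r: r[1]):
            if not steps or steps[-1] != (drug, dose):
                steps.append((drug, dose))
        return tuple(steps)

    by_patient = {}
    for rec in fills:
        by_patient.setdefault(rec[0], []).append(rec)

    pattern_counts = Counter(prescription_pattern(v) for v in by_patient.values())
    top = pattern_counts.most_common(10)
    coverage = sum(count for _, count in top) / len(by_patient)
    print(top)
    print(f"top patterns cover {coverage:.1%} of patients")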


Subjects
Depressive Disorder, Major; Veterans; Humans; Depressive Disorder, Major/chemically induced; Depressive Disorder, Major/drug therapy; Antidepressive Agents/therapeutic use
2.
medRxiv ; 2023 May 21.
Article in English | MEDLINE | ID: mdl-37293026

ABSTRACT

Objective: Electronic health record (EHR) systems contain a wealth of clinical data stored as both codified data and free-text narrative notes, covering hundreds of thousands of clinical concepts available for research and clinical care. The complex, massive, heterogeneous, and noisy nature of EHR data imposes significant challenges for feature representation, information extraction, and uncertainty quantification. To address these challenges, we propose an efficient Aggregated naRrative Codified Health (ARCH) records analysis to generate a large-scale knowledge graph (KG) for a comprehensive set of EHR codified and narrative features. Methods: The ARCH algorithm first derives embedding vectors from a co-occurrence matrix of all EHR concepts and then generates cosine similarities, along with associated p-values, to measure the strength of relatedness between clinical features with statistical certainty quantification. In the final step, ARCH performs a sparse embedding regression to remove indirect linkage between entity pairs. We validated the clinical utility of the ARCH knowledge graph, generated from 12.5 million patients in the Veterans Affairs (VA) healthcare system, through downstream tasks including detecting known relationships between entity pairs, predicting drug side effects, disease phenotyping, and sub-typing Alzheimer's disease (AD) patients. Results: ARCH produces high-quality clinical embeddings and a KG for over 60,000 EHR concepts, visualized in an R Shiny-powered web API (https://celehs.hms.harvard.edu/ARCH/). The ARCH embeddings attained an average area under the ROC curve (AUC) of 0.926 and 0.861 for detecting pairs of similar EHR concepts when the concepts are mapped to codified data and to NLP data, respectively, and 0.810 (codified) and 0.843 (NLP) for detecting related pairs. Based on the p-values computed by ARCH, the sensitivities of detecting similar and related entity pairs were 0.906 and 0.888 under false discovery rate (FDR) control of 5%. For detecting drug side effects, the cosine similarity based on the ARCH semantic representations achieved an AUC of 0.723, which improved to 0.826 after few-shot training via minimizing the loss function on the training data set. Incorporating NLP data substantially improved the ability to detect side effects in the EHR. For example, based on unsupervised ARCH embeddings, the power of detecting drug-side effect pairs when using codified data only was 0.15, much lower than the power of 0.51 when using both codified and NLP concepts. Compared to existing large-scale representation learning methods including PubMedBERT, BioBERT and SAPBERT, ARCH attains the most robust performance and substantially higher accuracy in detecting these relationships. Incorporating ARCH-selected features in weakly supervised phenotyping algorithms can improve the robustness of algorithm performance, especially for diseases that benefit from NLP features as supporting evidence. For example, the phenotyping algorithm for depression attained an AUC of 0.927 when using ARCH-selected features but only 0.857 when using codified features selected via the KESER network[1]. In addition, embeddings and knowledge graphs generated from the ARCH network were able to cluster AD patients into two subgroups, where the fast-progression subgroup had a much higher mortality rate.
Conclusions: The proposed ARCH algorithm generates large-scale, high-quality semantic representations and a knowledge graph for both codified and NLP EHR features, useful for a wide range of predictive modeling tasks.
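As a rough illustration of the embedding step described above (not the published ARCH code), the sketch below builds concept embeddings from a toy co-occurrence matrix via a positive PMI transform and truncated SVD, then scores relatedness by cosine similarity; ARCH's p-value computation and sparse embedding regression are not shown, and all names and values here are assumptions.

    import numpy as np

    def ppmi(cooc):
        """Positive pointwise mutual information from a raw co-occurrence matrix."""
        total = cooc.sum()
        row = cooc.sum(axis=1, keepdims=True)
        col = cooc.sum(axis=0, keepdims=True)
        with np.errstate(divide="ignore", invalid="ignore"):
            pmi = np.log((cooc * total) / (row * col))
        pmi[~np.isfinite(pmi)] = 0.0
        return np.maximum(pmi, 0.0)

    def embed(cooc, dim=50):
        """Low-dimensional concept embeddings via truncated SVD of the PPMI matrix."""
        u, s, _ = np.linalg.svd(ppmi(cooc), full_matrices=False)
        return u[:, :dim] * np.sqrt(s[:dim])

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

    # Toy example: 4 concepts with made-up co-occurrence counts.
    cooc = np.array([[0, 40, 5, 2],
                     [40, 0, 6, 1],
                     [5, 6, 0, 30],
                     [2, 1, 30, 0]], dtype=float)
    vecs = embed(cooc, dim=2)
    print(cosine(vecs[0], vecs[1]))  # high: concepts 0 and 1 co-occur often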

3.
Bioinformatics ; 39(2)2023 02 03.
Article in English | MEDLINE | ID: mdl-36805623

ABSTRACT

MOTIVATION: Predicting molecule-disease indications and side effects is important for drug development and pharmacovigilance. Comprehensively mining molecule-molecule, molecule-disease and disease-disease semantic dependencies can potentially improve prediction performance. METHODS: We introduce a Multi-Modal REpresentation Mapping Approach to Predicting molecular-disease relations (M2REMAP) by incorporating clinical semantics learned from the electronic health records (EHR) of 12.6 million patients. Specifically, M2REMAP first learns a multimodal molecule representation that synthesizes chemical property and clinical semantic information by mapping a molecule's chemical representation via a deep neural network onto the clinical semantic embedding space shared by drugs, diseases and other common clinical concepts. To infer molecule-disease relations, M2REMAP combines the multimodal molecule representation and the disease semantic embedding to jointly infer indications and side effects. RESULTS: We extensively evaluate M2REMAP on molecule indications, side effects and interactions. Results show that incorporating EHR embeddings improves performance significantly, for example, improving over the baseline models by 23.6% in PRC-AUC on indications and by 23.9% on side effects. Further, M2REMAP overcomes the limitation of existing methods and effectively predicts drugs for novel diseases and emerging pathogens. AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/celehs/M2REMAP, and prediction results are provided at https://shiny.parse-health.org/drugs-diseases-dev/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
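A hedged sketch of the mapping idea (not the released M2REMAP implementation): a small neural network projects a molecule's chemical fingerprint into a pre-trained clinical embedding space, and indications are then scored by cosine similarity to disease embeddings in that space. The dimensions, network shape, training objective, and random stand-in data below are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    FP_DIM, EMB_DIM = 1024, 128  # assumed fingerprint and clinical-embedding sizes

    mapper = nn.Sequential(
        nn.Linear(FP_DIM, 256), nn.ReLU(),
        nn.Linear(256, EMB_DIM),
    )

    def train_step(fingerprints, drug_clinical_vecs, optimizer):
        """Align mapped chemical vectors with the drugs' known clinical embeddings."""
        optimizer.zero_grad()
        mapped = mapper(fingerprints)
        loss = 1.0 - F.cosine_similarity(mapped, drug_clinical_vecs, dim=1).mean()
        loss.backward()
        optimizer.step()
        return loss.item()

    def score_indications(fingerprint, disease_vecs):
        """Cosine scores between a mapped molecule and each disease embedding."""
        with torch.no_grad():
            mol = mapper(fingerprint.unsqueeze(0))
            return F.cosine_similarity(mol, disease_vecs, dim=1)

    # Toy usage with random tensors standing in for real data.
    opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)
    fps = torch.rand(32, FP_DIM)
    clin = torch.randn(32, EMB_DIM)
    print(train_step(fps, clin, opt))
    print(score_indications(torch.rand(FP_DIM), torch.randn(5, EMB_DIM)))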


Subjects
Drug-Related Side Effects and Adverse Reactions; Humans; Drug Development; Electronic Health Records; Neural Networks, Computer; Pharmacovigilance
4.
J Biomed Inform ; 133: 104147, 2022 09.
Article in English | MEDLINE | ID: mdl-35872266

ABSTRACT

OBJECTIVE: The growing availability of electronic health record (EHR) data opens opportunities for integrative analysis of multi-institutional EHRs to produce generalizable knowledge. A key barrier to such integrative analyses is the lack of semantic interoperability across institutions due to coding differences. We propose a Multiview Incomplete Knowledge Graph Integration (MIKGI) algorithm to integrate information from multiple sources with partially overlapping EHR concept codes to enable translation between healthcare systems. METHODS: The MIKGI algorithm combines knowledge graph information from (i) embeddings trained from the co-occurrence patterns of medical codes within each EHR system and (ii) semantic embeddings of the textual strings of all medical codes obtained from the Self-Aligning Pretrained BERT (SAPBERT) algorithm. Due to the heterogeneity in coding across healthcare systems, each EHR source provides partial coverage of the available codes. MIKGI synthesizes the incomplete knowledge graphs derived from these multi-source embeddings by minimizing a spherical loss function that combines the pairwise directional similarities of embeddings computed from all available sources. MIKGI outputs harmonized semantic embedding vectors for all EHR codes, which improves the quality of the embeddings and enables direct assessment of both similarity and relatedness between any pair of codes from multiple healthcare systems. RESULTS: With EHR co-occurrence data from the Veterans Affairs (VA) healthcare system and Mass General Brigham (MGB), the MIKGI algorithm produces high-quality embeddings for a variety of downstream tasks, including detecting known similar or related entity pairs and mapping VA local codes to the relevant EHR codes used at MGB. Based on the cosine similarity of the MIKGI-trained embeddings, the AUC was 0.918 for detecting similar entity pairs and 0.809 for detecting related pairs. For cross-institutional medical code mapping, the top 1 and top 5 accuracies were 91.0% and 97.5% when mapping medication codes at VA to RxNorm medication codes at MGB, and 59.1% and 75.8% when mapping VA local laboratory codes to the LOINC hierarchy. When trained with 500 labels, the lab code mapping attained top 1 and top 5 accuracies of 77.7% and 87.9%. MIKGI also attained the best performance in selecting VA local lab codes for desired laboratory tests and COVID-19 related features for COVID EHR studies. Compared to existing methods, MIKGI attained the most robust performance, with accuracy the highest or near the highest across all tasks. CONCLUSIONS: The proposed MIKGI algorithm can effectively integrate incomplete summary data from biomedical text and EHR data to generate harmonized embeddings for EHR codes for knowledge graph modeling and cross-institutional translation of EHR codes.
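As a simplified stand-in for the harmonization step (MIKGI minimizes a spherical loss over directional similarities; the sketch below instead uses an orthogonal Procrustes alignment on the codes two systems share, purely to illustrate cross-system harmonization on synthetic data):

    import numpy as np

    def procrustes_align(source, target):
        """Orthogonal map W minimizing ||source @ W - target|| over shared codes."""
        u, _, vt = np.linalg.svd(source.T @ target)
        return u @ vt

    rng = np.random.default_rng(0)
    dim = 16
    shared_a = rng.normal(size=(100, dim))                 # shared codes, system A
    rotation = np.linalg.qr(rng.normal(size=(dim, dim)))[0]
    shared_b = shared_a @ rotation + 0.01 * rng.normal(size=(100, dim))  # system B view

    W = procrustes_align(shared_a, shared_b)
    unique_a = rng.normal(size=(5, dim))                   # codes present only in system A
    mapped = unique_a @ W                                  # now comparable to system B vectors

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine((shared_a @ W)[0], shared_b[0]))          # close to 1 after alignment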


Subjects
COVID-19; Electronic Health Records; Algorithms; Humans; Logical Observation Identifiers Names and Codes; Pattern Recognition, Automated
5.
NPJ Digit Med ; 4(1): 151, 2021 Oct 27.
Article in English | MEDLINE | ID: mdl-34707226

ABSTRACT

The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. However, it is difficult to know all the codes relevant to a phenotype due to the large number of codes available. Traditional data mining approaches often require the use of patient-level data, which hinders the ability to share data across institutions. In this project, we demonstrate that multi-center, large-scale code embeddings can be used to efficiently identify features relevant to a disease of interest. We constructed large-scale code embeddings for a wide range of codified concepts from the EHRs of two large medical centers. We developed knowledge extraction via sparse embedding regression (KESER) for feature selection and integrative network analysis. We evaluated the quality of the code embeddings and assessed the performance of KESER in feature selection for eight diseases. In addition, we developed an integrated clinical knowledge map combining embedding data from both institutions. The features selected by KESER were comprehensive compared to lists of codified data generated by domain experts. Models built upon features identified via KESER achieved performance comparable to those built upon features selected manually or with patient-level data. The knowledge map created using an integrative analysis identified disease-disease and disease-drug pairs more accurately than those identified using single-institution data. Analysis of code embeddings via KESER can effectively reveal clinical knowledge and infer relatedness among codified concepts. KESER bypasses the need for patient-level data in individual analyses, providing a significant advance in enabling multi-center studies using EHR data.
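The sparse-embedding-regression idea can be illustrated with a short sketch (not the published KESER code): regress the target code's embedding on candidate code embeddings with a lasso penalty, and keep the codes with nonzero coefficients as selected features. The data below are synthetic and the penalty value is an arbitrary choice for the example.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(1)
    dim, n_codes = 100, 500
    code_embeddings = rng.normal(size=(n_codes, dim))      # candidate EHR code embeddings
    target = 0.8 * code_embeddings[3] + 0.5 * code_embeddings[7] \
             + 0.05 * rng.normal(size=dim)                  # embedding of the disease of interest

    # Design matrix: embedding dimensions act as "samples", candidate codes as features.
    X = code_embeddings.T                                   # shape (dim, n_codes)
    model = Lasso(alpha=0.05).fit(X, target)

    selected = np.flatnonzero(model.coef_)
    print("selected candidate codes:", selected)            # expected to include 3 and 7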

6.
AMIA Jt Summits Transl Sci Proc ; 2020: 326-334, 2020.
Article in English | MEDLINE | ID: mdl-32477652

ABSTRACT

Electronic health records (EHRs) provide a wealth of data for phenotype development in population health studies, and researchers invest considerable time to curate data elements and validate disease definitions. The ability to reproduce well-defined phenotypes increases data quality and comparability of results, and expedites research. In this paper, we present a standardized approach to organizing and capturing phenotype definitions, resulting in the creation of an open, online repository of phenotypes. This resource captures phenotype development, provenance, and process from the Million Veteran Program, a national mega-biobank embedded in the Veterans Health Administration (VHA). To ensure that the repository is searchable, extendable, and sustainable, it is necessary to develop both a proper digital catalog architecture and an underlying metadata infrastructure that enables effective management of the data fields required to define each phenotype. Our methods provide a resource for VHA investigators and a roadmap for researchers interested in standardizing their phenotype definitions to increase portability.
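Purely as an illustration of the kind of metadata record such a repository might manage, here is a minimal sketch; the field names and values below are assumptions, not the actual Million Veteran Program schema.

    # Illustrative phenotype metadata record; every field name is an assumption.
    phenotype_record = {
        "name": "Type 2 Diabetes Mellitus",
        "version": "1.0",
        "provenance": {
            "program": "Million Veteran Program",
            "curated_by": "study phenotyping team",       # placeholder
            "last_updated": "2020-01-01",                  # placeholder
        },
        "definition": {
            "inclusion_codes": {"ICD-10-CM": ["E11.*"]},   # type 2 diabetes codes
            "exclusion_codes": {"ICD-10-CM": ["E10.*"]},   # type 1 diabetes codes
            "medications": ["<antidiabetic drug concepts>"],  # placeholder list
        },
        "validation": {"method": "chart review", "ppv": None},  # filled when available
    }
    print(phenotype_record["definition"]["inclusion_codes"])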

7.
AMIA Jt Summits Transl Sci Proc ; 2020: 533-541, 2020.
Article in English | MEDLINE | ID: mdl-32477675

ABSTRACT

The Department of Veterans Affairs (VA) archives one of the largest corpora of clinical notes in its corporate data warehouse as unstructured text data. Unstructured text easily supports keyword searches and regular expressions, but these simple searches often do not adequately support the complex queries that need to be performed on notes. For example, a researcher may want all notes with a Duke Treadmill Score of less than five, or all patients who smoke more than one pack per day. Range queries like these, and more, can be supported by modelling text as semi-structured documents. In this paper, we implement a scalable machine learning pipeline that models plain medical text as useful semi-structured documents. We improve on existing models, achieve an F1-score of 0.912, and scale our methods to the entire VA corpus.
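The paper's pipeline is ML-based, but even a simple rule-based sketch shows how modelling notes as semi-structured documents enables the range queries mentioned above; the regexes, field names, and example notes here are illustrative only.

    import re

    notes = [
        "Exercise study today. Duke Treadmill Score: -4. Patient reports smoking 2 packs per day.",
        "Duke Treadmill Score 7, asymptomatic, non-smoker.",
    ]

    def to_document(text):
        """Extract a few numeric fields from free text into a queryable record."""
        doc = {"text": text}
        m = re.search(r"duke treadmill score[:\s]+(-?\d+)", text, re.IGNORECASE)
        if m:
            doc["duke_treadmill_score"] = int(m.group(1))
        m = re.search(r"(\d+(?:\.\d+)?)\s*packs? per day", text, re.IGNORECASE)
        if m:
            doc["packs_per_day"] = float(m.group(1))
        return doc

    docs = [to_document(n) for n in notes]
    hits = [d for d in docs if d.get("duke_treadmill_score", float("inf")) < 5]
    print(hits)  # the first note matches the range query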
