1.
Sci Rep ; 14(1): 8021, 2024 04 05.
Article in English | MEDLINE | ID: mdl-38580710

ABSTRACT

The Phenome-Wide Association Study (PheWAS) is increasingly used to broadly screen for potential treatment effects, e.g., IL6R variant as a proxy for IL6R antagonists. This approach offers an opportunity to address the limited power in clinical trials to study differential treatment effects across patient subgroups. However, limited methods exist to efficiently test for differences across subgroups in the thousands of multiple comparisons generated as part of a PheWAS. In this study, we developed an approach that maximizes the power to test for heterogeneous genotype-phenotype associations and applied this approach to an IL6R PheWAS among individuals of African (AFR) and European (EUR) ancestries. We identified 29 traits with differences in IL6R variant-phenotype associations, including a lower risk of type 2 diabetes in AFR (OR 0.96) vs EUR (OR 1.0, p-value for heterogeneity = 8.5 × 10⁻³), and higher white blood cell count (p-value for heterogeneity = 8.5 × 10⁻¹³¹). These data suggest a more salutary effect of IL6R blockade for T2D among individuals of AFR vs EUR ancestry and provide data to inform ongoing clinical trials targeting IL6 for an expanding number of conditions. Moreover, the method to test for heterogeneity of associations can be applied broadly to other large-scale genotype-phenotype screens in diverse populations.


Subject(s)
Diabetes Mellitus, Type 2 , Humans , Diabetes Mellitus, Type 2/drug therapy , Diabetes Mellitus, Type 2/genetics , Genetic Association Studies , Phenotype , Polymorphism, Single Nucleotide , Receptors, Interleukin-6/genetics
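
Illustrative sketch (not the authors' method): the abstract for entry 1 does not describe how its power-maximizing heterogeneity test works, so the block below shows only the textbook two-group z-test for a difference between two log odds ratios. The odds ratios are the ones reported for type 2 diabetes; the standard errors are hypothetical placeholders, so the printed p-value is not expected to match the published one.

```python
import math

def heterogeneity_z_test(or_a, se_a, or_b, se_b):
    """Two-sided z-test for a difference between two log odds ratios.

    se_a and se_b are standard errors on the log-OR scale.
    """
    diff = math.log(or_a) - math.log(or_b)
    z = diff / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

# ORs from the abstract (AFR 0.96, EUR 1.0); SEs are hypothetical placeholders.
z, p = heterogeneity_z_test(or_a=0.96, se_a=0.010, or_b=1.00, se_b=0.010)
print(f"z = {z:.2f}, p = {p:.3g}")
```

In a PheWAS this test would be repeated for every phenotype, so the resulting p-values would still need a multiple-comparison correction.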
2.
JAMA Netw Open ; 7(2): e240132, 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-38386322

ABSTRACT

Importance: Buprenorphine significantly reduces opioid-related overdose mortality. From 2002 to 2022, the Drug Addiction Treatment Act of 2000 (DATA 2000) required qualified practitioners to receive a waiver from the Drug Enforcement Agency to prescribe buprenorphine for treatment of opioid use disorder. During this period, waiver uptake among practitioners was modest; subsequent changes need to be examined. Objective: To determine whether the Communities That HEAL (CTH) intervention increased the rate of practitioners with DATA 2000 waivers and buprenorphine prescribing. Design, Setting, and Participants: This prespecified secondary analysis of the HEALing Communities Study, a multisite, 2-arm, parallel, community-level, cluster randomized, open, wait-list-controlled comparison clinical trial was designed to assess the effectiveness of the CTH intervention and was conducted between January 1, 2020, to December 31, 2023, in 67 communities in Kentucky, Massachusetts, New York, and Ohio, accounting for approximately 8.2 million adults. The participants in this trial were communities consisting of counties (n = 48) and municipalities (n = 19). Trial arm randomization was conducted using a covariate constrained randomization procedure stratified by state. Each state was balanced by community characteristics including urban/rural classification, fatal opioid overdose rate, and community population. Thirty-four communities were randomized to the intervention and 33 to wait-list control arms. Data analysis was conducted between March 20 and September 29, 2023, with a focus on the comparison period from July 1, 2021, to June 30, 2022. Intervention: Waiver trainings and other educational trainings were offered or supported by the HEALing Communities Study research sites in each state to help build practitioner capacity. 
Main Outcomes and Measures: The rate of practitioners with a DATA 2000 waiver (overall, and stratified by 30-, 100-, and 275-patient limits) per 100 000 adult residents aged 18 years or older from July 1, 2021, to June 30, 2022, was compared between the intervention and wait-list control communities. The rate of buprenorphine prescribing among those waivered practitioners was also compared between the intervention and wait-list control communities. Intention-to-treat and per-protocol analyses were performed. Results: A total of 8 166 963 individuals aged 18 years or older were residents of the 67 communities studied. There was no evidence of an effect of the CTH intervention on the adjusted rate of practitioners with a DATA 2000 waiver (adjusted relative rate [ARR], 1.04; 95% CI, 0.94-1.14) or the adjusted rate of practitioners with a DATA 2000 waiver who actively prescribed buprenorphine (ARR, 0.97; 95% CI, 0.86-1.10). Conclusions and Relevance: In this randomized clinical trial, the CTH intervention was not associated with increases in the rate of practitioners with a DATA 2000 waiver or buprenorphine prescribing among those waivered practitioners. Supporting practitioners to prescribe buprenorphine remains a critical yet challenging step in the continuum of care to treat opioid use disorder. Trial Registration: ClinicalTrials.gov Identifier: NCT04111939.


Subject(s)
Buprenorphine , Opiate Overdose , Opioid-Related Disorders , Adult , Humans , Buprenorphine/therapeutic use , Data Analysis , Educational Status , Intention , Opioid-Related Disorders/drug therapy , Adolescent , Multicenter Studies as Topic , Randomized Controlled Trials as Topic
3.
Patterns (N Y) ; 5(1): 100906, 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38264714

ABSTRACT

Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE models the sequential dependency between the target event and visit embeddings to derive the timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients to perform semi-supervised training. LATTE is evaluated on the onset of type 2 diabetes, heart failure, and relapses of multiple sclerosis. LATTE consistently achieves substantial improvements over benchmark methods while providing high prediction interpretability. The event timings are shown to help discover risk factors of heart failure among patients with rheumatoid arthritis.

4.
Sci Rep ; 14(1): 1793, 2024 01 20.
Article in English | MEDLINE | ID: mdl-38245528

ABSTRACT

We present an ensemble transfer learning method to predict suicide from Veterans Affairs (VA) electronic medical records (EMR). A diverse set of base models was trained to predict a binary outcome constructed from reported suicide, suicide attempt, and overdose diagnoses with varying choices of study design and prediction methodology. Each model used twenty cross-sectional and 190 longitudinal variables observed in eight time intervals covering 7.5 years prior to the time of prediction. Ensembles of seven base models were created and fine-tuned with ten variables expected to change with study design and outcome definition in order to predict suicide and combined outcome in a prospective cohort. The ensemble models achieved c-statistics of 0.73 on 2-year suicide risk and 0.83 on the combined outcome when predicting on a prospective cohort of [Formula: see text] 4.2 M veterans. The ensembles rely on nonlinear base models trained using a matched retrospective nested case-control (Rcc) study cohort and show good calibration across a diversity of subgroups, including risk strata, age, sex, race, and level of healthcare utilization. In addition, a linear Rcc base model provided a rich set of biological predictors, including indicators of suicide, substance use disorder, mental health diagnoses and treatments, hypoxia and vascular damage, and demographics.


Subject(s)
Carcinoma, Renal Cell , Kidney Neoplasms , Veterans , Humans , Veterans/psychology , Retrospective Studies , Cross-Sectional Studies , Prospective Studies , Suicide, Attempted , Machine Learning
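
Illustrative sketch: entry 4 reports model performance as c-statistics (0.73 and 0.83). For a binary outcome the c-statistic is the share of case-control pairs in which the case received the higher predicted risk, with ties counted as one half. The scores and outcomes below are made-up toy values, not data from the study.

```python
def c_statistic(scores, outcomes):
    """Concordance for a binary outcome: P(score of a case > score of a control).

    Tied scores contribute 0.5. Uses a plain O(n_cases * n_controls) loop.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Toy predicted risks and observed outcomes (hypothetical values).
risk = [0.9, 0.4, 0.35, 0.8, 0.1]
event = [1, 0, 1, 0, 0]
print(round(c_statistic(risk, event), 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls.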
5.
medRxiv ; 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-37873131

ABSTRACT

Though electronic health record (EHR) systems are a rich repository of clinical information with large potential, the use of EHR-based phenotyping algorithms is often hindered by inaccurate diagnostic records, the presence of many irrelevant features, and the requirement for a human-labeled training set. In this paper, we describe a knowledge-driven online multimodal automated phenotyping (KOMAP) system that i) generates a list of informative features by an online narrative and codified feature search engine (ONCE) and ii) enables the training of a multimodal phenotyping algorithm based on summary data. Powered by composite knowledge from multiple EHR sources, online article corpora, and a large language model, features selected by ONCE show high concordance with the state-of-the-art AI models (GPT4 and ChatGPT) and encourage large-scale phenotyping by providing a smaller but highly relevant feature set. Validation of the KOMAP system across four healthcare centers suggests that it can generate efficient phenotyping algorithms with robust performance. Compared to other methods requiring patient-level inputs and gold-standard labels, the fully online KOMAP provides a significant opportunity to enable multi-center collaboration.

6.
Article in English | MEDLINE | ID: mdl-37396195

ABSTRACT

[This corrects the article DOI: 10.1017/ash.2023.136.].

7.
medRxiv ; 2023 Jun 29.
Article in English | MEDLINE | ID: mdl-37425708

ABSTRACT

Genome-wide association studies (GWAS) have underrepresented individuals from non-European populations, impeding progress in characterizing the genetic architecture and consequences of health and disease traits. To address this, we present a population-stratified phenome-wide GWAS followed by a multi-population meta-analysis for 2,068 traits derived from electronic health records of 635,969 participants in the Million Veteran Program (MVP), a longitudinal cohort study of diverse U.S. Veterans genetically similar to the respective African (121,177), Admixed American (59,048), East Asian (6,702), and European (449,042) superpopulations defined by the 1000 Genomes Project. We identified 38,270 independent variants associating with one or more traits at experiment-wide P < 4.6 × 10⁻¹¹ significance; fine-mapping 6,318 signals identified from 613 traits to single-variant resolution. Among these, a third (2,069) of the associations were found only among participants genetically similar to non-European reference populations, demonstrating the importance of expanding diversity in genetic studies. Our work provides a comprehensive atlas of phenome-wide genetic associations for future studies dissecting the architecture of complex traits in diverse populations.

8.
medRxiv ; 2023 May 21.
Article in English | MEDLINE | ID: mdl-37293026

ABSTRACT

Objective: Electronic health record (EHR) systems contain a wealth of clinical data stored as both codified data and free-text narrative notes, covering hundreds of thousands of clinical concepts available for research and clinical care. The complex, massive, heterogeneous, and noisy nature of EHR data imposes significant challenges for feature representation, information extraction, and uncertainty quantification. To address these challenges, we proposed an efficient Aggregated naRrative Codified Health (ARCH) records analysis to generate a large-scale knowledge graph (KG) for a comprehensive set of EHR codified and narrative features. Methods: The ARCH algorithm first derives embedding vectors from a co-occurrence matrix of all EHR concepts and then generates cosine similarities along with associated p-values to measure the strength of relatedness between clinical features with statistical certainty quantification. In the final step, ARCH performs a sparse embedding regression to remove indirect linkage between entity pairs. We validated the clinical utility of the ARCH knowledge graph, generated from 12.5 million patients in the Veterans Affairs (VA) healthcare system, through downstream tasks including detecting known relationships between entity pairs, predicting drug side effects, disease phenotyping, as well as sub-typing Alzheimer's disease patients. Results: ARCH produces high-quality clinical embeddings and KG for over 60,000 EHR concepts, as visualized in the R-shiny powered web-API (https://celehs.hms.harvard.edu/ARCH/). The ARCH embeddings attained an average area under the ROC curve (AUC) of 0.926 and 0.861 for detecting pairs of similar EHR concepts when the concepts are mapped to codified data and to NLP data; and 0.810 (codified) and 0.843 (NLP) for detecting related pairs. Based on the p-values computed by ARCH, the sensitivity of detecting similar and related entity pairs are 0.906 and 0.888 under false discovery rate (FDR) control of 5%. 
For detecting drug side effects, the cosine similarity based on the ARCH semantic representations achieved an AUC of 0.723 while the AUC improved to 0.826 after few-shot training via minimizing the loss function on the training data set. Incorporating NLP data substantially improved the ability to detect side effects in the EHR. For example, based on unsupervised ARCH embeddings, the power of detecting drug-side effects pairs when using codified data only was 0.15, much lower than the power of 0.51 when using both codified and NLP concepts. Compared to existing large-scale representation learning methods including PubmedBERT, BioBERT and SAPBERT, ARCH attains the most robust performance and substantially higher accuracy in detecting these relationships. Incorporating ARCH selected features in weakly supervised phenotyping algorithms can improve the robustness of algorithm performance, especially for diseases that benefit from NLP features as supporting evidence. For example, the phenotyping algorithm for depression attained an AUC of 0.927 when using ARCH selected features but only 0.857 when using codified features selected via the KESER network[1]. In addition, embeddings and knowledge graphs generated from the ARCH network were able to cluster AD patients into two subgroups, where the fast progression subgroup had a much higher mortality rate. Conclusions: The proposed ARCH algorithm generates large-scale high-quality semantic representations and knowledge graph for both codified and NLP EHR features, useful for a wide range of predictive modeling tasks.
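
Illustrative sketch: the Methods of entry 8 describe deriving embedding vectors from a concept co-occurrence matrix and then scoring relatedness by cosine similarity. The block below is a simplified stand-in under stated assumptions: it transforms a toy co-occurrence matrix with positive pointwise mutual information (PPMI) and uses the PPMI rows directly as vectors, skipping any factorization, p-value computation, or sparse regression step. The concept names and counts are hypothetical.

```python
import math

def ppmi_rows(cooc):
    """Positive pointwise mutual information for a symmetric count matrix."""
    total = sum(sum(row) for row in cooc)
    marginals = [sum(row) for row in cooc]
    n = len(cooc)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if cooc[i][j] > 0:
                pmi = math.log(cooc[i][j] * total / (marginals[i] * marginals[j]))
                out[i][j] = max(pmi, 0.0)  # clip negative PMI to zero
    return out

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical concepts and co-occurrence counts (assumption, not study data).
concepts = ["type 2 diabetes", "metformin", "insulin", "hip fracture"]
cooc = [
    [0, 40, 30, 1],
    [40, 0, 20, 1],
    [30, 20, 0, 1],
    [1, 1, 1, 0],
]
vectors = ppmi_rows(cooc)
print(cosine(vectors[1], vectors[2]))  # metformin vs insulin
print(cosine(vectors[1], vectors[3]))  # metformin vs hip fracture
```

In this toy matrix the two diabetes drugs share most of their context, so their vectors end up closer to each other than either is to the unrelated concept.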

9.
Article in English | MEDLINE | ID: mdl-37179767

ABSTRACT

Objective: Data are scarce regarding hospital infection control committees and compliance with infection prevention and control (IPC) recommendations in Brazil, a country of continental dimensions. We assessed the main characteristics of infection control committees (ICCs) on healthcare-associated infections (HAIs) in Brazilian hospitals. Methods: This cross-sectional study was conducted in ICCs of public and private hospitals distributed across all Brazilian regions. Data were collected directly from the ICC staff by completing an online questionnaire and during on-site visits through face-to-face interviews. Results: In total, 53 Brazilian hospitals were evaluated from October 2019 to December 2020. All hospitals had implemented the IPC core components in their programs. All centers had protocols for the prevention and control of ventilator-associated pneumonia as well as bloodstream, surgical site, and catheter-associated urinary tract infections. Most hospitals (80%) had no budget specifically allocated to the IPC program; 34% of the laundry staff had received specific IPC training; and only 7.5% of hospitals reported occupational infections in healthcare workers. Conclusions: In this sample, most ICCs complied with the minimum requirements for IPC programs. The main limitation regarding ICCs was the lack of financial support. The findings of this survey support the development of strategic plans to improve IPCs in Brazilian hospitals.

10.
Bioinformatics ; 39(2)2023 02 03.
Article in English | MEDLINE | ID: mdl-36805623

ABSTRACT

MOTIVATION: Predicting molecule-disease indications and side effects is important for drug development and pharmacovigilance. Comprehensively mining molecule-molecule, molecule-disease and disease-disease semantic dependencies can potentially improve prediction performance. METHODS: We introduce a Multi-Modal REpresentation Mapping Approach to Predicting molecular-disease relations (M2REMAP) by incorporating clinical semantics learned from electronic health records (EHR) of 12.6 million patients. Specifically, M2REMAP first learns a multimodal molecule representation that synthesizes chemical property and clinical semantic information by mapping molecule chemicals via a deep neural network onto the clinical semantic embedding space shared by drugs, diseases and other common clinical concepts. To infer molecule-disease relations, M2REMAP combines multimodal molecule representation and disease semantic embedding to jointly infer indications and side effects. RESULTS: We extensively evaluate M2REMAP on molecule indications, side effects and interactions. Results show that incorporating EHR embeddings improves performance significantly, for example, attaining an improvement over the baseline models by 23.6% in PRC-AUC on indications and 23.9% on side effects. Further, M2REMAP overcomes the limitation of existing methods and effectively predicts drugs for novel diseases and emerging pathogens. AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/celehs/M2REMAP, and prediction results are provided at https://shiny.parse-health.org/drugs-diseases-dev/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Humans , Drug Development , Electronic Health Records , Neural Networks, Computer , Pharmacovigilance
11.
Intensive Care Med ; 49(2): 166-177, 2023 02.
Article in English | MEDLINE | ID: mdl-36594987

ABSTRACT

PURPOSE: To assess the association between acute disease severity and 1-year quality of life in patients discharged after hospitalisation due to coronavirus disease 2019 (COVID-19). METHODS: We conducted a prospective cohort study nested in 5 randomised clinical trials between March 2020 and March 2022 at 84 sites in Brazil. Adult post-hospitalisation COVID-19 patients were followed for 1 year. The primary outcome was the utility score of EuroQol five-dimension three-level (EQ-5D-3L). Secondary outcomes included all-cause mortality, major cardiovascular events, and new disabilities in instrumental activities of daily living. Adjusted generalised estimating equations were used to assess the association between outcomes and acute disease severity according to the highest level on a modified ordinal scale during hospital stay (2: no oxygen therapy; 3: oxygen by mask or nasal prongs; 4: high-flow nasal cannula oxygen therapy or non-invasive ventilation; 5: mechanical ventilation). RESULTS: 1508 COVID-19 survivors were enrolled. Primary outcome data were available for 1156 participants. At 1 year, compared with severity score 2, severity score 5 was associated with lower EQ-5D-3L utility scores (0.7 vs 0.84; adjusted difference, -0.1 [95% CI -0.15 to -0.06]); and worse results for all-cause mortality (7.9% vs 1.2%; adjusted difference, 7.1% [95% CI 2.5%-11.8%]), major cardiovascular events (5.6% vs 2.3%; adjusted difference, 2.6% [95% CI 0.6%-4.6%]), and new disabilities (40.4% vs 23.5%; adjusted difference, 15.5% [95% CI 8.5%-22.5%]). Severity scores 3 and 4 did not differ consistently from score 2. CONCLUSIONS: COVID-19 patients who needed mechanical ventilation during hospitalisation have lower 1-year quality of life than COVID-19 patients who did not need mechanical ventilation during hospitalisation.


Subject(s)
COVID-19 , Cardiovascular Diseases , Adult , Humans , SARS-CoV-2 , Quality of Life , Activities of Daily Living , Prospective Studies , Respiration, Artificial , Hospitalization , Patient Acuity
12.
Sci Rep ; 12(1): 14914, 2022 Sep 01.
Article in English | MEDLINE | ID: mdl-36050444

ABSTRACT

Understanding the genetic relationships between human disorders could lead to better treatment and prevention strategies, especially for individuals with multiple comorbidities. A common resource for studying genetic-disease relationships is the GWAS Catalog, a large and well curated repository of SNP-trait associations from various studies and populations. Some of these populations are contained within mega-biobanks such as the Million Veteran Program (MVP), which has enabled the genetic classification of several diseases in a large well-characterized and heterogeneous population. Here we aim to provide a network of the genetic relationships among diseases and to demonstrate the utility of quantifying the extent to which a given resource such as MVP has contributed to the discovery of such relations. We use a network-based approach to evaluate shared variants among thousands of traits in the GWAS Catalog repository. Our results indicate many more novel disease relationships that did not exist in early studies and demonstrate that the network can reveal clusters of diseases mechanistically related. Finally, we show novel disease connections that emerge when MVP data is included, highlighting methodology that can be used to indicate the contributions of a given biobank.


Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Biological Specimen Banks , Comorbidity , Computer Simulation , Genome-Wide Association Study/methods , Humans , Phenotype
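
Illustrative sketch: entry 12 evaluates shared variants among traits with a network-based approach. One simple way to build such a network, assumed here because the abstract does not give the exact construction, is to connect two traits whenever at least one variant is associated with both. The variant and trait identifiers are hypothetical placeholders.

```python
from collections import defaultdict
from itertools import combinations

def shared_variant_network(associations):
    """Map each unordered trait pair to the set of variants associated with both."""
    traits_by_variant = defaultdict(set)
    for variant, trait in associations:
        traits_by_variant[variant].add(trait)
    edges = defaultdict(set)
    for variant, traits in traits_by_variant.items():
        # Sorting gives each pair a single canonical (a, b) ordering.
        for a, b in combinations(sorted(traits), 2):
            edges[(a, b)].add(variant)
    return dict(edges)

# Hypothetical (variant, trait) association records.
associations = [
    ("variant_1", "trait_A"), ("variant_1", "trait_B"),
    ("variant_2", "trait_B"), ("variant_2", "trait_C"),
    ("variant_3", "trait_A"), ("variant_3", "trait_B"),
    ("variant_4", "trait_D"),
]
network = shared_variant_network(associations)
for pair, variants in sorted(network.items()):
    print(pair, sorted(variants))
```

The number of shared variants per edge can serve as an edge weight, and restricting the input records to those reported by a single biobank would show which edges that resource contributes.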
13.
J Biomed Inform ; 133: 104147, 2022 09.
Article in English | MEDLINE | ID: mdl-35872266

ABSTRACT

OBJECTIVE: The growing availability of electronic health records (EHR) data opens opportunities for integrative analysis of multi-institutional EHR to produce generalizable knowledge. A key barrier to such integrative analyses is the lack of semantic interoperability across different institutions due to coding differences. We propose a Multiview Incomplete Knowledge Graph Integration (MIKGI) algorithm to integrate information from multiple sources with partially overlapping EHR concept codes to enable translations between healthcare systems. METHODS: The MIKGI algorithm combines knowledge graph information from (i) embeddings trained from the co-occurrence patterns of medical codes within each EHR system and (ii) semantic embeddings of the textual strings of all medical codes obtained from the Self-Aligning Pretrained BERT (SAPBERT) algorithm. Due to the heterogeneity in the coding across healthcare systems, each EHR source provides partial coverage of the available codes. MIKGI synthesizes the incomplete knowledge graphs derived from these multi-source embeddings by minimizing a spherical loss function that combines the pairwise directional similarities of embeddings computed from all available sources. MIKGI outputs harmonized semantic embedding vectors for all EHR codes, which improves the quality of the embeddings and enables direct assessment of both similarity and relatedness between any pair of codes from multiple healthcare systems. RESULTS: With EHR co-occurrence data from Veteran Affairs (VA) healthcare and Mass General Brigham (MGB), MIKGI algorithm produces high quality embeddings for a variety of downstream tasks including detecting known similar or related entity pairs and mapping VA local codes to the relevant EHR codes used at MGB. Based on the cosine similarity of the MIKGI trained embeddings, the AUC was 0.918 for detecting similar entity pairs and 0.809 for detecting related pairs. 
For cross-institutional medical code mapping, the top 1 and top 5 accuracy were 91.0% and 97.5% when mapping medication codes at VA to RxNorm medication codes at MGB; 59.1% and 75.8% when mapping VA local laboratory codes to LOINC hierarchy. When trained with 500 labels, the lab code mapping attained top 1 and 5 accuracy at 77.7% and 87.9%. MIKGI also attained best performance in selecting VA local lab codes for desired laboratory tests and COVID-19 related features for COVID EHR studies. Compared to existing methods, MIKGI attained the most robust performance with accuracy the highest or near the highest across all tasks. CONCLUSIONS: The proposed MIKGI algorithm can effectively integrate incomplete summary data from biomedical text and EHR data to generate harmonized embeddings for EHR codes for knowledge graph modeling and cross-institutional translation of EHR codes.


Subject(s)
COVID-19 , Electronic Health Records , Algorithms , Humans , Logical Observation Identifiers Names and Codes , Pattern Recognition, Automated
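
Illustrative sketch: entry 13 reports top-1 and top-5 accuracy for mapping local codes to standard codes using cosine similarity of harmonized embeddings. The block below shows how such a top-k accuracy could be computed once embeddings exist; it does not implement MIKGI itself. The two-dimensional vectors, code names, and gold mappings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_k_accuracy(source_vecs, target_vecs, gold, k):
    """Share of source codes whose gold target is among the k most similar targets."""
    hits = 0
    for code, vec in source_vecs.items():
        ranked = sorted(
            target_vecs,
            key=lambda target: cosine(vec, target_vecs[target]),
            reverse=True,
        )
        if gold[code] in ranked[:k]:
            hits += 1
    return hits / len(source_vecs)

# Hypothetical embeddings for local codes and standard codes.
local_codes = {"local_1": [1.0, 0.1], "local_2": [0.2, 1.0], "local_3": [0.7, 0.6]}
standard_codes = {"std_A": [1.0, 0.0], "std_B": [0.0, 1.0], "std_C": [1.0, 1.0]}
gold = {"local_1": "std_A", "local_2": "std_B", "local_3": "std_A"}

print(top_k_accuracy(local_codes, standard_codes, gold, k=1))
print(top_k_accuracy(local_codes, standard_codes, gold, k=2))
```

Top-k accuracy can only stay the same or rise as k grows, which is consistent with the reported top-5 figures exceeding the top-1 figures.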
14.
Sci Rep ; 12(1): 12018, 2022 07 14.
Article in English | MEDLINE | ID: mdl-35835798

ABSTRACT

A better understanding of the sequential and temporal aspects in which diseases occur in patient's lives is essential for developing improved intervention strategies that reduce burden and increase the quality of health services. Here we present a network-based framework to study disease relationships using Electronic Health Records from > 9 million patients in the United States Veterans Health Administration (VHA) system. We create the Temporal Disease Network, which maps the sequential aspects of disease co-occurrence among patients and demonstrate that network properties reflect clinical aspects of the respective diseases. We use the Temporal Disease Network to identify disease groups that reflect patterns of disease co-occurrence and the flow of patients among diagnoses. Finally, we define a strategy for the identification of trajectories that lead from one disease to another. The framework presented here has the potential to offer new insights for disease treatment and prevention in large health care systems.


Subject(s)
Veterans , Delivery of Health Care , Electronic Health Records , Humans , United States/epidemiology , United States Department of Veterans Affairs
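
Illustrative sketch: entry 14 builds a Temporal Disease Network that maps the sequential aspects of disease co-occurrence. The abstract does not give the edge definition, so the block below assumes a simple one: a directed edge from disease A to disease B is weighted by the number of patients whose first diagnosis of A precedes their first diagnosis of B. The patient histories are hypothetical.

```python
from collections import Counter
from itertools import combinations

def temporal_edges(patient_histories):
    """Count patients in whom disease A was first diagnosed strictly before disease B."""
    counts = Counter()
    for history in patient_histories:
        first_seen = {}
        for day, disease in history:
            if disease not in first_seen or day < first_seen[disease]:
                first_seen[disease] = day
        for a, b in combinations(first_seen, 2):
            if first_seen[a] < first_seen[b]:
                counts[(a, b)] += 1
            elif first_seen[b] < first_seen[a]:
                counts[(b, a)] += 1
            # Same-day first diagnoses carry no ordering and add no edge.
    return counts

# Hypothetical histories: lists of (day of diagnosis, disease).
patients = [
    [(0, "hypertension"), (400, "type 2 diabetes"), (900, "chronic kidney disease")],
    [(10, "type 2 diabetes"), (50, "hypertension")],
    [(5, "hypertension"), (5, "type 2 diabetes")],
]
for edge, n in sorted(temporal_edges(patients).items()):
    print(edge, n)
```

Comparing the weight of A to B against B to A indicates which ordering is more common, and chaining high-weight edges gives candidate trajectories from one disease to another.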
15.
Neurology ; 2022 Jun 01.
Article in English | MEDLINE | ID: mdl-35649728

ABSTRACT

BACKGROUND AND OBJECTIVES: Racial and ethnic disparities in stroke outcomes exist; however, differences by stroke type are less understood. We studied the association of race and ethnicity with stroke mortality, by stroke type, in a national sample of hospitalized patients in the Veterans Health Administration. METHODS: A retrospective observational study was performed including non-Hispanic White, non-Hispanic Black, and Hispanic patients with a first hospitalization for stroke between 2002 and 2012. Stroke was determined using International Classification of Diseases-Ninth Revision codes, and date of death was obtained from the National Death Index. For each of acute ischemic stroke (AIS), intracerebral hemorrhage (ICH), and subarachnoid hemorrhage (SAH), we constructed a piecewise multivariable model for all-cause mortality, using follow-up intervals of ≤30 days, 31-90 days, 91 days-1 year, and >1 year. RESULTS: Among 37,790 stroke patients (89% AIS, 9% ICH, 2% SAH), 25,492 (67%) were non-Hispanic White, 9,752 (26%) were non-Hispanic Black, and 2,546 (7%) were Hispanic. The cohort was predominantly male (98%). Compared to White patients, Black patients experienced better 30-day survival after AIS (HR=0.80, 95% CI 0.73-0.88; 1.4% risk difference) and worse 30-day survival after ICH (HR=1.24, 95% CI 1.06-1.44; 3.2% risk difference). Hispanic patients experienced reduced risk for >1-year mortality after AIS (HR=0.87, 95% CI 0.80-0.94), but had greater risk of 30-day mortality after SAH compared to White patients (HR=1.61, 95% CI 1.03-2.52; 10.3% risk difference). DISCUSSION: In our study, absolute risk of 30-day mortality after ICH was 3.2% higher for Black patients and after SAH was 10.3% higher for Hispanic patients, compared to White patients. These findings underscore the importance of investigating stroke outcomes by stroke type, to better understand the factors driving observed racial and ethnic disparities.
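
Illustrative sketch: the piecewise mortality model in entry 15 uses follow-up intervals of ≤30 days, 31-90 days, 91 days-1 year, and >1 year. A common preparatory step for piecewise models, assumed here rather than taken from the paper, is to split each patient's follow-up into one record per interval so that a separate hazard can be estimated in each. One year is taken as 365 days, and the example patients are hypothetical.

```python
import math

# Interval edges in days: (0, 30], (30, 90], (90, 365], (365, inf)
CUTS = [0, 30, 90, 365, math.inf]

def split_follow_up(days, died):
    """Split one patient's follow-up into per-interval records.

    Returns a list of (interval_index, days_at_risk, event_in_interval).
    """
    records = []
    for i in range(len(CUTS) - 1):
        start, end = CUTS[i], CUTS[i + 1]
        if days <= start:
            break  # follow-up ended before this interval began
        at_risk = min(days, end) - start
        event = died and days <= end  # a death is assigned to the interval it falls in
        records.append((i, at_risk, event))
    return records

print(split_follow_up(45, died=True))    # dies on day 45, in the 31-90 day interval
print(split_follow_up(400, died=False))  # censored alive on day 400
```

Each record can then enter an interval-specific model with the interval index as a stratum, which lets hazard ratios differ between, for example, the first 30 days and the period beyond 1 year.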

16.
Int J Med Inform ; 162: 104753, 2022 Apr 01.
Article in English | MEDLINE | ID: mdl-35405530

ABSTRACT

OBJECTIVE: The use of electronic health records (EHR) systems has grown over the past decade, and with it, the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings) and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study we introduce a semi-supervised method for binary acronym disambiguation, the task of classifying a target sense for acronyms in the clinical EHR notes. METHODS: We developed a semi-supervised ensemble machine learning (CASEml) algorithm to automatically identify when an acronym means a target sense by leveraging semantic embeddings, visit-level text and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml against another standard semi-supervised method and a baseline metric selecting the most frequent acronym sense. Along with evaluating the performance of these methods for specific instances of acronyms, we evaluated the impact of acronym disambiguation on NLP-driven phenotyping of rheumatoid arthritis. RESULTS: CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than a standard baseline metric and (on average) higher than a state-of-the-art semi-supervised method. As well, we demonstrated that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis. CONCLUSION: CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and semi-supervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that rely on ambiguous acronyms, such as phenotyping.

17.
PLoS Genet ; 18(4): e1010113, 2022 04.
Article in English | MEDLINE | ID: mdl-35482673

ABSTRACT

The study aims to determine the shared genetic architecture between COVID-19 severity with existing medical conditions using electronic health record (EHR) data. We conducted a Phenome-Wide Association Study (PheWAS) of genetic variants associated with critical illness (n = 35) or hospitalization (n = 42) due to severe COVID-19 using genome-wide association summary data from the Host Genetics Initiative. PheWAS analysis was performed using genotype-phenotype data from the Veterans Affairs Million Veteran Program (MVP). Phenotypes were defined by International Classification of Diseases (ICD) codes mapped to clinically relevant groups using published PheWAS methods. Among 658,582 Veterans, variants associated with severe COVID-19 were tested for association across 1,559 phenotypes. Variants at the ABO locus (rs495828, rs505922) associated with the largest number of phenotypes (nrs495828 = 53 and nrs505922 = 59); strongest association with venous embolism, odds ratio (ORrs495828 1.33 (p = 1.32 x 10-199), and thrombosis ORrs505922 1.33, p = 2.2 x10-265. Among 67 respiratory conditions tested, 11 had significant associations including MUC5B locus (rs35705950) with increased risk of idiopathic fibrosing alveolitis OR 2.83, p = 4.12 × 10-191; CRHR1 (rs61667602) associated with reduced risk of pulmonary fibrosis, OR 0.84, p = 2.26× 10-12. The TYK2 locus (rs11085727) associated with reduced risk for autoimmune conditions, e.g., psoriasis OR 0.88, p = 6.48 x10-23, lupus OR 0.84, p = 3.97 x 10-06. PheWAS stratified by ancestry demonstrated differences in genotype-phenotype associations. LMNA (rs581342) associated with neutropenia OR 1.29 p = 4.1 x 10-13 among Veterans of African and Hispanic ancestry but not European. Overall, we observed a shared genetic architecture between COVID-19 severity and conditions related to underlying risk factors for severe and poor COVID-19 outcomes. 
Differing genotype-phenotype associations across ancestries may inform heterogeneous outcomes observed with COVID-19. Divergent associations between risk for severe COVID-19 and autoimmune inflammatory conditions, both respiratory and non-respiratory, highlight the shared pathways and fine balance of immune host response and autoimmunity, and the caution required when considering treatment targets.


Subject(s)
COVID-19 , Veterans , COVID-19/epidemiology , COVID-19/genetics , Genetic Association Studies , Genome-Wide Association Study/methods , Humans , Polymorphism, Single Nucleotide/genetics
18.
Stroke ; 53(3): 886-894, 2022 03.
Article in English | MEDLINE | ID: mdl-34727740

ABSTRACT

BACKGROUND AND PURPOSE: Low blood pressure (BP) is associated with higher stroke mortality, although the factors underlying this association have not been fully explored. We investigated prestroke BP and long-term mortality after ischemic stroke in a national sample of US veterans. METHODS: Using a retrospective cohort study design of veterans hospitalized between 2002 and 2007 with a first ischemic stroke and with ≥1 outpatient BP measurements 1 to 18 months before admission, we defined 6 categories each of average prestroke systolic BP (SBP) and diastolic BP, and 7 categories of pulse pressure. Patients were followed for up to 12 years for primary outcomes of all-cause and cardiovascular mortality. We used Cox models to relate prestroke BP indices to mortality and stratified analyses by the presence of preexisting comorbidities (smoking, myocardial infarction, heart failure, atrial fibrillation/flutter, cancer, and dementia), race, and ethnicity. RESULTS: Of 29 690 eligible veterans with stroke (mean±SD age 67±12 years, 98% men, 67% White), 2989 (10%) had average prestroke SBP<120 mm Hg. During a follow-up of 4.1±3.3 years, patients with SBP<120 mm Hg experienced 61% all-cause and 27% cardiovascular mortality. In multivariable analyses, patients with the lowest SBP, lowest diastolic BP, and highest pulse pressure had the highest mortality risk: SBP<120 versus 130 to 139 mm Hg (hazard ratio=1.26 [95% CI, 1.19-1.34]); diastolic BP <60 versus 70 to 79 mm Hg (hazard ratio=1.35 [95% CI, 1.23-1.49]); and pulse pressure ≥90 versus 60 to 69 mm Hg (hazard ratio=1.24 [95% CI, 1.15-1.35]). Patients with average SBP<120 mm Hg and at least one comorbidity (smoking, heart disease, cancer, or dementia) had the highest mortality risk (hazard ratio=1.45 [95% CI, 1.37-1.53]). CONCLUSIONS: Compared with normotension, low prestroke BP was associated with mortality after stroke, particularly among patients with at least one comorbidity.


Subject(s)
Hypotension , Ischemic Stroke , Veterans , Aged , Comorbidity , Female , Humans , Hypotension/mortality , Hypotension/physiopathology , Ischemic Stroke/mortality , Ischemic Stroke/physiopathology , Male , Middle Aged , Retrospective Studies , United States
19.
medRxiv ; 2021 Oct 15.
Article in English | MEDLINE | ID: mdl-34642702

ABSTRACT

The study aims to determine the shared genetic architecture between COVID-19 severity and existing medical conditions using electronic health record (EHR) data. We conducted a Phenome-Wide Association Study (PheWAS) of genetic variants associated with critical illness (n=35) or hospitalization (n=42) due to severe COVID-19 using genome-wide association summary data from the Host Genetics Initiative. PheWAS analysis was performed using genotype-phenotype data from the Veterans Affairs Million Veteran Program (MVP). Phenotypes were defined by International Classification of Diseases (ICD) codes mapped to clinically relevant groups using published PheWAS methods. Among 658,582 Veterans, variants associated with severe COVID-19 were tested for association across 1,559 phenotypes. Variants at the ABO locus (rs495828, rs505922) associated with the largest number of phenotypes (n_rs495828=53 and n_rs505922=59); strongest association with venous embolism, odds ratio (OR_rs495828) 1.33 (p=1.32 × 10^-199), and thrombosis OR_rs505922 1.33, p=2.2 × 10^-265. Among 67 respiratory conditions tested, 11 had significant associations including MUC5B locus (rs35705950) with increased risk of idiopathic fibrosing alveolitis OR 2.83, p=4.12 × 10^-191; CRHR1 (rs61667602) associated with reduced risk of pulmonary fibrosis, OR 0.84, p=2.26 × 10^-12. The TYK2 locus (rs11085727) associated with reduced risk for autoimmune conditions, e.g., psoriasis OR 0.88, p=6.48 × 10^-23, lupus OR 0.84, p=3.97 × 10^-06. PheWAS stratified by genetic ancestry demonstrated differences in genotype-phenotype associations across ancestry. LMNA (rs581342) associated with neutropenia OR 1.29, p=4.1 × 10^-13 among Veterans of African ancestry but not European. Overall, we observed a shared genetic architecture between COVID-19 severity and conditions related to underlying risk factors for severe and poor COVID-19 outcomes.
Differing genotype-phenotype associations across ancestries may inform heterogeneous outcomes observed with COVID-19. Divergent associations between risk for severe COVID-19 and autoimmune inflammatory conditions, both respiratory and non-respiratory, highlight the shared pathways and fine balance of immune host response and autoimmunity, and the caution required when considering treatment targets.

20.
NPJ Digit Med ; 4(1): 151, 2021 Oct 27.
Article in English | MEDLINE | ID: mdl-34707226

ABSTRACT

The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. However, it is difficult to know all the relevant codes related to a phenotype due to the large number of codes available. Traditional data mining approaches often require the use of patient-level data, which hinders the ability to share data across institutions. In this project, we demonstrate that multi-center large-scale code embeddings can be used to efficiently identify relevant features related to a disease of interest. We constructed large-scale code embeddings for a wide range of codified concepts from EHRs from two large medical centers. We developed knowledge extraction via sparse embedding regression (KESER) for feature selection and integrative network analysis. We evaluated the quality of the code embeddings and assessed the performance of KESER in feature selection for eight diseases. We also developed an integrated clinical knowledge map combining embedding data from both institutions. The features selected by KESER were comprehensive compared to lists of codified data generated by domain experts. Features identified via KESER resulted in comparable performance to those built upon features selected manually or with patient-level data. The knowledge map created using an integrative analysis identified disease-disease and disease-drug pairs more accurately compared to those identified using single institution data. Analysis of code embeddings via KESER can effectively reveal clinical knowledge and infer relatedness among codified concepts. KESER bypasses the need for patient-level data in individual analyses, providing a significant advance in enabling multi-center studies using EHR data.
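
Illustrative note (not part of the abstract): the idea of inferring relatedness among codified concepts from embeddings can be sketched by ranking codes by similarity to a target code. The sketch below is an assumption-laden toy, not the KESER method. It uses plain cosine similarity and omits the sparse regression step entirely. The code names, the three-dimensional vectors, and the function names are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_related_codes(embeddings, target, top_k=3):
    """Return the top_k codes most similar to `target`, as (code, similarity) pairs."""
    target_vec = embeddings[target]
    scores = [
        (code, cosine(target_vec, vec))
        for code, vec in embeddings.items()
        if code != target
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical toy embeddings; real code embeddings would be learned from
# summary-level co-occurrence data and have far more dimensions.
toy_embeddings = {
    "DX:type2_diabetes": [0.9, 0.1, 0.0],
    "RX:metformin": [0.8, 0.2, 0.1],
    "LAB:hba1c": [0.7, 0.3, 0.0],
    "DX:ankle_fracture": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for code, score in rank_related_codes(toy_embeddings, "DX:type2_diabetes"):
        print(f"{code}\t{score:.3f}")
```

Because the ranking needs only the embedding vectors, a sketch like this runs without patient-level records, which is the property the abstract emphasizes for multi-center use.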
