Results 1 - 20 of 137
1.
Nature ; 600(7890): 675-679, 2021 12.
Article in English | MEDLINE | ID: mdl-34887591

ABSTRACT

Increased blood lipid levels are heritable risk factors of cardiovascular disease with varied prevalence worldwide owing to different dietary patterns and medication use [1]. Despite advances in prevention and treatment, in particular through reducing low-density lipoprotein cholesterol levels [2], heart disease remains the leading cause of death worldwide [3]. Genome-wide association studies (GWAS) of blood lipid levels have led to important biological and clinical insights, as well as new drug targets, for cardiovascular disease. However, most previous GWAS [4-23] have been conducted in European ancestry populations and may have missed genetic variants that contribute to lipid-level variation in other ancestry groups. These include differences in allele frequencies, effect sizes and linkage-disequilibrium patterns [24]. Here we conduct a multi-ancestry, genome-wide genetic discovery meta-analysis of lipid levels in approximately 1.65 million individuals, including 350,000 of non-European ancestries. We quantify the gain in studying non-European ancestries and provide evidence to support the expansion of recruitment of additional ancestries, even with relatively small sample sizes. We find that increasing diversity rather than studying additional individuals of European ancestry results in substantial improvements in fine-mapping functional variants and portability of polygenic prediction (evaluated in approximately 295,000 individuals from 7 ancestry groupings). Modest gains in the number of discovered loci and ancestry-specific variants were also achieved. As GWAS expand emphasis beyond the identification of genes and fundamental biology towards the use of genetic variants for preventive and precision medicine [25], we anticipate that increased diversity of participants will lead to more accurate and equitable [26] application of polygenic scores in clinical practice.


Subject(s)
Cardiovascular Diseases , Genome-Wide Association Study , Cardiovascular Diseases/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Humans , Linkage Disequilibrium , Multifactorial Inheritance , Polymorphism, Single Nucleotide/genetics , Population Groups
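The abstract describes a genome-wide meta-analysis across many cohorts but does not say how per-cohort results were combined. As an illustration only, a fixed-effect inverse-variance weighted meta-analysis, a common default for pooling per-cohort GWAS effect estimates for one variant, can be sketched as follows. The function name `ivw_meta` and all input values are hypothetical, and multi-ancestry studies may use other schemes that allow effects to differ between ancestries.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis (illustrative sketch).

    betas: per-cohort effect estimates for one variant (e.g. effect on a lipid trait)
    ses:   matching standard errors
    Returns (pooled_beta, pooled_se, z_score).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2: precise cohorts count more
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    return pooled, pooled_se, pooled / pooled_se


# Hypothetical estimates for one variant from three ancestry-specific cohorts
beta, se, z = ivw_meta([0.10, 0.30, 0.20], [0.10, 0.10, 0.05])
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f}")
```

Adding cohorts shrinks the pooled standard error, which is one way larger and more diverse samples increase discovery power.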
2.
Am J Hum Genet ; 109(8): 1366-1387, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35931049

ABSTRACT

A major challenge of genome-wide association studies (GWASs) is to translate phenotypic associations into biological insights. Here, we integrate a large GWAS on blood lipids involving 1.6 million individuals from five ancestries with a wide array of functional genomic datasets to discover regulatory mechanisms underlying lipid associations. We first prioritize lipid-associated genes with expression quantitative trait locus (eQTL) colocalizations and then add chromatin interaction data to narrow the search for functional genes. Polygenic enrichment analysis across 697 annotations from a host of tissues and cell types confirms the central role of the liver in lipid levels and highlights the selective enrichment of adipose-specific chromatin marks in high-density lipoprotein cholesterol and triglycerides. Overlapping transcription factor (TF) binding sites with lipid-associated loci identifies TFs relevant in lipid biology. In addition, we present an integrative framework to prioritize causal variants at GWAS loci, producing a comprehensive list of candidate causal genes and variants with multiple layers of functional evidence. We highlight two of the prioritized genes, CREBRF and RRBP1, which show convergent evidence across functional datasets supporting their roles in lipid biology.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Chromatin/genetics , Genomics , Humans , Lipids/genetics , Polymorphism, Single Nucleotide/genetics
3.
Allergy ; 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39092539

ABSTRACT

BACKGROUND: Recently, we have identified a dysregulated protein signature in the esophageal epithelium of eosinophilic esophagitis (EoE) patients, including proteins associated with inflammation and epithelial barrier function; however, the effect of proton pump inhibitor (PPI) treatment on this signature is unknown. Herein, we used a proteomic approach to investigate (1) whether PPI treatment alters the esophageal epithelium protein profile observed in EoE patients and (2) whether the protein signature at baseline predicts PPI response. METHODS: We evaluated the protein signature of esophageal biopsies using a cohort of adult EoE patients (n = 25) and healthy controls (C) (n = 10). In EoE patients, esophageal biopsies were taken before (pre) and after (post) an 8-week PPI treatment to determine the histologic response. Patients were classified by PostPPI eosinophil count: ≥15 eosinophils/hpf as non-responders and <15 eosinophils/hpf as responders. The protein signature was determined, and differentially accumulated proteins were characterized to identify altered biological processes and signaling pathways. RESULTS: Comparative analysis of differentially accumulated proteins between groups revealed signatures shared by the three groups of patients with inflammation (responder-PrePPI, non-responder-PrePPI, and non-responder-PostPPI) and by the groups without inflammation (controls and responder-PostPPI). In responder-PostPPI patients, PPI therapy nearly reversed the EoE-specific esophageal protein signature, which is enriched in pathways associated with inflammation and epithelial barrier function. Furthermore, we identified a set of candidate proteins that differentiate responder-PrePPI from non-responder-PrePPI EoE patients before treatment.
CONCLUSION: These findings provide evidence that PPI therapy reverses the alterations in esophageal inflammatory and epithelial proteins characterizing EoE, thereby providing new insights into the mechanism of PPI clinical response. Interestingly, our results also suggest that PPI response could be predicted at baseline in EoE.

5.
BMC Med Inform Decis Mak ; 22(Suppl 2): 348, 2024 Mar 03.
Article in English | MEDLINE | ID: mdl-38433189

ABSTRACT

BACKGROUND: Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major manifestations of SLE contributing to organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large cohort observational studies and clinical trials, where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data, such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and prior medical history narratives, requires sophisticated text processing to mine information from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data from the Northwestern Medicine Enterprise Data Warehouse (NMEDW). METHODS: We developed five algorithms: a rule-based algorithm using only structured data (baseline algorithm) and four algorithms using different NLP models. The first NLP model applied simple regular expressions for keyword search combined with structured data. The other three NLP models were based on regularized logistic regression and used different sets of features: positive mention of concept unique identifiers (CUIs), number of appearances of CUIs, and a mixture of three components (i.e. a curated list of CUIs, regular expression concepts, structured data), respectively. The baseline algorithm and the best performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC).
RESULTS: Our best performing NLP model incorporated features from structured data, regular expression concepts, and mapped CUIs, and showed an improved F-measure over the baseline lupus nephritis algorithm in both the NMEDW (0.41 vs 0.79) and VUMC (0.52 vs 0.93) datasets. CONCLUSION: Our NLP MetaMap mixed model substantially improved the F-measure compared to the structured-data-only algorithm in both internal and external validation datasets. The NLP algorithms can serve as powerful tools to accurately identify the lupus nephritis phenotype in EHRs for clinical research and better targeted therapies.


Subject(s)
Lupus Erythematosus, Systemic , Lupus Nephritis , Humans , Lupus Nephritis/diagnosis , Electronic Health Records , Natural Language Processing , Phenotype , Rare Diseases
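The F-measure reported above is usually the F1 score, the harmonic mean of precision (positive predictive value) and recall (sensitivity) of the phenotype algorithm against chart review. The abstract does not give the underlying counts, so the sketch below uses hypothetical numbers; the function name `f_measure` is likewise illustrative.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from confusion-matrix counts for a binary phenotyping algorithm.

    tp: patients flagged by the algorithm and confirmed by chart review
    fp: patients flagged by the algorithm but not confirmed
    fn: confirmed patients the algorithm missed
    beta=1.0 gives F1, the harmonic mean of precision and recall.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: precision 0.75, recall 0.50
print(round(f_measure(tp=30, fp=10, fn=30), 3))
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are reasonably high, which is why an algorithm that misses cases documented only in free text can score poorly even when its flagged cases are accurate.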
6.
Circulation ; 145(12): 877-891, 2022 03 22.
Article in English | MEDLINE | ID: mdl-34930020

ABSTRACT

BACKGROUND: Sequencing Mendelian arrhythmia genes in individuals without an indication for arrhythmia genetic testing can identify carriers of pathogenic or likely pathogenic (P/LP) variants. However, the extent to which these variants are associated with clinically meaningful phenotypes before or after return of variant results is unclear. In addition, the majority of discovered variants are currently classified as variants of uncertain significance, limiting clinical actionability. METHODS: The eMERGE-III study (Electronic Medical Records and Genomics Phase III) is a multicenter prospective cohort that included 21 846 participants without previous indication for cardiac genetic testing. Participants were sequenced for 109 Mendelian disease genes, including 10 linked to arrhythmia syndromes. Variant carriers were assessed with electronic health record-derived phenotypes and follow-up clinical examination. Selected variants of uncertain significance (n=50) were characterized in vitro with automated electrophysiology experiments in HEK293 cells. RESULTS: As previously reported, 3.0% of participants had P/LP variants in the 109 genes. Herein, we report 120 participants (0.6%) with P/LP arrhythmia variants. Compared with noncarriers, arrhythmia P/LP carriers had a significantly higher burden of arrhythmia phenotypes in their electronic health records. Fifty-four participants had variant results returned. Nineteen of these 54 participants had inherited arrhythmia syndrome diagnoses (primarily long-QT syndrome), and 12 of these 19 diagnoses were made only after variant results were returned (0.05%). After in vitro functional evaluation of 50 variants of uncertain significance, we reclassified 11 variants: 3 to likely benign and 8 to P/LP. CONCLUSIONS: Genome sequencing in a large population without indication for arrhythmia genetic testing identified phenotype-positive carriers of variants in congenital arrhythmia syndrome disease genes. 
As the genomes of large numbers of people are sequenced, the disease risk from rare variants in arrhythmia genes can be assessed by integrating genomic screening, electronic health record phenotypes, and in vitro functional studies. REGISTRATION: URL: https://www.clinicaltrials.gov; Unique identifier: NCT03394859.


Subject(s)
Arrhythmias, Cardiac , Genetic Testing , Arrhythmias, Cardiac/diagnosis , Arrhythmias, Cardiac/genetics , Genetic Predisposition to Disease , Genetic Testing/methods , Genomics , HEK293 Cells , Humans , Phenotype , Prospective Studies
7.
Am J Hum Genet ; 106(5): 707-716, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32386537

ABSTRACT

Because polygenic risk scores (PRSs) for coronary heart disease (CHD) are derived mainly from European ancestry (EA) cohorts, their validity in African ancestry (AA) and Hispanic ethnicity (HE) individuals is unclear. We investigated associations of "restricted" and genome-wide PRSs with CHD in three major racial and ethnic groups in the U.S. The eMERGE cohort (mean age 48 ± 14 years, 58% female) included 45,645 EA, 7,597 AA, and 2,493 HE individuals. We assessed two restricted PRSs (PRS_Tikkanen and PRS_Tada; 28 and 50 variants, respectively) and two genome-wide PRSs (PRS_metaGRS and PRS_LDPred; 1.7 M and 6.6 M variants, respectively) derived from EA cohorts. Over a median follow-up of 11.1 years, 2,652 incident CHD events occurred. Hazard and odds ratios for the association of PRSs with CHD were similar in EA and HE cohorts but lower in AA cohorts. Genome-wide PRSs were more strongly associated with CHD than restricted PRSs were. PRS_metaGRS, the best performing PRS, was associated with CHD in all three cohorts; hazard ratios (95% CI) per 1 SD increase were 1.53 (1.46-1.60), 1.53 (1.23-1.90), and 1.27 (1.13-1.43) for incident CHD in EA, HE, and AA individuals, respectively. The hazard ratios were comparable in the EA and HE cohorts (p_interaction = 0.77) but were significantly attenuated in AA individuals (p_interaction = 2.9 × 10⁻³). These results highlight the potential clinical utility of PRSs for CHD as well as the need to assemble diverse cohorts to generate ancestry- and ethnicity-specific PRSs.


Subject(s)
Black or African American/genetics , Coronary Disease/genetics , Genetic Predisposition to Disease , Hispanic or Latino/genetics , Multifactorial Inheritance/genetics , White People/genetics , Cohort Studies , Female , Humans , Male , Middle Aged , Odds Ratio
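The restricted and genome-wide PRSs above differ mainly in how many variants they sum over (28-50 versus millions). In its simplest form a PRS is a weighted sum of effect-allele dosages, standardized so that associations can be reported per 1 SD increase. The abstract does not describe how the weights were derived; the weights, genotypes, and function names below are hypothetical.

```python
import statistics


def polygenic_score(dosages, weights):
    """Weighted allele count: sum of effect-allele dosage (0, 1 or 2) times per-variant weight."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Convert raw scores to SD units so effects can be reported per 1 SD increase."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]


# Hypothetical log-odds weights for three variants and genotypes of four individuals
weights = [0.05, 0.12, -0.03]
genotypes = [[0, 1, 2], [1, 1, 0], [2, 2, 1], [0, 0, 0]]
raw = [polygenic_score(g, weights) for g in genotypes]
z = standardize(raw)
print([round(v, 2) for v in z])
```

Under a proportional hazards model the log-hazard is linear in the standardized score, so a hazard ratio of 1.53 per SD corresponds to about 1.53² ≈ 2.34 for someone 2 SD above the mean, relative to someone at the mean.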
8.
Allergy ; 78(10): 2732-2744, 2023 10.
Article in English | MEDLINE | ID: mdl-37287363

ABSTRACT

BACKGROUND: Eosinophilic esophagitis (EoE) is a chronic non-IgE-mediated allergic disease of the esophagus. An unbiased proteomics approach was performed to investigate pathophysiological changes in esophageal epithelium. Additionally, an RNAseq-based transcriptomic analysis in paired samples was also carried out. METHODS: Total proteins were purified from esophageal endoscopic biopsies in a cohort of adult EoE patients (n = 25) and healthy esophagus controls (n = 10). Differentially accumulated (DA) proteins in EoE patients compared to control tissues were characterized to identify altered biological processes and signaling pathways. Results were also compared with a quantitative proteome dataset of the human esophageal mucosa. Next, results were contrasted with those obtained after RNAseq analysis in paired samples. Finally, we matched up protein expression with two EoE-specific mRNA panels (EDP and Eso-EoE panel). RESULTS: A total of 1667 proteins were identified, of which 363 were DA in EoE. RNA sequencing in paired samples identified 1993 differentially expressed (DE) genes. Total RNA and protein levels positively correlated, especially in DE mRNA-protein pairs. Pathway analysis of these proteins in EoE showed alterations in immune and inflammatory responses for the upregulated proteins, and in epithelial differentiation, cornification and keratinization for the downregulated proteins. Interestingly, a set of DA proteins, including eosinophil-related and secreted proteins, were not detected at the mRNA level. Protein expression positively correlated with EDP and Eso-EoE, and corresponded with the most abundant proteins of the human esophageal proteome. CONCLUSIONS: We unraveled for the first time key proteomic features involved in EoE pathogenesis. An integrative analysis of transcriptomic and proteomic datasets provides a deeper insight than transcriptomic analysis alone into complex disease mechanisms.


Subject(s)
Eosinophilic Esophagitis , Adult , Humans , Eosinophilic Esophagitis/pathology , Esophageal Mucosa/metabolism , Proteome , Proteomics , RNA, Messenger/genetics , Epithelium/pathology
9.
J Biomed Inform ; 144: 104442, 2023 08.
Article in English | MEDLINE | ID: mdl-37429512

ABSTRACT

OBJECTIVE: We develop a deep learning framework based on the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model using unstructured clinical notes from electronic health records (EHRs) to predict the risk of disease progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD). METHODS: We identified 3657 patients diagnosed with MCI, together with their progress notes, from the Northwestern Medicine Enterprise Data Warehouse (NMEDW) between 2000 and 2020. Only progress notes written no later than the first MCI diagnosis were used for the prediction. We first preprocessed the notes by de-identification, cleaning, and splitting into sections, and then pre-trained a BERT model for AD (named AD-BERT) on the preprocessed notes, starting from the publicly available Bio+Clinical BERT. Each section of a patient's notes was embedded into a vector representation by AD-BERT, and the section vectors were combined by global max pooling and a fully connected network to compute the probability of MCI-to-AD progression. For validation, we conducted a similar set of experiments on 2563 MCI patients identified at Weill Cornell Medicine (WCM) during the same timeframe. RESULTS: Compared with the 7 baseline models, the AD-BERT model achieved the best performance on both datasets, with an area under the receiver operating characteristic curve (AUC) of 0.849 and an F1 score of 0.440 on the NMEDW dataset, and an AUC of 0.883 and an F1 score of 0.680 on the WCM dataset. CONCLUSION: The use of EHRs for AD-related research is promising, and AD-BERT shows superior performance in predicting MCI-to-AD progression. Our study demonstrates the utility of pre-trained language models and clinical notes in predicting MCI-to-AD progression, which could have important implications for improving early detection and intervention for AD.


Subject(s)
Alzheimer Disease , Cognitive Dysfunction , Humans , Alzheimer Disease/diagnosis , Cognitive Dysfunction/diagnosis , Disease Progression
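The patient-level prediction step described above (one embedding per note section, global max pooling across sections, then a fully connected layer producing a probability) can be sketched in plain Python. This is a simplified illustration, not the authors' code: it assumes a single linear layer with a sigmoid output in place of the unspecified fully connected network, uses tiny 4-dimensional vectors where real BERT embeddings are far larger, and all names, weights, and embedding values are hypothetical.

```python
import math


def max_pool(section_vectors):
    """Global max pooling: element-wise maximum over all section embeddings of one patient."""
    return [max(column) for column in zip(*section_vectors)]


def progression_probability(section_vectors, weights, bias):
    """Pool section embeddings, apply one linear layer, and squash the logit with a sigmoid."""
    pooled = max_pool(section_vectors)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical 4-dimensional embeddings for three note sections of one patient
sections = [
    [0.2, -0.1, 0.5, 0.0],
    [0.7, 0.3, -0.2, 0.1],
    [0.1, 0.4, 0.3, -0.5],
]
print(max_pool(sections))
print(round(progression_probability(sections, [0.5, -0.2, 1.0, 0.3], bias=-0.4), 3))
```

Max pooling makes the patient representation independent of how many sections a patient has and of their order, while letting a strong signal in any single section dominate each dimension.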
10.
Pharmacogenet Genomics ; 32(1): 1-9, 2022 01 01.
Article in English | MEDLINE | ID: mdl-34380996

ABSTRACT

OBJECTIVES: Primary nonresponse (PNR) to anti-tumor necrosis factor-α (TNFα) biologics is a serious concern in patients with inflammatory bowel disease (IBD). We aimed to identify the genetic variants associated with PNR. PATIENTS AND METHODS: Patients were recruited from outpatient GI clinics and PNR was determined using both clinical and endoscopic findings. A case-control genome-wide association study was performed in 589 IBD patients and associations were replicated in an independent cohort of 293 patients. The effect of the associated variant on gene expression and TNFα secretion was assessed by cell-based assays. Pleiotropic effects were investigated by a phenome-wide association study (PheWAS). RESULTS: We identified rs34767465 as associated with PNR to anti-TNFα therapy (odds ratio: 2.07, 95% CI, 1.46-2.94, P = 2.43 × 10⁻⁷, [replication odds ratio: 1.8, 95% CI, 1.04-3.16, P = 0.03]). rs34767465 is a multiple-tissue expression quantitative trait locus for FAM114A2. Using RNA-sequencing and protein quantification from HapMap lymphoblastoid cell lines (LCLs), we found a significant decrease in FAM114A2 mRNA and protein expression in both heterozygous and homozygous genotypes when compared to wild-type LCLs. TNFα secretion was significantly higher in THP-1 cells [differentiated into macrophages] with FAM114A2 knockdown versus controls. Immunoblotting experiments showed that depletion of FAM114A2 impaired autophagy-related pathway genes, suggesting autophagy-mediated TNFα secretion as a potential mechanism. PheWAS showed rs34767465 was associated with comorbid conditions found in IBD patients (derangement of joints [P = 3.7 × 10⁻⁴], pigmentary iris degeneration [P = 5.9 × 10⁻⁴], diverticulum of esophagus [P = 7 × 10⁻⁴]). CONCLUSIONS: We identified a variant, rs34767465, associated with PNR to anti-TNFα biologics, which increases TNFα secretion through a mechanism related to autophagy. rs34767465 may also explain the comorbidities associated with IBD.


Subject(s)
Genome-Wide Association Study , Inflammatory Bowel Diseases , Case-Control Studies , Cohort Studies , Humans , Inflammatory Bowel Diseases/drug therapy , Inflammatory Bowel Diseases/genetics , Tumor Necrosis Factor-alpha/genetics
11.
J Child Psychol Psychiatry ; 63(4): 360-376, 2022 04.
Article in English | MEDLINE | ID: mdl-34979592

ABSTRACT

The National Institute of Mental Health (NIMH) proposed the Research Domain Criteria (RDoC) initiative as an alternate way to organize research of mental illnesses, by looking at dimensions of functioning rather than being tied to categorical diagnoses. This paper briefly discusses the motivation for and organization of RDoC, and then explores the NIMH portfolio and recent work to monitor the utility and progress that RDoC has afforded developmental research. To examine how RDoC has influenced the NIMH developmental research portfolio over the last decade, we employed a natural language processing algorithm to identify the number of developmental science grants classified as incorporating an RDoC approach. Additional portfolio analyses examine temporal trends in funded RDoC-relevant grants, publications and citations, and research training opportunities. Reflecting on how RDoC has influenced the focus of grant applications, we highlight examples from research on Attention-Deficit Hyperactivity Disorder (ADHD), childhood irritability, and Autism Spectrum Disorder (ASD). Lastly, we consider how the dimensional and transdiagnostic approaches emphasized in RDoC have facilitated research on personalized intervention for heterogeneous disorders and preventive/early interventions targeting emergent or subthreshold psychopathology.


Subject(s)
Attention Deficit Disorder with Hyperactivity , Autism Spectrum Disorder , Mental Disorders , Attention Deficit Disorder with Hyperactivity/therapy , Autism Spectrum Disorder/therapy , Child , Humans , Mental Disorders/diagnosis , Mental Disorders/therapy , National Institute of Mental Health (U.S.) , Psychopathology , United States
12.
BMC Med Inform Decis Mak ; 22(1): 23, 2022 01 28.
Article in English | MEDLINE | ID: mdl-35090449

ABSTRACT

INTRODUCTION: Currently, one of the commonly used methods for disseminating electronic health record (EHR)-based phenotype algorithms is providing a narrative description of the algorithm logic, often accompanied by flowcharts. A challenge with this mode of dissemination is the potential for under-specification in the algorithm definition, which leads to ambiguity and vagueness. METHODS: This study examines incidents of under-specification that occurred during the implementation of 34 narrative phenotyping algorithms in the electronic Medical Record and Genomics (eMERGE) network. We reviewed the online communication history between algorithm developers and implementers within the Phenotype Knowledge Base (PheKB) platform, where questions could be raised and answered regarding the intended implementation of a phenotype algorithm. RESULTS: We developed a taxonomy of under-specification categories via an iterative review process between two groups of annotators. Under-specifications that lead to ambiguity and vagueness were consistently found across narrative phenotype algorithms developed by all involved eMERGE sites. DISCUSSION AND CONCLUSION: Our findings highlight that under-specification is an impediment to the accuracy and efficiency of the implementation of current narrative phenotyping algorithms, and we propose approaches for mitigating these issues and improved methods for disseminating EHR phenotyping algorithms.


Subject(s)
Algorithms , Electronic Health Records , Genomics , Humans , Knowledge Bases , Phenotype
13.
Circulation ; 142(17): 1633-1646, 2020 10 27.
Article in English | MEDLINE | ID: mdl-32981348

ABSTRACT

BACKGROUND: Abdominal aortic aneurysm (AAA) is an important cause of cardiovascular mortality; however, its genetic determinants remain incompletely defined. In total, 10 previously identified risk loci explain a small fraction of AAA heritability. METHODS: We performed a genome-wide association study in the Million Veteran Program testing ≈18 million DNA sequence variants with AAA (7642 cases and 172 172 controls) in veterans of European ancestry with independent replication in up to 4972 cases and 99 858 controls. We then used mendelian randomization to examine the causal effects of blood pressure on AAA. We examined the association of AAA risk variants with aneurysms in the lower extremity, cerebral, and iliac arterial beds, and derived a genome-wide polygenic risk score (PRS) to identify a subset of the population at greater risk for disease. RESULTS: Through a genome-wide association study, we identified 14 novel loci, bringing the total number of known significant AAA loci to 24. In our mendelian randomization analysis, we demonstrate that a genetic increase of 10 mm Hg in diastolic blood pressure (odds ratio, 1.43 [95% CI, 1.24-1.66]; P=1.6×10⁻⁶), as opposed to systolic blood pressure (odds ratio, 1.06 [95% CI, 0.97-1.15]; P=0.2), likely has a causal relationship with AAA development. We observed that 19 of 24 AAA risk variants associate with aneurysms in at least 1 other vascular territory. A 29-variant PRS was strongly associated with AAA (odds ratio_PRS, 1.26 [95% CI, 1.18-1.36]; P_PRS=2.7×10⁻¹¹ per SD increase in PRS), independent of family history and smoking risk factors (odds ratio_PRS+family history+smoking, 1.24 [95% CI, 1.14-1.35]; P_PRS=1.27×10⁻⁶). Using this PRS, we identified a subset of the population with AAA prevalence greater than that observed in screening trials informing current guidelines.
CONCLUSIONS: We identify novel AAA genetic associations with therapeutic implications and identify a subset of the population at significantly increased genetic risk of AAA independent of family history. Our data suggest that extending current screening guidelines to include testing to identify those with high polygenic AAA risk, once the cost of genotyping becomes comparable with that of screening ultrasound, would significantly increase the yield of current screening at reasonable cost.


Subject(s)
Aortic Aneurysm, Abdominal/genetics , Humans , Veterans
14.
J Biomed Inform ; 102: 103361, 2020 02.
Article in English | MEDLINE | ID: mdl-31911172

ABSTRACT

Acute Kidney Injury (AKI) is a common clinical syndrome characterized by the rapid loss of kidney excretory function, which aggravates the clinical severity of other diseases in a large number of hospitalized patients. Accurate early prediction of AKI can enable timely interventions and treatments. However, AKI is highly heterogeneous, so identifying AKI sub-phenotypes can improve understanding of the disease pathophysiology and support the development of more targeted clinical interventions. This study used a memory network-based deep learning approach to discover AKI sub-phenotypes using structured and unstructured electronic health record (EHR) data of patients before AKI diagnosis. We leveraged a real-world critical care EHR corpus including 37,486 ICU stays. Our approach identified three distinct sub-phenotypes. Sub-phenotype I has an average age of 63.03±17.25 years and is characterized by mild loss of kidney excretory function (serum creatinine (SCr) 1.55±0.34 mg/dL, estimated glomerular filtration rate (eGFR) 107.65±54.98 mL/min/1.73 m²); these patients are more likely to develop stage I AKI. Sub-phenotype II has an average age of 66.81±10.43 years and is characterized by severe loss of kidney excretory function (SCr 1.96±0.49 mg/dL, eGFR 82.19±55.92 mL/min/1.73 m²); these patients are more likely to develop stage III AKI. Sub-phenotype III has an average age of 65.07±11.32 years and is characterized by moderate loss of kidney excretory function (SCr 1.69±0.32 mg/dL, eGFR 93.97±56.53 mL/min/1.73 m²); these patients are more likely to develop stage II AKI. Both SCr and eGFR differ significantly across the three sub-phenotypes under statistical testing with post hoc analysis, and the conclusion still holds after age adjustment.


Subject(s)
Acute Kidney Injury , Electronic Health Records , Acute Kidney Injury/diagnosis , Aged , Creatinine , Glomerular Filtration Rate , Humans , Middle Aged , Phenotype
15.
Circulation ; 138(17): 1839-1849, 2018 10 23.
Article in English | MEDLINE | ID: mdl-29703846

ABSTRACT

BACKGROUND: Coronary heart disease (CHD) is a leading cause of death globally. Although therapy with statins decreases circulating levels of low-density lipoprotein cholesterol and the incidence of CHD, additional events occur despite statin therapy in some individuals. The genetic determinants of this residual cardiovascular risk remain unknown. METHODS: We performed a 2-stage genome-wide association study of CHD events during statin therapy. We first identified 3099 cases who experienced CHD events (defined as acute myocardial infarction or the need for coronary revascularization) during statin therapy and 7681 controls without CHD events during comparable intensity and duration of statin therapy from 4 sites in the Electronic Medical Records and Genomics Network. We then sought replication of candidate variants in another 160 cases and 1112 controls from a fifth Electronic Medical Records and Genomics site, which joined the network after the initial genome-wide association study. Finally, we performed a phenome-wide association study for other traits linked to the most significant locus. RESULTS: The meta-analysis identified 7 single nucleotide polymorphisms at a genome-wide level of significance within the LPA/PLG locus associated with CHD events on statin treatment. The most significant association was for an intronic single nucleotide polymorphism within LPA/PLG (rs10455872; minor allele frequency, 0.069; odds ratio, 1.58; 95% confidence interval, 1.35-1.86; P=2.6×10⁻¹⁰). In the replication cohort, rs10455872 was also associated with CHD events (odds ratio, 1.71; 95% confidence interval, 1.14-2.57; P=0.009).
The association of this single nucleotide polymorphism with CHD events was independent of statin-induced change in low-density lipoprotein cholesterol (odds ratio, 1.62; 95% confidence interval, 1.17-2.24; P=0.004) and persisted in individuals with low-density lipoprotein cholesterol ≤70 mg/dL (odds ratio, 2.43; 95% confidence interval, 1.18-4.75; P=0.015). A phenome-wide association study supported the effect of this region on coronary heart disease and did not identify noncardiovascular phenotypes. CONCLUSIONS: Genetic variations at the LPA locus are associated with CHD events during statin therapy independently of the extent of low-density lipoprotein cholesterol lowering. This finding provides support for exploring strategies targeting circulating concentrations of lipoprotein(a) to reduce CHD events in patients receiving statins.


Subject(s)
Coronary Disease/genetics , Coronary Disease/prevention & control , Dyslipidemias/drug therapy , Dyslipidemias/genetics , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Lipoprotein(a)/genetics , Polymorphism, Single Nucleotide , Case-Control Studies , Coronary Disease/blood , Coronary Disease/diagnosis , Databases, Genetic , Dyslipidemias/blood , Dyslipidemias/diagnosis , Electronic Health Records , Gene Frequency , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects , Phenotype , Risk Assessment , Risk Factors , Time Factors , Treatment Outcome
16.
Circulation ; 138(22): 2469-2481, 2018 11 27.
Article in English | MEDLINE | ID: mdl-30571344

ABSTRACT

BACKGROUND: Proteomic approaches allow measurement of thousands of proteins in a single specimen, which can accelerate biomarker discovery. However, applying these technologies to massive biobanks is not currently feasible because of the practical barriers and costs of implementing such assays at scale. To overcome these challenges, we used a "virtual proteomic" approach, linking genetically predicted protein levels to clinical diagnoses in >40 000 individuals. METHODS: We used genome-wide association data from the Framingham Heart Study (n=759) to construct genetic predictors for 1129 plasma protein levels. We validated the genetic predictors for 268 proteins and used them to compute predicted protein levels in 41 288 genotyped individuals in the Electronic Medical Records and Genomics (eMERGE) cohort. We tested associations for each predicted protein with 1128 clinical phenotypes. Lead associations were validated with directly measured protein levels and either low-density lipoprotein cholesterol or subclinical atherosclerosis in the MDCS (Malmö Diet and Cancer Study; n=651). RESULTS: In the virtual proteomic analysis in eMERGE, 55 proteins were associated with 89 distinct diagnoses at a false discovery rate q<0.1. Among these, 13 associations involved lipid (n=7) or atherosclerosis (n=6) phenotypes. We tested each association for validation in MDCS using directly measured protein levels. At Bonferroni-adjusted significance thresholds, levels of apolipoprotein E isoforms were associated with hyperlipidemia, and circulating C-type lectin domain family 1 member B and platelet-derived growth factor receptor-β predicted subclinical atherosclerosis. Odds ratios for carotid atherosclerosis were 1.31 (95% CI, 1.08-1.58; P=0.006) per 1-SD increment in C-type lectin domain family 1 member B and 0.79 (0.66-0.94; P=0.008) per 1-SD increment in platelet-derived growth factor receptor-β.
CONCLUSIONS: We demonstrate a biomarker discovery paradigm to identify candidate biomarkers of cardiovascular and other diseases.
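The abstract above uses two multiple-testing criteria: a false discovery rate cutoff (q<0.1) for the discovery scan and Bonferroni-adjusted thresholds for validation. The following is a minimal sketch of how each criterion works, assuming the Benjamini-Hochberg procedure for the q-values (the abstract does not name the FDR method). The p-values are hypothetical and are not taken from the study.

```python
def bh_qvalues(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # of p * m / rank so q-values are monotone in rank and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    # Each test must clear alpha / n_tests to keep the family-wise error rate <= alpha.
    return alpha / n_tests


p = [0.001, 0.008, 0.039, 0.041, 0.2]  # hypothetical p-values for illustration
q = bh_qvalues(p)
fdr_hits = [i for i, qi in enumerate(q) if qi < 0.1]
bonf_hits = [i for i, pi in enumerate(p) if pi < bonferroni_threshold(0.05, len(p))]
print(fdr_hits, bonf_hits)  # [0, 1, 2, 3] [0, 1]
```

The contrast shows why an FDR cutoff suits a broad discovery scan (more tests pass) while a Bonferroni threshold is the stricter choice for confirming a short list of candidates.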


Subject(s)
Biomarkers/blood , Carotid Artery Diseases/diagnosis , Genome-Wide Association Study , Proteome/analysis , Adult , Aged , Aged, 80 and over , Carotid Artery Diseases/genetics , Female , Genotype , Humans , Lectins, C-Type/analysis , Male , Middle Aged , Odds Ratio , Phenotype , Polymorphism, Single Nucleotide , Proteomics , Receptor, Platelet-Derived Growth Factor beta/blood
17.
BMC Med ; 17(1): 135, 2019 07 17.
Article in English | MEDLINE | ID: mdl-31311600

ABSTRACT

BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is a common chronic liver illness with a genetically heterogeneous background that can be accompanied by considerable morbidity and attendant health care costs. The pathogenesis and progression of NAFLD are complex, with many unanswered questions. We conducted genome-wide association studies (GWASs) using both adult and pediatric participants from the Electronic Medical Records and Genomics (eMERGE) Network to identify novel genetic contributors to this condition. METHODS: First, a natural language processing (NLP) algorithm was developed, tested, and deployed at each site to identify 1106 NAFLD cases and 8571 controls, as well as histological data from liver tissue in 235 available participants. These include 1242 pediatric participants (396 cases, 846 controls). The algorithm included billing codes, text queries, laboratory values, and medication records. Next, GWASs were performed on NAFLD cases and controls and case-only analyses using histologic scores and liver function tests adjusting for age, sex, site, ancestry, PC, and body mass index (BMI). RESULTS: Consistent with previous results, a robust association was detected for the PNPLA3 gene cluster in participants with European ancestry. At the PNPLA3-SAMM50 region, three SNPs, rs738409, rs738408, and rs3747207, showed the strongest association (best SNP rs738409 p = 1.70 × 10⁻²⁰). This effect was consistent in both pediatric (p = 9.92 × 10⁻⁶) and adult (p = 9.73 × 10⁻¹⁵) cohorts. This variant was also associated with disease severity and NAFLD Activity Score (NAS) (p = 3.94 × 10⁻⁸, beta = 0.85). PheWAS analysis linked this locus to a spectrum of liver diseases beyond NAFLD, with a novel negative correlation with gout (p = 1.09 × 10⁻⁴). We also identified novel loci for NAFLD disease severity, including one novel locus for NAS score near IL17RA (rs5748926, p = 3.80 × 10⁻⁸), and another near ZFP90-CDH1 for fibrosis (rs698718, p = 2.74 × 10⁻¹¹).
Post-GWAS and gene-based analyses identified more than 300 genes that were used for functional and pathway enrichment analyses. CONCLUSIONS: In summary, this study demonstrates clear confirmation of a previously described NAFLD risk locus and several novel associations. Further collaborative studies including an ethnically diverse population with well-characterized liver histologic features of NAFLD are needed to further validate the novel findings.


Subject(s)
Non-alcoholic Fatty Liver Disease/genetics , Adult , Aged , Body Mass Index , Case-Control Studies , Community Networks/organization & administration , Community Networks/statistics & numerical data , Disease Progression , Electronic Health Records/organization & administration , Electronic Health Records/statistics & numerical data , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Genomics/organization & administration , Genomics/statistics & numerical data , Humans , Lipase/genetics , Male , Membrane Proteins/genetics , Middle Aged , Morbidity , Non-alcoholic Fatty Liver Disease/epidemiology , Phenotype , Polymorphism, Single Nucleotide , Signal Transduction/genetics
18.
Circ Res ; 120(2): 341-353, 2017 Jan 20.
Article in English | MEDLINE | ID: mdl-27899403

ABSTRACT

RATIONALE: Abdominal aortic aneurysm (AAA) is a complex disease with both genetic and environmental risk factors. Together, 6 previously identified risk loci only explain a small proportion of the heritability of AAA. OBJECTIVE: To identify additional AAA risk loci using data from all available genome-wide association studies. METHODS AND RESULTS: Through a meta-analysis of 6 genome-wide association study data sets and a validation study totaling 10 204 cases and 107 766 controls, we identified 4 new AAA risk loci: 1q32.3 (SMYD2), 13q12.11 (LINC00540), 20q13.12 (near PCIF1/MMP9/ZNF335), and 21q22.2 (ERG). In various database searches, we observed no new associations between the lead AAA single nucleotide polymorphisms and coronary artery disease, blood pressure, lipids, or diabetes mellitus. Network analyses identified ERG, IL6R, and LDLR as modifiers of MMP9, with a direct interaction between ERG and MMP9. CONCLUSIONS: The 4 new risk loci for AAA seem to be specific for AAA compared with other cardiovascular diseases and related traits, suggesting that traditional cardiovascular risk factor management may have only limited value in preventing the progression of aneurysmal disease.


Subject(s)
Aortic Aneurysm, Abdominal/diagnosis , Aortic Aneurysm, Abdominal/genetics , Genetic Loci/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Aortic Aneurysm, Abdominal/epidemiology , Genetic Predisposition to Disease/epidemiology , Genetic Variation/genetics , Genome-Wide Association Study/trends , Humans
19.
J Biomed Inform ; 99: 103310, 2019 11.
Article in English | MEDLINE | ID: mdl-31622801

ABSTRACT

BACKGROUND: Standards-based clinical data normalization has become a key component of effective data integration and accurate phenotyping for secondary use of electronic healthcare records (EHR) data. HL7 Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard for exchanging electronic healthcare data and has been used in modeling and integrating both structured and unstructured EHR data for a variety of clinical research applications. The overall objective of this study is to develop and evaluate a FHIR-based EHR phenotyping framework for identification of patients with obesity and its multiple comorbidities from semi-structured discharge summaries leveraging a FHIR-based clinical data normalization pipeline (known as NLP2FHIR). METHODS: We implemented a multi-class and multi-label classification system based on the i2b2 Obesity Challenge task to evaluate the FHIR-based EHR phenotyping framework. Two core parts of the framework are: (a) the conversion of discharge summaries into corresponding FHIR resources (Composition, Condition, MedicationStatement, Procedure and FamilyMemberHistory) using the NLP2FHIR pipeline, and (b) the implementation of four machine learning algorithms (logistic regression, support vector machine, decision tree, and random forest) to train classifiers to predict disease state of obesity and 15 comorbidities using features extracted from standard FHIR resources and terminology expansions. We used the macro- and micro-averaged precision (P), recall (R), and F1 score (F1) measures to evaluate classifier performance. We validated the framework using a second obesity dataset extracted from the MIMIC-III database. RESULTS: Using the NLP2FHIR pipeline, 1237 clinical discharge summaries from the 2008 i2b2 obesity challenge dataset were represented as instances of the FHIR Composition resource, consisting of 5677 records with 16 unique section types.
After the NLP processing and FHIR modeling, a set of 244,438 FHIR clinical resource instances was generated. Of the four machine learning classifiers, the random forest algorithm performed best, with F1-micro 0.9466 / F1-macro 0.7887 for intuitive classification (reflecting medical professionals' judgments) and F1-micro 0.9536 / F1-macro 0.6524 for textual classification (reflecting judgments based on explicitly reported information about diseases). The MIMIC-III obesity dataset was successfully integrated for prediction with minimal configuration of the NLP2FHIR pipeline and machine learning models. CONCLUSIONS: The study demonstrated that the FHIR-based EHR phenotyping approach could effectively identify the state of obesity and multiple comorbidities using semi-structured discharge summaries. Our FHIR-based phenotyping approach is a first concrete step towards improving the data aspect of phenotyping portability across EHR systems and enhancing interpretability of the machine learning-based phenotyping algorithms.
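The gap between the micro- and macro-averaged F1 scores reported above follows from how each average is computed. The following is a minimal sketch of the two averages using the standard definitions; the class names and counts are hypothetical and are not taken from the i2b2 or MIMIC-III data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Per-class confusion counts (hypothetical example values below)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(per_class: dict[str, Counts]) -> tuple[float, float]:
    # Macro: F1 per class, then an unweighted mean, so every class counts equally.
    macro = sum(f1(c.tp, c.fp, c.fn) for c in per_class.values()) / len(per_class)
    # Micro: pool counts across classes first, then one F1, so frequent classes dominate.
    tp = sum(c.tp for c in per_class.values())
    fp = sum(c.fp for c in per_class.values())
    fn = sum(c.fn for c in per_class.values())
    return f1(tp, fp, fn), macro


counts = {"frequent": Counts(tp=90, fp=5, fn=5), "rare": Counts(tp=1, fp=2, fn=3)}
micro, macro = micro_macro_f1(counts)
print(f"micro={micro:.3f} macro={macro:.3f}")  # micro=0.924 macro=0.617
```

A single poorly predicted rare class pulls the macro average down sharply while barely moving the micro average, which is consistent with the pattern of high F1-micro and lower F1-macro in the results.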


Subject(s)
Electronic Health Records/classification , Health Information Interoperability , Obesity/epidemiology , Patient Discharge , Adult , Algorithms , Body Mass Index , Comorbidity , Female , Humans , Machine Learning , Male , Phenotype , Software
20.
J Biomed Inform ; 96: 103253, 2019 08.
Article in English | MEDLINE | ID: mdl-31325501

ABSTRACT

BACKGROUND: Implementing clinical phenotypes across a network is labor intensive and potentially error prone. Use of a common data model may facilitate the process. METHODS: Electronic Medical Records and Genomics (eMERGE) sites implemented the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model across their electronic health record (EHR)-linked DNA biobanks. Two previously implemented eMERGE phenotypes were converted to OMOP and implemented across the network. RESULTS: It was feasible to implement the common data model across sites, with laboratory data producing the greatest challenge due to local encoding. Sites were then able to execute the OMOP phenotype in less than one day, as opposed to weeks of effort to manually implement an eMERGE phenotype in their bespoke research EHR databases. Of the sites that could compare the current OMOP phenotype implementation with the original eMERGE phenotype implementation, specific agreement ranged from 100% to 43%, with disagreements due to the original phenotype, the OMOP phenotype, changes in data, and issues in the databases. Using the OMOP query as a standard comparison revealed differences in the original implementations despite starting from the same definitions, code lists, flowcharts, and pseudocode. CONCLUSION: Using a common data model can dramatically speed phenotype implementation at the cost of having to populate that data model, though this will produce a net benefit as the number of phenotype implementations increases. Inconsistencies among the implementations of the original queries point to a potential benefit of using a common data model so that actual phenotype code and logic can be shared, mitigating human error in reinterpretation of a narrative phenotype definition.
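The abstract reports "specific agreement" between the original eMERGE and the OMOP phenotype implementations without defining it. The following is a minimal sketch assuming it means positive specific agreement, 2a / (2a + b + c), computed over the sets of cases each implementation returns; the function name and patient IDs are hypothetical.

```python
def positive_specific_agreement(original: set[str], omop: set[str]) -> float:
    """Positive specific agreement between two case lists: 2a / (2a + b + c).

    a = cases found by both implementations; b + c = cases found by only one.
    """
    a = len(original & omop)
    b_plus_c = len(original ^ omop)  # symmetric difference: discordant cases
    denom = 2 * a + b_plus_c
    if denom == 0:
        raise ValueError("neither implementation identified any cases")
    return 2 * a / denom


# Hypothetical patient IDs, for illustration only
original_cases = {"p1", "p2", "p3", "p4"}
omop_cases = {"p2", "p3", "p4", "p5"}
print(positive_specific_agreement(original_cases, omop_cases))  # 0.75
```

Because the measure ignores patients that neither implementation flags, it stays informative when cases are rare, and it reaches 100% only when both implementations return exactly the same case list.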


Subject(s)
Attention Deficit Disorder with Hyperactivity/diagnosis , Databases, Factual , Diabetes Mellitus, Type 2/diagnosis , Electronic Health Records , Data Collection , Humans , Medical Informatics , National Human Genome Research Institute (U.S.) , Observational Studies as Topic , Outcome Assessment, Health Care , Phenotype , Research Design , Software , United States