ABSTRACT
Circulating proteins have important functions in inflammation and a broad range of diseases. To identify genetic influences on inflammation-related proteins, we conducted a genome-wide protein quantitative trait locus (pQTL) study of 91 plasma proteins measured using the Olink Target platform in 14,824 participants. We identified 180 pQTLs (59 cis, 121 trans). Integration of pQTL data with eQTL and disease genome-wide association studies provided insight into pathogenesis, implicating lymphotoxin-α in multiple sclerosis. Using Mendelian randomization (MR) to assess causality in disease etiology, we identified both shared and distinct effects of specific proteins across immune-mediated diseases, including directionally discordant effects of CD40 on risk of rheumatoid arthritis versus multiple sclerosis and inflammatory bowel disease. MR implicated CXCL5 in the etiology of ulcerative colitis (UC) and we show elevated gut CXCL5 transcript expression in patients with UC. These results identify targets of existing drugs and provide a powerful resource to facilitate future drug target prioritization.
Subject(s)
Colitis, Ulcerative , Inflammatory Bowel Diseases , Multiple Sclerosis , Humans , Genome-Wide Association Study , Inflammatory Bowel Diseases/genetics , Quantitative Trait Loci , Colitis, Ulcerative/drug therapy , Colitis, Ulcerative/genetics , Inflammation/genetics , Multiple Sclerosis/genetics , Polymorphism, Single Nucleotide
ABSTRACT
Gene misexpression is the aberrant transcription of a gene in a context where it is usually inactive. Despite its known pathological consequences in specific rare diseases, we have a limited understanding of its wider prevalence and mechanisms in humans. To address this, we analyzed gene misexpression in 4,568 whole-blood bulk RNA sequencing samples from INTERVAL study blood donors. We found that while individual misexpression events occur rarely, in aggregate they were found in almost all samples and a third of inactive protein-coding genes. Using 2,821 paired whole-genome and RNA sequencing samples, we identified that misexpression events are enriched in cis for rare structural variants. We established putative mechanisms through which a subset of SVs lead to gene misexpression, including transcriptional readthrough, transcript fusions, and gene inversion. Overall, we develop misexpression as a type of transcriptomic outlier analysis and extend our understanding of the variety of mechanisms by which genetic variants can influence gene expression.
Subject(s)
Gene Expression Regulation , Humans , Sequence Analysis, RNA , Genetic Variation , Genomic Structural Variation/genetics , Transcriptome/genetics , Blood Donors
ABSTRACT
Bioinformatic research relies on large-scale computational infrastructures which have a nonzero carbon footprint but so far, no study has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this work, we estimate the carbon footprint of bioinformatics (in kilograms of CO2 equivalent units, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org, last accessed 2022). We assessed 1) bioinformatic approaches in genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics, and molecular simulations, as well as 2) computation strategies, such as parallelization, CPU (central processing unit) versus GPU (graphics processing unit), cloud versus local computing infrastructure, and geography. In particular, we found that biobank-scale GWAS emitted substantial kgCO2e and simple software upgrades could make it greener, for example, upgrading from BOLT-LMM v1 to v2.3 reduced carbon footprint by 73%. Moreover, switching from the average data center to a more efficient one can reduce carbon footprint by approximately 34%. Memory over-allocation can also be a substantial contributor to an algorithm's greenhouse gas emissions. The use of faster processors or greater parallelization reduces running time but can lead to greater carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimize kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions which empower a move toward greener research.
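The kind of estimate described above can be sketched as energy drawn by cores and memory over the runtime, scaled by data-center overhead, then multiplied by the carbon intensity of the electricity used. This is a minimal illustration of that general form only. The function name, parameter names, and all numeric values below are hypothetical placeholders, not figures from the study or from the Green Algorithms calculator.

```python
def carbon_footprint_kg(runtime_h, n_cores, core_power_w, core_usage,
                        memory_gb, mem_power_w_per_gb, pue,
                        intensity_g_per_kwh):
    """Rough kgCO2e estimate for one job (illustrative model, assumed inputs).

    pue is the data-center power usage effectiveness (overhead multiplier);
    intensity_g_per_kwh is the carbon intensity of the local grid.
    """
    # Power drawn by the processing cores and by the allocated memory (watts).
    power_w = n_cores * core_power_w * core_usage + memory_gb * mem_power_w_per_gb
    # Energy in kWh, including data-center overhead.
    energy_kwh = runtime_h * power_w * pue / 1000.0
    # Convert grams of CO2e to kilograms.
    return energy_kwh * intensity_g_per_kwh / 1000.0


# Hypothetical job: 8 cores for 10 hours with 64 GB of memory.
baseline = carbon_footprint_kg(10, 8, 10.0, 1.0, 64, 0.4, 1.5, 500.0)

# Same job on 16 cores finishing in 6 hours: faster, yet a larger footprint
# under these assumed numbers, because the speedup is less than linear.
parallel = carbon_footprint_kg(6, 16, 10.0, 1.0, 64, 0.4, 1.5, 500.0)

# Same job with memory over-allocated to 256 GB.
over_alloc = carbon_footprint_kg(10, 8, 10.0, 1.0, 256, 0.4, 1.5, 500.0)
```

Under this model, the footprint falls with a lower overhead multiplier or a lower-carbon grid, and rises with allocated memory whether or not the memory is actually used.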
Subject(s)
Carbon Footprint , Computational Biology , Algorithms , Genome-Wide Association Study , Software
ABSTRACT
BACKGROUND: Uromodulin, the most abundant protein excreted in normal urine, plays major roles in kidney physiology and disease. The mechanisms regulating the urinary excretion of uromodulin remain essentially unknown. METHODS: We conducted a meta-analysis of genome-wide association studies for raw (uUMOD) and indexed to creatinine (uUCR) urinary levels of uromodulin in 29,315 individuals of European ancestry from 13 cohorts. We tested the distribution of candidate genes in kidney segments and investigated the effects of keratin-40 (KRT40) on uromodulin processing. RESULTS: Two genome-wide significant signals were identified for uUMOD: a novel locus (P = 1.24E-08) over the KRT40 gene coding for KRT40, a type 1 keratin expressed in the kidney, and the UMOD-PDILT locus (P = 2.17E-88), with two independent sets of single nucleotide polymorphisms spread over UMOD and PDILT. Two genome-wide significant signals for uUCR were identified at the UMOD-PDILT locus and at the novel WDR72 locus previously associated with kidney function. The effect sizes for rs8067385, the index single nucleotide polymorphism in the KRT40 locus, were similar for both uUMOD and uUCR. KRT40 colocalized with uromodulin and modulating its expression in thick ascending limb (TAL) cells affected uromodulin processing and excretion. CONCLUSIONS: Common variants in KRT40, WDR72, UMOD, and PDILT associate with the levels of uromodulin in urine. The expression of KRT40 affects uromodulin processing in TAL cells. These results, although limited by lack of replication, provide insights into the biology of uromodulin, the role of keratins in the kidney, and the influence of the UMOD-PDILT locus on kidney function.
Subject(s)
Genome-Wide Association Study , Kidney , Creatinine , Humans , Polymorphism, Single Nucleotide , Protein Disulfide-Isomerases/genetics , Uromodulin/genetics
ABSTRACT
Educational attainment is widely used as a surrogate for socioeconomic status (SES). Low SES is a risk factor for hypertension and high blood pressure (BP). To identify novel BP loci, we performed multi-ancestry meta-analyses accounting for gene-educational attainment interactions using two variables, "Some College" (yes/no) and "Graduated College" (yes/no). Interactions were evaluated using both a 1 degree of freedom (DF) interaction term and a 2DF joint test of genetic and interaction effects. Analyses were performed for systolic BP, diastolic BP, mean arterial pressure, and pulse pressure. We pursued genome-wide interrogation in Stage 1 studies (N = 117 438) and follow-up on promising variants in Stage 2 studies (N = 293 787) in five ancestry groups. Through combined meta-analyses of Stages 1 and 2, we identified 84 known and 18 novel BP loci at genome-wide significance level (P < 5 × 10⁻⁸). Two novel loci were identified based on the 1DF test of interaction with educational attainment, while the remaining 16 loci were identified through the 2DF joint test of genetic and interaction effects. Ten novel loci were identified in individuals of African ancestry. Several novel loci show strong biological plausibility since they involve physiologic systems implicated in BP regulation. They include genes involved in the central nervous system-adrenal signaling axis (ZDHHC17, CADPS, PIK3C2G), vascular structure and function (GNB3, CDON), and renal function (HAS2 and HAS2-AS1, SLIT3). Collectively, these findings suggest a role of educational attainment or SES in further dissection of the genetic architecture of BP.
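The 1DF and 2DF tests mentioned above are commonly computed as Wald tests on the estimated genetic main effect and interaction effect. The following is a minimal sketch of that standard construction, assuming the two estimates and their 2×2 covariance matrix are already available from a regression fit. The function and argument names are illustrative, and this is not necessarily the software or exact procedure used in the study.

```python
import math


def interaction_1df_test(beta_gxe, var_gxe):
    """Two-sided Wald test of the interaction effect alone (1 DF)."""
    z = beta_gxe / math.sqrt(var_gxe)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p


def joint_2df_test(beta_g, beta_gxe, var_g, var_gxe, cov):
    """Wald test that main and interaction effects are both zero (2 DF).

    Computes b' V^-1 b for b = (beta_g, beta_gxe) and the 2x2 covariance V.
    """
    det = var_g * var_gxe - cov ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    chi2 = (beta_g ** 2 * var_gxe
            - 2.0 * beta_g * beta_gxe * cov
            + beta_gxe ** 2 * var_g) / det
    # For a chi-square variable with 2 DF the upper tail is exp(-x / 2).
    p = math.exp(-chi2 / 2.0)
    return chi2, p


# Hypothetical estimates for a single variant.
z_int, p_int = interaction_1df_test(beta_gxe=4.0, var_gxe=1.0)
chi2_joint, p_joint = joint_2df_test(3.0, 4.0, 1.0, 1.0, 0.0)
```

The joint test can reach significance when neither effect alone does, which is one reason a locus may be detected by the 2DF test but not by the 1DF test.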
Subject(s)
Genome-Wide Association Study , Hypertension , Blood Pressure/genetics , Epistasis, Genetic , Genetic Loci , Humans , Hypertension/genetics , Polymorphism, Single Nucleotide
ABSTRACT
Educational attainment is strongly influenced by social and other environmental factors, but genetic factors are estimated to account for at least 20% of the variation across individuals. Here we report the results of a genome-wide association study (GWAS) for educational attainment that extends our earlier discovery sample of 101,069 individuals to 293,723 individuals, and a replication study in an independent sample of 111,349 individuals from the UK Biobank. We identify 74 genome-wide significant loci associated with the number of years of schooling completed. Single-nucleotide polymorphisms associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioural phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because educational attainment is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric diseases.
Subject(s)
Brain/metabolism , Educational Status , Fetus/metabolism , Gene Expression Regulation/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics , Alzheimer Disease/genetics , Bipolar Disorder/genetics , Cognition , Computational Biology , Gene-Environment Interaction , Humans , Molecular Sequence Annotation , Schizophrenia/genetics , United Kingdom
ABSTRACT
Elevated blood pressure (BP), a leading cause of global morbidity and mortality, is influenced by both genetic and lifestyle factors. Cigarette smoking is one such lifestyle factor. Across five ancestries, we performed a genome-wide gene-smoking interaction study of mean arterial pressure (MAP) and pulse pressure (PP) in 129 913 individuals in stage 1 and follow-up analysis in 480 178 additional individuals in stage 2. We report here 136 loci significantly associated with MAP and/or PP. Of these, 61 were previously published through main-effect analysis of BP traits, 37 were recently reported by us for systolic BP and/or diastolic BP through gene-smoking interaction analysis and 38 were newly identified (P < 5 × 10⁻⁸, false discovery rate < 0.05). We also identified nine new signals near known loci. Of the 136 loci, 8 showed significant interaction with smoking status. They include CSMD1, previously reported for insulin resistance and BP in spontaneously hypertensive rats. Many of the 38 new loci show biologic plausibility for a role in BP regulation. SLC26A7 encodes a chloride/bicarbonate exchanger expressed in the renal outer medullary collecting duct. AVPR1A is widely expressed, including in vascular smooth muscle cells, kidney, myocardium and brain. FHAD1 is a long non-coding RNA overexpressed in heart failure. TMEM51 was associated with contractile function in cardiomyocytes. CASP9 plays a central role in cardiomyocyte apoptosis. Thirty novel loci were identified only in African ancestry. Our findings highlight the value of multi-ancestry investigations, particularly in studies of interaction with lifestyle factors, where genomic and lifestyle differences may contribute to novel findings.
Subject(s)
Arterial Pressure/genetics , Gene-Environment Interaction , Hypertension/genetics , Polymorphism, Genetic , Racial Groups/genetics , Smoking/adverse effects , Adolescent , Adult , Aged , Aged, 80 and over , Antiporters/genetics , Blood Pressure/genetics , Caspase 9/genetics , Ethnicity/genetics , Female , Genome-Wide Association Study , Humans , Hypertension/etiology , Male , Membrane Proteins/genetics , Middle Aged , Receptors, Vasopressin/genetics , Sulfate Transporters/genetics , Tumor Suppressor Proteins/genetics , Young Adult
ABSTRACT
Genome-wide association analysis advanced understanding of blood pressure (BP), a major risk factor for vascular conditions such as coronary heart disease and stroke. Accounting for smoking behavior may help identify BP loci and extend our knowledge of its genetic architecture. We performed genome-wide association meta-analyses of systolic and diastolic BP incorporating gene-smoking interactions in 610,091 individuals. Stage 1 analysis examined ~18.8 million SNPs and small insertion/deletion variants in 129,913 individuals from four ancestries (European, African, Asian, and Hispanic) with follow-up analysis of promising variants in 480,178 additional individuals from five ancestries. We identified 15 loci that were genome-wide significant (p < 5 × 10⁻⁸) in stage 1 and formally replicated in stage 2. A combined stage 1 and 2 meta-analysis identified 66 additional genome-wide significant loci (13, 35, and 18 loci in European, African, and trans-ancestry, respectively). A total of 56 known BP loci were also identified by our results (p < 5 × 10⁻⁸). Of the newly identified loci, ten showed significant interaction with smoking status, but none of them were replicated in stage 2. Several loci were identified in African ancestry, highlighting the importance of genetic studies in diverse populations. The identified loci show strong evidence for regulatory features and support shared pathophysiology with cardiometabolic and addiction traits. They also highlight a role in BP regulation for biological candidates such as modulators of vascular structure and function (CDKN1B, BCAR1-CFDP1, PXDN, EEA1), ciliopathies (SDCCAG8, RPGRIP1L), telomere maintenance (TNKS, PINX1, AKTIP), and central dopaminergic signaling (MSRA, EBF2).
Subject(s)
Blood Pressure/genetics , Genetic Loci , Genome-Wide Association Study , Racial Groups/genetics , Smoking/genetics , Cohort Studies , Diastole/genetics , Epistasis, Genetic , Female , Humans , Male , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Reproducibility of Results , Systole/genetics
ABSTRACT
Factor VII (FVII) is an important component of the coagulation cascade. Few genetic loci regulating FVII activity and/or levels have been discovered to date. We conducted a meta-analysis of 9 genome-wide association studies of plasma FVII levels (7 FVII activity and 2 FVII antigen) among 27 495 participants of European and African ancestry. Each study performed ancestry-specific association analyses. Inverse variance weighted meta-analysis was performed within each ancestry group and then combined for a trans-ancestry meta-analysis. Our primary analysis included the 7 studies that measured FVII activity, and a secondary analysis included all 9 studies. We provided functional genomic validation for newly identified significant loci by silencing candidate genes in a human liver cell line (HuH7) using small-interfering RNA and then measuring F7 messenger RNA and FVII protein expression. Lastly, we used meta-analysis results to perform Mendelian randomization analysis to estimate the causal effect of FVII activity on coronary artery disease, ischemic stroke (IS), and venous thromboembolism. We identified 2 novel (REEP3 and JAZF1-AS1) and 6 known loci associated with FVII activity, explaining 19.0% of the phenotypic variance. Adding FVII antigen data to the meta-analysis did not result in the discovery of further loci. Silencing REEP3 in HuH7 cells upregulated FVII, whereas silencing JAZF1 downregulated FVII. Mendelian randomization analyses suggest that FVII activity has a positive causal effect on the risk of IS. Variants at REEP3 and JAZF1 contribute to FVII activity by regulating F7 expression levels. FVII activity appears to contribute to the etiology of IS in the general population.
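Inverse variance weighted meta-analysis, as named above, combines per-study effect estimates by weighting each with the reciprocal of its squared standard error. The sketch below shows the standard fixed-effect form for a single variant. The function name and the example numbers are hypothetical and are not taken from the study.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse variance weighted meta-analysis for one variant.

    betas: per-study effect estimates; ses: their standard errors.
    Returns the pooled estimate and its standard error.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se


# Hypothetical estimates from two studies within one ancestry group.
beta_a, se_a = ivw_meta([0.10, 0.30], [0.1, 0.1])

# Ancestry-specific pooled results can then be combined the same way
# to give a trans-ancestry estimate (second group's values also hypothetical).
beta_all, se_all = ivw_meta([beta_a, 0.25], [se_a, 0.05])
```

More precise studies receive larger weights, so the pooled standard error is never larger than the smallest contributing standard error.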
Subject(s)
Brain Ischemia/etiology , Factor VII/genetics , Genome-Wide Association Study , Membrane Transport Proteins/genetics , Neoplasm Proteins/genetics , Polymorphism, Single Nucleotide , Stroke/etiology , Brain Ischemia/metabolism , Brain Ischemia/pathology , Co-Repressor Proteins , Cohort Studies , Coronary Artery Disease/etiology , Coronary Artery Disease/metabolism , Coronary Artery Disease/pathology , DNA-Binding Proteins , Factor VII/metabolism , Female , Follow-Up Studies , Genetic Loci , Genetic Predisposition to Disease , Humans , Male , Membrane Transport Proteins/metabolism , Mendelian Randomization Analysis , Middle Aged , Neoplasm Proteins/metabolism , Phenotype , Prognosis , Stroke/metabolism , Stroke/pathology , Venous Thromboembolism/etiology , Venous Thromboembolism/metabolism , Venous Thromboembolism/pathology
ABSTRACT
BACKGROUND: Factor VIII (FVIII) and its carrier protein von Willebrand factor (VWF) are associated with risk of arterial and venous thrombosis and with hemorrhagic disorders. We aimed to identify and functionally test novel genetic associations regulating plasma FVIII and VWF. METHODS: We meta-analyzed genome-wide association results from 46 354 individuals of European, African, East Asian, and Hispanic ancestry. All studies performed linear regression analysis using an additive genetic model and associated ≈35 million imputed variants with natural log-transformed phenotype levels. In vitro gene silencing in cultured endothelial cells was performed for candidate genes to provide additional evidence on association and function. Two-sample Mendelian randomization analyses were applied to test the causal role of FVIII and VWF plasma levels on the risk of arterial and venous thrombotic events. RESULTS: We identified 13 novel genome-wide significant (P ≤ 2.5 × 10⁻⁸) associations, 7 with FVIII levels (FCHO2/TMEM171/TNPO1, HLA, SOX17/RP1, LINC00583/NFIB, RAB5C-KAT2A, RPL3/TAB1/SYNGR1, and ARSA) and 11 with VWF levels (PDHB/PXK/KCTD6, SLC39A8, FCHO2/TMEM171/TNPO1, HLA, GIMAP7/GIMAP4, OR13C5/NIPSNAP, DAB2IP, C2CD4B, RAB5C-KAT2A, TAB1/SYNGR1, and ARSA), beyond 10 previously reported associations with these phenotypes. Functional validation provided further evidence of association for all loci on VWF except ARSA and DAB2IP. Mendelian randomization suggested causal effects of plasma FVIII activity levels on venous thrombosis and coronary artery disease risk and plasma VWF levels on ischemic stroke risk. CONCLUSIONS: The meta-analysis identified 13 novel genetic loci regulating FVIII and VWF plasma levels, 10 of which we validated functionally. We provide some evidence for a causal role of these proteins in thrombotic events.
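In two-sample Mendelian randomization, the simplest estimator for a single genetic instrument is the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure, with the two effects taken from separate samples. The sketch below shows that textbook estimator with a first-order standard error that ignores uncertainty in the exposure estimate. It is an illustration with hypothetical inputs and is not claimed to be the estimator used in the study.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument Wald ratio for two-sample Mendelian randomization.

    beta_exposure: variant effect on the exposure (e.g. a plasma protein level).
    beta_outcome, se_outcome: variant effect on the outcome and its standard error.
    Returns the causal estimate and a first-order standard error.
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known without error.
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Hypothetical summary statistics for one variant.
mr_estimate, mr_se = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
```

With several independent instruments, the per-variant ratios are typically pooled, for example by inverse variance weighting.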
Subject(s)
Arterial Occlusive Diseases/genetics , Blood Coagulation Disorders, Inherited/genetics , Blood Coagulation/genetics , Factor VIII/analysis , Genetic Loci , Venous Thrombosis/genetics , von Willebrand Factor/analysis , Arterial Occlusive Diseases/blood , Arterial Occlusive Diseases/ethnology , Biomarkers/blood , Blood Coagulation Disorders, Inherited/blood , Blood Coagulation Disorders, Inherited/ethnology , Genetic Markers , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Mendelian Randomization Analysis , Phenotype , Ribosomal Protein L3 , Risk Factors , Venous Thrombosis/blood , Venous Thrombosis/ethnology
ABSTRACT
Resting heart rate is a heritable trait, and an increase in heart rate is associated with increased mortality risk. Genome-wide association study analyses have found loci associated with resting heart rate; at the time of our study, these loci explained 0.9% of the variation. This study aims to discover new genetic loci associated with heart rate from Exome Chip meta-analyses. Heart rate was measured from either electrocardiograms or pulse recordings. We meta-analysed heart rate association results from 104 452 European-ancestry individuals from 30 cohorts, genotyped using the Exome Chip. Twenty-four variants were selected for follow-up in an independent dataset (UK Biobank, N = 134 251). Conditional and gene-based testing was undertaken, and variants were investigated with bioinformatics methods. We discovered five novel heart rate loci, and one new independent low-frequency non-synonymous variant in an established heart rate locus (KIAA1755). Lead variants in four of the novel loci are non-synonymous variants in the genes C10orf71, DALDR3, TESK2 and SEC31B. The variant at SEC31B is significantly associated with SEC31B expression in heart and tibial nerve tissue. Further candidate genes were detected from long-range regulatory chromatin interactions in heart tissue (SCD, SLF2 and MAPK8). We observed significant enrichment in DNase I hypersensitive sites in fetal heart and lung. Moreover, enrichment was seen for the first time in human neuronal progenitor cells (derived from embryonic stem cells) and fetal muscle samples by including our novel variants. Our findings advance the knowledge of the genetic architecture of heart rate, and indicate new candidate genes for follow-up functional studies.
Subject(s)
Heart Rate/genetics , Adult , Alleles , Exome , Female , Gene Frequency/genetics , Genetic Loci , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Genotype , Heart Rate/physiology , Humans , Male , Middle Aged , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide/genetics , Risk Factors , White People/genetics
ABSTRACT
For complex traits, most associated single nucleotide variants (SNVs) discovered to date have a small effect, and detection of association is only possible with large sample sizes. Because of patient confidentiality concerns, it is often not possible to pool genetic data from multiple cohorts, and meta-analysis has emerged as the method of choice to combine results from multiple studies. Many meta-analysis methods are available for single SNV analyses. As new approaches allow the capture of low frequency and rare genetic variation, it is of interest to jointly consider multiple variants to improve power. However, for the analysis of haplotypes formed by multiple SNVs, meta-analysis remains a challenge, because different haplotypes may be observed across studies. We propose a two-stage meta-analysis approach to combine haplotype analysis results. In the first stage, each cohort estimates haplotype effect sizes in a regression framework, accounting for relatedness among observations if appropriate. For the second stage, we use a multivariate generalized least square meta-analysis approach to combine haplotype effect estimates from multiple cohorts. Haplotype-specific association tests and a global test of independence between haplotypes and traits are obtained within our framework. We demonstrate through simulation studies that we control the type-I error rate, and our approach is more powerful than inverse variance weighted meta-analysis of single SNV analysis when haplotype effects are present. We replicate a published haplotype association between fasting glucose-associated locus (G6PC2) and fasting glucose in seven studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium and we provide more precise haplotype effect estimates.
Subject(s)
Genetic Association Studies , Haplotypes/genetics , Meta-Analysis as Topic , Aging , Co-Repressor Proteins , Cohort Studies , DNA-Binding Proteins , Fasting/metabolism , Female , Genetic Variation/genetics , Glucose/metabolism , Glucose-6-Phosphatase/genetics , Heart , Humans , Least-Squares Analysis , Male , Models, Genetic , Molecular Epidemiology , Multivariate Analysis , Neoplasm Proteins/genetics , Phenotype , Reproducibility of Results , Research Design
ABSTRACT
Studying gene-environment (G × E) interactions is important, as they extend our knowledge of the genetic architecture of complex traits and may help to identify novel variants not detected via analysis of main effects alone. The main statistical framework for studying G × E interactions uses a single regression model that includes both the genetic main and G × E interaction effects (the "joint" framework). The alternative "stratified" framework combines results from genetic main-effect analyses carried out separately within the exposed and unexposed groups. Although there have been several investigations using theory and simulation, an empirical comparison of the two frameworks is lacking. Here, we compare the two frameworks using results from genome-wide association studies of systolic blood pressure for 3.2 million low frequency and 6.5 million common variants across 20 cohorts of European ancestry, comprising 79,731 individuals. Our cohorts have sample sizes ranging from 456 to 22,983 and include both family-based and population-based samples. In cohort-specific analyses, the two frameworks provided similar inference for population-based cohorts. The agreement was reduced for family-based cohorts. In meta-analyses, agreement between the two frameworks was less than that observed in cohort-specific analyses, despite the increased sample size. In meta-analyses, agreement depended on (1) the minor allele frequency, (2) inclusion of family-based cohorts in meta-analysis, and (3) filtering scheme. The stratified framework appears to approximate the joint framework well only for common variants in population-based cohorts. We conclude that the joint framework is the preferred approach and should be used to control false positives when dealing with low-frequency variants and/or family-based cohorts.
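The stratified framework described above can be sketched as follows: the interaction effect is approximated by the difference between the genetic effects estimated in the exposed and unexposed groups, and a 2 degree-of-freedom joint test is formed from the two stratum-specific statistics. This is a minimal illustration of that commonly used construction with hypothetical inputs. It assumes the two strata are statistically independent, which need not hold when relatives fall into different strata.

```python
import math


def stratified_tests(beta_unexp, se_unexp, beta_exp, se_exp):
    """Approximate interaction and joint tests from stratum-specific results.

    Assumes independent strata (an assumption of this sketch).
    """
    # 1 DF interaction test: difference of the two stratum effects.
    beta_int = beta_exp - beta_unexp
    se_int = math.sqrt(se_exp ** 2 + se_unexp ** 2)
    z_int = beta_int / se_int
    p_int = math.erfc(abs(z_int) / math.sqrt(2.0))

    # 2 DF joint test: sum of squared stratum z-scores.
    chi2_joint = (beta_unexp / se_unexp) ** 2 + (beta_exp / se_exp) ** 2
    p_joint = math.exp(-chi2_joint / 2.0)  # chi-square upper tail, 2 DF

    return {"beta_int": beta_int, "se_int": se_int, "p_int": p_int,
            "chi2_joint": chi2_joint, "p_joint": p_joint}


# Hypothetical stratum results: no effect in unexposed, an effect in exposed.
result = stratified_tests(beta_unexp=0.0, se_unexp=1.0, beta_exp=3.0, se_exp=1.0)
```

By contrast, the joint framework fits a single regression containing both the main and interaction terms, so it does not rely on independence between the strata.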
Subject(s)
Blood Pressure/genetics , Gene-Environment Interaction , Smoking , Cohort Studies , Databases, Factual , Family , Gene Frequency , Genome-Wide Association Study , Genotype , Humans , Phenotype
ABSTRACT
BACKGROUND: So far, more than 170 loci have been associated with circulating lipid levels through genome-wide association studies (GWAS). These associations are largely driven by common variants, their function is often not known, and many are likely to be markers for the causal variants. In this study we aimed to identify new rare and low-frequency functional variants associated with circulating lipid levels. METHODS: We used the 1000 Genomes Project as a reference panel for the imputations of GWAS data from ~60 000 individuals in the discovery stage and ~90 000 samples in the replication stage. RESULTS: Our study resulted in the identification of five new associations with circulating lipid levels at four loci. All four loci are within genes that can be linked biologically to lipid metabolism. One of the variants, rs116843064, is a damaging missense variant within the ANGPTL4 gene. CONCLUSIONS: This study illustrates that GWAS with high-scale imputation may still help us unravel the biological mechanism behind circulating lipid levels.
Subject(s)
Angiopoietins/genetics , Exons/genetics , Genome, Human/genetics , Polymorphism, Single Nucleotide/genetics , Angiopoietin-Like Protein 4 , Fasting/physiology , Female , Genome-Wide Association Study/methods , Genotype , Humans , Male , Middle Aged
ABSTRACT
Introduction: Educational attainment, widely used in epidemiologic studies as a surrogate for socioeconomic status, is a predictor of cardiovascular health outcomes. Methods: A two-stage genome-wide meta-analysis of low-density lipoprotein cholesterol (LDL), high-density lipoprotein cholesterol (HDL), and triglyceride (TG) levels was performed while accounting for gene-educational attainment interactions in up to 226,315 individuals from five population groups. We considered two educational attainment variables: "Some College" (yes/no, for any education beyond high school) and "Graduated College" (yes/no, for completing a 4-year college degree). Genome-wide significant (p < 5 × 10⁻⁸) and suggestive (p < 1 × 10⁻⁶) variants were identified in Stage 1 (in up to 108,784 individuals) through genome-wide analysis, and those variants were followed up in Stage 2 studies (in up to 117,531 individuals). Results: In combined analysis of Stages 1 and 2, we identified 18 novel lipid loci (nine for LDL, seven for HDL, and two for TG) by two degree-of-freedom (2 DF) joint tests of main and interaction effects. Four loci showed significant interaction with educational attainment. Two loci were significant only in cross-population analyses. Several loci include genes with known or suggested roles in adipose (FOXP1, MBOAT4, SKP2, STIM1, STX4), brain (BRI3, FILIP1, FOXP1, LINC00290, LMTK2, MBOAT4, MYO6, SENP6, SRGAP3, STIM1, TMEM167A, TMEM30A), and liver (BRI3, FOXP1) biology, highlighting the potential importance of brain-adipose-liver communication in the regulation of lipid metabolism. An investigation of the potential druggability of genes in identified loci resulted in five gene targets shown to interact with drugs approved by the Food and Drug Administration, including genes with roles in adipose and brain tissue. Discussion: Genome-wide interaction analysis of educational attainment identified novel lipid loci not previously detected by analyses limited to main genetic effects.
ABSTRACT
Background: Genome-wide association studies for glycemic traits have identified hundreds of loci associated with these biomarkers of glucose homeostasis. Despite this success, the challenge remains to link variant associations to genes, and underlying biological pathways. Methods: To identify coding variant associations which may pinpoint effector genes at both novel and previously established genome-wide association loci, we performed meta-analyses of exome-array studies for four glycemic traits: glycated hemoglobin (HbA1c, up to 144,060 participants), fasting glucose (FG, up to 129,665 participants), fasting insulin (FI, up to 104,140) and 2hr glucose post-oral glucose challenge (2hGlu, up to 57,878). In addition, we performed network and pathway analyses. Results: Single-variant and gene-based association analyses identified coding variant associations at more than 60 genes, which when combined with other datasets may be useful to nominate effector genes. Network and pathway analyses identified pathways related to insulin secretion, zinc transport and fatty acid metabolism. HbA1c associations were strongly enriched in pathways related to blood cell biology. Conclusions: Our results provided novel glycemic trait associations and highlighted pathways implicated in glycemic regulation. Exome-array summary statistic results are being made available to the scientific community to enable further discoveries.
ABSTRACT
Cardiometabolic diseases are frequently polygenic in architecture, comprising a large number of risk alleles with small effects spread across the genome [1-3]. Polygenic scores (PGS) aggregate these into a metric representing an individual's genetic predisposition to disease. PGS have shown promise for early risk prediction [4-7] and there is an open question as to whether PGS can also be used to understand disease biology [8]. Here, we demonstrate that cardiometabolic disease PGS can be used to elucidate the proteins underlying disease pathogenesis. In 3,087 healthy individuals, we found that PGS for coronary artery disease, type 2 diabetes, chronic kidney disease and ischaemic stroke are associated with the levels of 49 plasma proteins. Associations were polygenic in architecture, largely independent of cis and trans protein quantitative trait loci and present for proteins without quantitative trait loci. Over a follow-up of 7.7 years, 28 of these proteins associated with future myocardial infarction or type 2 diabetes events, 16 of which were mediators between polygenic risk and incident disease. Twelve of these were druggable targets with therapeutic potential. Our results demonstrate the potential for PGS to uncover causal disease biology and targets with therapeutic potential, including those that may be missed by approaches utilizing information at a single locus.
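A polygenic score of the kind described above is usually computed as a weighted sum of an individual's risk-allele dosages, with one weight per variant. The sketch below shows that basic calculation and a standardization step across a cohort. The function names, weights, and dosages are hypothetical and are not taken from the study.

```python
import statistics


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0 to 2 per variant) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("need one weight per variant")
    return sum(w * d for w, d in zip(weights, dosages))


def standardize(scores):
    """Centre scores on the cohort mean and scale by the sample standard deviation."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]


# Hypothetical per-variant weights (e.g. effect sizes from an association study).
weights = [0.10, -0.20, 0.05]

# Hypothetical dosages for three individuals at the same three variants.
cohort = [[0, 1, 2], [2, 0, 1], [1, 1, 1]]
raw_scores = [polygenic_score(d, weights) for d in cohort]
z_scores = standardize(raw_scores)
```

Standardized scores are convenient when relating the score to other measurements, such as protein levels, because effects are then expressed per standard deviation of the score.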
Subject(s)
Blood Proteins , Heart Diseases/etiology , Heart Diseases/metabolism , Metabolic Diseases/etiology , Metabolic Diseases/metabolism , Multifactorial Inheritance , Proteome , Adult , Biomarkers , Disease Management , Disease Susceptibility , England/epidemiology , Female , Genetic Predisposition to Disease , Heart Diseases/diagnosis , Heart Diseases/epidemiology , Humans , Male , Metabolic Diseases/diagnosis , Metabolic Diseases/epidemiology , Middle Aged , Public Health Surveillance , Young Adult
ABSTRACT
The electrocardiographic PR interval reflects atrioventricular conduction, and is associated with conduction abnormalities, pacemaker implantation, atrial fibrillation (AF), and cardiovascular mortality. Here we report a multi-ancestry (N = 293,051) genome-wide association meta-analysis for the PR interval, discovering 202 loci of which 141 have not previously been reported. Variants at identified loci increase the percentage of heritability explained, from 33.5% to 62.6%. We observe enrichment for cardiac muscle developmental/contractile and cytoskeletal genes, highlighting key regulation processes for atrioventricular conduction. Additionally, 8 loci not previously reported harbor genes underlying inherited arrhythmic syndromes and/or cardiomyopathies suggesting a role for these genes in cardiovascular pathology in the general population. We show that polygenic predisposition to PR interval duration is an endophenotype for cardiovascular disease, including distal conduction disease, AF, and atrioventricular pre-excitation. These findings advance our understanding of the polygenic basis of cardiac conduction, and the genetic relationship between PR interval duration and cardiovascular disease.
Subject(s)
Arrhythmias, Cardiac/genetics , Electrocardiography , Genetic Loci/genetics , Genetic Predisposition to Disease/genetics , Arrhythmias, Cardiac/physiopathology , Cardiovascular Diseases/genetics , Cardiovascular Diseases/physiopathology , Endophenotypes , Female , Gene Expression , Genetic Variation , Genome-Wide Association Study , Humans , Male , Multifactorial Inheritance , Quantitative Trait Loci/genetics
ABSTRACT
Smoking is a potentially causal behavioral risk factor for type 2 diabetes (T2D), but not all smokers develop T2D. It is unknown whether genetic factors partially explain this variation. We performed genome-environment-wide interaction studies to identify loci exhibiting potential interaction with baseline smoking status (ever vs. never) on incident T2D and fasting glucose (FG). Analyses were performed in participants of European (EA) and African ancestry (AA) separately. Discovery analyses were conducted using genotype data from the 50,000-single-nucleotide polymorphism (SNP) ITMAT-Broad-CARe (IBC) array in 5 cohorts from the Candidate Gene Association Resource Consortium (n = 23,189). Replication was performed in up to 16 studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium (n = 74,584). In meta-analysis of discovery and replication estimates, 5 SNPs met at least one criterion for potential interaction with smoking on incident T2D at p < 1 × 10⁻⁷ (adjusted for multiple hypothesis-testing with the IBC array). Two SNPs had significant joint effects in the overall model and significant main effects only in one smoking stratum: rs140637 (FBN1) in AA individuals had a significant main effect only among smokers, and rs1444261 (closest gene C2orf63) in EA individuals had a significant main effect only among nonsmokers. Three additional SNPs were identified as having potential interaction by exhibiting significant main effects only in smokers: rs1801232 (CUBN) in AA individuals, rs12243326 (TCF7L2) in EA individuals, and rs4132670 (TCF7L2) in EA individuals. No SNP met significance for potential interaction with smoking on baseline FG. The identification of these loci provides evidence for genetic interactions with smoking exposure that may explain some of the heterogeneity in the association between smoking and T2D.