ABSTRACT
Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored [1,2]. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high-impact variants.
Subjects
Disease, Gene Frequency, Phenotype, Humans, Middle Aged, Disease/genetics, Estonia, Finland, Gene Frequency/genetics, Genetic Predisposition to Disease/genetics, Genome-Wide Association Study, Meta-Analysis as Topic, United Kingdom, White People/genetics
ABSTRACT
Genome-wide association studies (GWAS) have identified thousands of genetic variants linked to the risk of human disease. However, GWAS have so far remained largely underpowered in relation to identifying associations in the rare and low-frequency allelic spectrum and have lacked the resolution to trace causal mechanisms to underlying genes [1]. Here we combined whole-exome sequencing in 392,814 UK Biobank participants with imputed genotypes from 260,405 FinnGen participants (653,219 total individuals) to conduct association meta-analyses for 744 disease endpoints across the protein-coding allelic frequency spectrum, bridging the gap between common and rare variant studies. We identified 975 associations, with more than one-third being previously unreported. We demonstrate population-level relevance for mutations previously ascribed to causing single-gene disorders, map GWAS associations to likely causal genes, explain disease mechanisms, and systematically relate disease associations to levels of 117 biomarkers and clinical-stage drug targets. Combining sequencing and genotyping in two population biobanks enabled us to benefit from increased power to detect and explain disease associations, validate findings through replication and propose medical actionability for rare genetic variants. Our study provides a compendium of protein-coding variant associations for future insights into disease biology and drug discovery.
Subjects
Genome-Wide Association Study, Proteins, Gene Frequency/genetics, Genetic Predisposition to Disease/genetics, Genotype, Humans, Single Nucleotide Polymorphism/genetics, Proteins/genetics, Exome Sequencing
ABSTRACT
MOTIVATION: Mendelian randomization is an epidemiological technique that uses genetic variants as instrumental variables to estimate the causal effect of a risk factor on an outcome. We consider a scenario in which causal estimates based on each variant in turn differ more strongly than expected by chance alone, but the variants can be divided into distinct clusters, such that all variants in the cluster have similar causal estimates. This scenario is likely to occur when there are several distinct causal mechanisms by which a risk factor influences an outcome with different magnitudes of causal effect. We have developed an algorithm MR-Clust that finds such clusters of variants, and so can identify variants that reflect distinct causal mechanisms. Two features of our clustering algorithm are that it accounts for differential uncertainty in the causal estimates, and it includes 'null' and 'junk' clusters, to provide protection against the detection of spurious clusters. RESULTS: Our algorithm correctly detected the number of clusters in a simulation analysis, outperforming methods that either do not account for uncertainty or do not include null and junk clusters. In an applied example considering the effect of blood pressure on coronary artery disease risk, the method detected four clusters of genetic variants. A post hoc hypothesis-generating search suggested that variants in the cluster with a negative effect of blood pressure on coronary artery disease risk were more strongly related to trunk fat percentage and other adiposity measures than variants not in this cluster. AVAILABILITY AND IMPLEMENTATION: MR-Clust can be downloaded from https://github.com/cnfoley/mrclust. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
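The starting point described above, variant-specific causal estimates that differ more than chance alone would allow, can be illustrated with a minimal sketch. This is not the MR-Clust algorithm: it only computes per-variant ratio estimates with first-order standard errors, an inverse-variance weighted average, and Cochran's Q statistic as a heterogeneity check. The function names and the summary statistics are hypothetical, and the standard errors assume the variant-exposure associations are measured without error.

```python
def wald_ratios(beta_x, beta_y, se_y):
    """Per-variant ratio estimates (beta_y / beta_x) with first-order SEs."""
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    ses = [sy / abs(bx) for bx, sy in zip(beta_x, se_y)]
    return ratios, ses


def ivw_and_q(ratios, ses):
    """Inverse-variance weighted mean of the ratios and Cochran's Q."""
    weights = [1.0 / s ** 2 for s in ses]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    return ivw, q


# Hypothetical summary statistics: three variants point to a positive
# effect (about 0.5) and three to a negative effect (about -0.3).
beta_x = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10]
beta_y = [0.050, 0.060, 0.040, -0.033, -0.027, -0.030]
se_y = [0.01] * 6

ratios, ses = wald_ratios(beta_x, beta_y, se_y)
ivw, q = ivw_and_q(ratios, ses)

# Under homogeneity Q is approximately chi-square with (variants - 1) = 5
# degrees of freedom; its 0.95 quantile is about 11.07.
print("ratio estimates:", [round(r, 2) for r in ratios])
print("IVW estimate:", round(ivw, 3), "Cochran's Q:", round(q, 1))
```

A Q statistic far above the chi-square reference value signals that a single pooled estimate hides distinct groups of variants, which is the situation a clustering method is meant to resolve.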
Subjects
Mendelian Randomization Analysis, Causality, Cluster Analysis, Computer Simulation, Risk Factors
ABSTRACT
An observational correlation between a suspected risk factor and an outcome does not necessarily imply that interventions on levels of the risk factor will have a causal impact on the outcome (correlation is not causation). If genetic variants associated with the risk factor are also associated with the outcome, then this increases the plausibility that the risk factor is a causal determinant of the outcome. However, if the genetic variants in the analysis do not have a specific biological link to the risk factor, then causal claims can be spurious. We review the Mendelian randomization paradigm for making causal inferences using genetic variants. We consider monogenic analysis, in which genetic variants are taken from a single gene region, and polygenic analysis, which includes variants from multiple regions. We focus on answering two questions: When can Mendelian randomization be used to make reliable causal inferences, and when can it be used to make relevant causal inferences?
Subjects
Genome-Wide Association Study, Causality, Humans, Mendelian Randomization Analysis, Risk Factors
ABSTRACT
While comorbidity between coronary heart disease (CHD) and depression is evident, it is unclear whether the two diseases have shared underlying mechanisms. We performed a range of analyses in 367,703 unrelated middle-aged participants of European ancestry from UK Biobank, a population-based cohort study, to assess whether comorbidity is primarily due to genetic or environmental factors, and to test whether cardiovascular risk factors and CHD are likely to be causally related to depression using Mendelian randomization. We showed family history of heart disease was associated with a 20% increase in depression risk (95% confidence interval [CI] 16-24%, p < 0.0001), but a genetic risk score that is strongly associated with CHD risk was not associated with depression. An increase of 1 standard deviation in the CHD genetic risk score was associated with 71% higher CHD risk, but 1% higher depression risk (95% CI 0-3%; p = 0.11). Mendelian randomization analyses suggested that triglycerides, interleukin-6 (IL-6), and C-reactive protein (CRP) are likely causal risk factors for depression. The odds ratio for depression per standard deviation increase in genetically-predicted triglycerides was 1.18 (95% CI 1.09-1.27; p = 2 × 10⁻⁵); per unit increase in genetically-predicted log-transformed IL-6 was 0.74 (95% CI 0.62-0.89; p = 0.0012); and per unit increase in genetically-predicted log-transformed CRP was 1.18 (95% CI 1.07-1.29; p = 0.0009). Our analyses suggest that comorbidity between depression and CHD arises largely from shared environmental factors. IL-6, CRP and triglycerides are likely to be causally linked with depression, so could be targets for treatment and prevention of depression.
Subjects
Coronary Disease, Depression, Adult, Aged, C-Reactive Protein/analysis, Cohort Studies, Coronary Disease/blood, Coronary Disease/epidemiology, Coronary Disease/genetics, Depression/blood, Depression/epidemiology, Depression/genetics, Female, Humans, Interleukin-6/blood, Male, Mendelian Randomization Analysis, Middle Aged, Odds Ratio, Single Nucleotide Polymorphism, Risk Factors, Triglycerides/blood, United Kingdom/epidemiology
ABSTRACT
The circulating proteome offers insights into the biological pathways that underlie disease. Here, we test relationships between 1,468 Olink protein levels and the incidence of 23 age-related diseases and mortality in the UK Biobank (n = 47,600). We report 3,209 associations between 963 protein levels and 21 incident outcomes. Next, protein-based scores (ProteinScores) are developed using penalized Cox regression. When applied to test sets, six ProteinScores improve the area under the curve estimates for the 10-year onset of incident outcomes beyond age, sex and a comprehensive set of 24 lifestyle factors, clinically relevant biomarkers and physical measures. Furthermore, the ProteinScore for type 2 diabetes outperforms a polygenic risk score and HbA1c, a clinical marker used to monitor and diagnose type 2 diabetes. The performance of scores using metabolomic and proteomic features is also compared. These data characterize early proteomic contributions to major age-related diseases, demonstrating the value of the plasma proteome for risk stratification.
Subjects
Blood Proteins, Adult, Aged, Female, Humans, Male, Middle Aged, Biomarkers/blood, Blood Proteins/metabolism, Blood Proteins/genetics, Blood Proteins/analysis, Type 2 Diabetes Mellitus/mortality, Type 2 Diabetes Mellitus/blood, Type 2 Diabetes Mellitus/epidemiology, Type 2 Diabetes Mellitus/genetics, Incidence, Proteomics, UK Biobank, United Kingdom/epidemiology
ABSTRACT
Understanding how gene-environment interactions (GEIs) influence the circulating proteome could aid in biomarker discovery and validation. The presence of GEIs can be inferred from single nucleotide polymorphisms that associate with phenotypic variability - termed variance quantitative trait loci (vQTLs). Here, vQTL association studies are performed on plasma levels of 1463 proteins in 52,363 UK Biobank participants. A set of 677 independent vQTLs are identified across 568 proteins. They include 67 variants that lack conventional additive main effects on protein levels. Over 1100 GEIs are identified between 101 proteins and 153 environmental exposures. GEI analyses uncover possible mechanisms that explain why 13/67 vQTL-only sites lack corresponding main effects. Additional analyses also highlight how age, sex, epistatic interactions and statistical artefacts may underscore associations between genetic variation and variance heterogeneity. This study establishes the most comprehensive database yet of vQTLs and GEIs for the human proteome.
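The idea of a variance QTL, a variant whose genotype groups differ in the spread of a trait rather than only in its mean, can be sketched with a simple variance-heterogeneity statistic. The abstract does not say which test the study used; the Brown-Forsythe (median-centred Levene) statistic below is one common choice and is an assumption here, as are the function name and the toy protein levels.

```python
from statistics import mean, median


def brown_forsythe(groups):
    """Brown-Forsythe statistic for equality of variances across groups.

    Each observation is replaced by its absolute deviation from its group
    median, then a one-way ANOVA F statistic is computed on the deviations.
    """
    deviations = []
    for values in groups:
        centre = median(values)
        deviations.append([abs(v - centre) for v in values])
    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    grand = sum(sum(d) for d in deviations) / n_total
    between = sum(len(d) * (mean(d) - grand) ** 2 for d in deviations)
    within = sum((z - mean(d)) ** 2 for d in deviations for z in d)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical protein levels by genotype (0, 1 or 2 copies of an allele):
# the group medians are identical but the spread grows with allele count.
levels_by_genotype = [
    [9.8, 10.0, 10.2, 9.9, 10.1],
    [9.5, 10.0, 10.5, 9.7, 10.3],
    [9.0, 10.0, 11.0, 9.4, 10.6],
]

w = brown_forsythe(levels_by_genotype)
# Under equal variances W is approximately F-distributed with
# (k - 1, N - k) = (2, 12) degrees of freedom.
print("Brown-Forsythe W:", round(w, 2))
```

Because the three groups share the same median, a conventional test of additive mean effects would see nothing here, which mirrors the "vQTL-only" sites mentioned in the abstract.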
Subjects
Biological Specimen Banks, Blood Proteins, Gene-Environment Interaction, Single Nucleotide Polymorphism, Proteome, Quantitative Trait Loci, Humans, United Kingdom, Proteome/metabolism, Proteome/genetics, Female, Male, Blood Proteins/metabolism, Blood Proteins/genetics, Middle Aged, Aged, Adult, Biomarkers/blood, Genome-Wide Association Study, UK Biobank
ABSTRACT
Genetic associations with macroscopic brain structure can provide insights into brain function and disease. However, specific associations with measures of local brain folding are largely under-explored. Here, we conducted large-scale genome- and exome-wide associations of regional cortical sulcal measures derived from magnetic resonance imaging scans of 40,169 individuals in UK Biobank. We discovered 388 regional brain folding associations across 77 genetic loci, with genes in associated loci enriched for expression in the cerebral cortex, neuronal development processes, and differential regulation during early brain development. We integrated brain eQTLs to refine genes for various loci, implicated several genes involved in neurodevelopmental disorders, and highlighted global genetic correlations with neuropsychiatric phenotypes. We provide an interactive 3D visualisation of our summary associations, emphasising added resolution of regional analyses. Our results offer new insights into the genetic architecture of brain folding and provide a resource for future studies of sulcal morphology in health and disease.
Subjects
Biological Specimen Banks, Brain, Brain/diagnostic imaging, Cerebral Cortex/anatomy & histology, Genome-Wide Association Study, Humans, Magnetic Resonance Imaging, United Kingdom
ABSTRACT
Genome-wide association studies (GWAS) have identified thousands of genomic regions affecting complex diseases. The next challenge is to elucidate the causal genes and mechanisms involved. One approach is to use statistical colocalization to assess shared genetic aetiology across multiple related traits (e.g. molecular traits, metabolic pathways and complex diseases) to identify causal pathways, prioritize causal variants and evaluate pleiotropy. We propose HyPrColoc (Hypothesis Prioritisation for multi-trait Colocalization), an efficient deterministic Bayesian algorithm using GWAS summary statistics that can detect colocalization across vast numbers of traits simultaneously (e.g. 100 traits can be jointly analysed in around 1 s). We perform a genome-wide multi-trait colocalization analysis of coronary heart disease (CHD) and fourteen related traits, identifying 43 regions in which CHD colocalized with ≥1 trait, including 5 previously unknown CHD loci. Across the 43 loci, we further integrate gene and protein expression quantitative trait loci to identify candidate causal genes.
Subjects
Algorithms, Computational Biology/methods, Coronary Disease/genetics, Genetic Predisposition to Disease/genetics, Genome-Wide Association Study/methods, Quantitative Trait Loci/genetics, Coronary Disease/diagnosis, Genomics/methods, Humans, Linkage Disequilibrium, Single Nucleotide Polymorphism, Reproducibility of Results, Risk Factors
ABSTRACT
There is considerable interest in GIPR agonism to enhance the insulinotropic and extrapancreatic effects of GIP, thereby improving glycemic and weight control in type 2 diabetes (T2D) and obesity. Recent genetic epidemiological evidence has implicated higher GIPR-mediated GIP levels in raising coronary artery disease (CAD) risk, a potential safety concern for GIPR agonism. We therefore aimed to quantitatively assess whether the association between higher GIPR-mediated fasting GIP levels and CAD risk is mediated via GIPR or is instead the result of linkage disequilibrium (LD) confounding between variants at the GIPR locus. Using Bayesian multitrait colocalization, we identified a GIPR missense variant, rs1800437 (G allele; E354), as the putatively causal variant shared among fasting GIP levels, glycemic traits, and adiposity-related traits (posterior probability for colocalization [PPcoloc] > 0.97; PP explained by the candidate variant [PPexplained] = 1) that was independent from a cluster of CAD and lipid traits driven by a known missense variant in APOE (rs7412; distance to E354 ~770 kb; R² with E354 = 0.004; PPcoloc > 0.99; PPexplained = 1). Further, conditioning the association between E354 and CAD on the residual LD with rs7412, we observed slight attenuation in association, but it remained significant (odds ratio [OR] per copy of E354 after adjustment 1.03; 95% CI 1.02, 1.04; P = 0.003). Instead, E354's association with CAD was completely attenuated when conditioning on an additional established CAD signal, rs1964272 (R² with E354 = 0.27), an intronic variant in SNRPD2 (OR for E354 after adjustment for rs1964272: 1.01; 95% CI 0.99, 1.03; P = 0.06). We demonstrate that associations with GIP and anthropometric and glycemic traits are driven by genetic signals distinct from those driving CAD and lipid traits in the GIPR region and that higher E354-mediated fasting GIP levels are not associated with CAD risk. These findings provide evidence that the inclusion of GIPR agonism in dual GIPR/GLP1R agonists could potentiate the protective effect of GLP-1 agonists on diabetes without undue CAD risk, an aspect that has yet to be assessed in clinical trials.
Subjects
Cardiovascular Diseases/blood, Type 2 Diabetes Mellitus/blood, Gastric Inhibitory Polypeptide/blood, Genetic Predisposition to Disease, Gastrointestinal Hormone Receptors/metabolism, Adult, Aged, Type 2 Diabetes Mellitus/genetics, Type 2 Diabetes Mellitus/metabolism, Female, Finland, Gastric Inhibitory Polypeptide/genetics, Gastric Inhibitory Polypeptide/metabolism, Genetic Variation, Genome-Wide Association Study, Genotype, Humans, Male, Middle Aged, Gastrointestinal Hormone Receptors/genetics, Risk Factors, United Kingdom
ABSTRACT
BACKGROUND: Factorial Mendelian randomization is the use of genetic variants to answer questions about interactions. Although the approach has been used in applied investigations, little methodological advice is available on how to design or perform a factorial Mendelian randomization analysis. Previous analyses have employed a 2 × 2 approach, using dichotomized genetic scores to divide the population into four subgroups as in a factorial randomized trial. METHODS: We describe two distinct contexts for factorial Mendelian randomization: investigating interactions between risk factors, and investigating interactions between pharmacological interventions on risk factors. We propose two-stage least squares methods using all available genetic variants and their interactions as instrumental variables, and using continuous genetic scores as instrumental variables rather than dichotomized scores. We illustrate our methods using data from UK Biobank to investigate the interaction between body mass index and alcohol consumption on systolic blood pressure. RESULTS: Simulated and real data show that efficiency is maximized using the full set of interactions between genetic variants as instruments. In the applied example, between 4- and 10-fold improvement in efficiency is demonstrated over the 2 × 2 approach. Analyses using continuous genetic scores are more efficient than those using dichotomized scores. Efficiency is improved by finding genetic variants that divide the population at a natural break in the distribution of the risk factor, or else divide the population into more equal-sized groups. CONCLUSIONS: Previous factorial Mendelian randomization analyses may have been underpowered. Efficiency can be improved by using all genetic variants and their interactions as instrumental variables, rather than the 2 × 2 approach.
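The proposal to use continuous genetic scores and their interaction as instruments can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the authors' implementation: two hypothetical scores `g1` and `g2` and their product instrument two risk factors and their product, the data-generating model and all names are invented for the example, and because the model is just-identified the two-stage least squares estimate is obtained by solving (Z'X) b = Z'y directly.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def factorial_iv(g1, g2, x1, x2, y):
    """Return [intercept, effect of x1, effect of x2, interaction effect]."""
    zx = [[0.0] * 4 for _ in range(4)]
    zy = [0.0] * 4
    for a, b, p, q, out in zip(g1, g2, x1, x2, y):
        z = [1.0, a, b, a * b]   # instruments: scores and their product
        x = [1.0, p, q, p * q]   # regressors: risk factors and their product
        for i in range(4):
            zy[i] += z[i] * out
            for j in range(4):
                zx[i][j] += z[i] * x[j]
    return solve(zx, zy)


def simulate(n, seed=1):
    """Simulate scores, confounded risk factors and an outcome.

    Assumed true effects: 0.3 (x1), 0.2 (x2), 0.1 (x1 * x2).
    """
    rng = random.Random(seed)
    g1, g2, x1, x2, y = [], [], [], [], []
    for _ in range(n):
        a, b, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        p = a + u + rng.gauss(0, 1)   # u confounds both risk factors
        q = b + u + rng.gauss(0, 1)
        out = 0.3 * p + 0.2 * q + 0.1 * p * q + u + rng.gauss(0, 1)
        g1.append(a); g2.append(b); x1.append(p); x2.append(q); y.append(out)
    return g1, g2, x1, x2, y


estimates = factorial_iv(*simulate(20000))
print("intercept, x1, x2, interaction:", [round(e, 3) for e in estimates])
```

With more instruments than regressors, as in the abstract's suggestion to use all variants and their pairwise interactions, the system is over-identified and a full two-stage procedure would be needed instead of the single solve used here.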
Subjects
Genetic Variation, Mendelian Randomization Analysis, Causality, Humans, Least-Squares Analysis, Risk Factors
ABSTRACT
Mendelian randomization (MR) is an epidemiological technique that uses genetic variants to distinguish correlation from causation in observational data. The reliability of a MR investigation depends on the validity of the genetic variants as instrumental variables (IVs). We develop the contamination mixture method, a method for MR with two modalities. First, it identifies groups of genetic variants with similar causal estimates, which may represent distinct mechanisms by which the risk factor influences the outcome. Second, it performs MR robustly and efficiently in the presence of invalid IVs. Compared to other robust methods, it has the lowest mean squared error across a range of realistic scenarios. The method identifies 11 variants associated with increased high-density lipoprotein-cholesterol, decreased triglyceride levels, and decreased coronary heart disease risk that have the same directions of associations with various blood cell traits, suggesting a shared mechanism linking lipids and coronary heart disease risk mediated via platelet aggregation.
Subjects
Genetic Variation, Mendelian Randomization Analysis/methods, Research Design, Body Mass Index, HDL Cholesterol/blood, HDL Cholesterol/genetics, LDL Cholesterol/blood, LDL Cholesterol/genetics, Coronary Disease/blood, Coronary Disease/epidemiology, Coronary Disease/genetics, Type 2 Diabetes Mellitus/genetics, Genetic Association Studies, Genetic Pleiotropy, Genetic Predisposition to Disease/genetics, Humans, Genetic Models, Molecular Epidemiology/methods, Phenotype, Reproducibility of Results, Risk Factors, Triglycerides/blood, Triglycerides/genetics
ABSTRACT
The MendelianRandomization package is a software package written for the R software environment that implements methods for Mendelian randomization based on summarized data. In this manuscript, we describe functions that have been added to the package or updated in recent years. These features can be divided into four categories: robust methods for Mendelian randomization, methods for multivariable Mendelian randomization, functions for data visualization, and the ability to load data into the package seamlessly from the PhenoScanner web-resource. We provide examples of the graphical output produced by the data visualization commands, as well as syntax for obtaining suitable data and performing a Mendelian randomization analysis in a single line of code.
ABSTRACT
Genetic studies of blood pressure (BP) to date have mainly analyzed common variants (minor allele frequency > 0.05). In a meta-analysis of up to ~1.3 million participants, we discovered 106 new BP-associated genomic regions and 87 rare (minor allele frequency ≤ 0.01) variant BP associations (P < 5 × 10⁻⁸), of which 32 were in new BP-associated loci and 55 were independent BP-associated single-nucleotide variants within known BP-associated regions. Average effects of rare variants (44% coding) were ~8 times larger than common variant effects and indicate potential candidate causal genes at new and known loci (for example, GATA5 and PLCB3). BP-associated variants (including rare and common) were enriched in regions of active chromatin in fetal tissues, potentially linking fetal development with BP regulation in later life. Multivariable Mendelian randomization suggested possible inverse effects of elevated systolic and diastolic BP on large artery stroke. Our study demonstrates the utility of rare-variant analyses for identifying candidate genes and the results highlight potential therapeutic targets.
Subjects
Blood Pressure/genetics, Gene Frequency/genetics, Genetic Predisposition to Disease/genetics, Hypertension/genetics, GATA5 Transcription Factor/genetics, Genome-Wide Association Study, Genotype, Humans, Mutation/genetics, Phospholipase C beta/genetics, Single Nucleotide Polymorphism/genetics
ABSTRACT
BACKGROUND: Evidence from randomized trials has shown that therapies that lower LDL (low-density lipoprotein)-cholesterol and triglycerides reduce coronary artery disease (CAD) risk. However, there is still uncertainty about their effects on other cardiovascular outcomes. We therefore performed a systematic investigation of causal relationships between circulating lipids and cardiovascular outcomes using a Mendelian randomization approach. METHODS: In the primary analysis, we performed 2-sample multivariable Mendelian randomization using data from participants of European ancestry. We also conducted univariable analyses using inverse-variance weighted and robust methods, and gene-specific analyses using variants that can be considered as proxies for specific lipid-lowering medications. We obtained associations with lipid fractions from the Global Lipids Genetics Consortium, a meta-analysis of 188 577 participants, and genetic associations with cardiovascular outcomes from 367 703 participants in UK Biobank. RESULTS: For LDL-cholesterol, in addition to the expected positive associations with CAD risk (odds ratio [OR] per 1 SD increase, 1.45 [95% CI, 1.35-1.57]) and other atheromatous outcomes (ischemic cerebrovascular disease and peripheral vascular disease), we found independent associations of genetically predicted LDL-cholesterol with abdominal aortic aneurysm (OR, 1.75 [95% CI, 1.40-2.17]) and aortic valve stenosis (OR, 1.46 [95% CI, 1.25-1.70]). Genetically predicted triglyceride levels were positively associated with CAD (OR, 1.25 [95% CI, 1.12-1.40]), aortic valve stenosis (OR, 1.29 [95% CI, 1.04-1.61]), and hypertension (OR, 1.17 [95% CI, 1.07-1.27]), but inversely associated with venous thromboembolism (OR, 0.79 [95% CI, 0.67-0.93]) and hemorrhagic stroke (OR, 0.78 [95% CI, 0.62-0.98]). We also found positive associations of genetically predicted LDL-cholesterol and triglycerides with heart failure that appeared to be mediated by CAD. CONCLUSIONS: Lowering LDL-cholesterol is likely to prevent abdominal aortic aneurysm and aortic stenosis, in addition to CAD and other atheromatous cardiovascular outcomes. Lowering triglycerides is likely to prevent CAD and aortic valve stenosis but may increase thromboembolic risk.
Subjects
Cardiovascular Diseases/genetics, LDL Cholesterol/blood, Adult, Aged, Cardiovascular Diseases/blood, Cardiovascular Diseases/epidemiology, Europe/epidemiology, Female, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Male, Mendelian Randomization Analysis, Middle Aged, Odds Ratio, Risk Factors, Triglycerides/blood, White People/genetics
ABSTRACT
Background: A robust method for Mendelian randomization does not require all genetic variants to be valid instruments to give consistent estimates of a causal parameter. Several such methods have been developed, including a mode-based estimation method giving consistent estimates if a plurality of genetic variants are valid instruments; i.e. there is no larger subset of invalid instruments estimating the same causal parameter than the subset of valid instruments. Methods: We here develop a model-averaging method that gives consistent estimates under the same 'plurality of valid instruments' assumption. The method considers a mixture distribution of estimates derived from each subset of genetic variants. The estimates are weighted such that subsets with more genetic variants receive more weight, unless variants in the subset have heterogeneous causal estimates, in which case that subset is severely down-weighted. The mode of this mixture distribution is the causal estimate. This heterogeneity-penalized model-averaging method has several technical advantages over the previously proposed mode-based estimation method. Results: The heterogeneity-penalized model-averaging method outperformed the mode-based estimation in terms of efficiency and outperformed other robust methods in terms of Type 1 error rate in an extensive simulation analysis. The proposed method suggests two distinct mechanisms by which inflammation affects coronary heart disease risk, with subsets of variants suggesting both positive and negative causal effects. Conclusions: The heterogeneity-penalized model-averaging method is an additional robust method for Mendelian randomization with excellent theoretical and practical properties, and can reveal features in the data such as the presence of multiple causal mechanisms.