Results 1 - 20 of 31
1.
Am J Hum Genet ; 109(3): 393-404, 2022 03 03.
Article in English | MEDLINE | ID: mdl-35108496

ABSTRACT

Identifying gene sets that are associated to disease can provide valuable biological knowledge, but a fundamental challenge of gene set analyses of GWAS data is linking disease-associated SNPs to genes. Transcriptome-wide association studies (TWASs) detect associations between the genetically predicted expression of a gene and disease risk, thus implicating candidate disease genes. However, causal disease genes at TWAS-associated loci generally remain unknown due to gene co-regulation, which leads to correlations across genes in predicted expression. We developed a method, gene co-regulation score (GCSC) regression, to identify gene sets that are enriched for disease heritability explained by predicted expression. GCSC regresses TWAS chi-square statistics on gene co-regulation scores reflecting correlations in predicted gene expression; a gene set is enriched for heritability if genes with high co-regulation to the set have higher TWAS chi-square statistics than genes with low co-regulation to the set, beyond what is expected based on co-regulation to all genes. We verified via simulations that GCSC is well calibrated and well powered. We applied GCSC to gene expression data from GTEx (48 tissues) and GWAS summary statistics for 43 independent diseases and complex traits analyzing a broad set of biological pathways and specifically expressed gene sets. We identified many enriched sets, recapitulating known biology. For Alzheimer disease, we detected evidence of an immune basis, and specifically a role for antigen presentation, in analyses of both biological pathways and specifically expressed gene sets. Our results highlight the advantages of leveraging gene co-regulation within the TWAS framework to identify enriched gene sets.


Subject(s)
Genome-Wide Association Study , Quantitative Trait Loci , Genetic Predisposition to Disease , Humans , Multifactorial Inheritance , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Transcriptome
2.
Am J Hum Genet ; 109(4): 692-709, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35271803

ABSTRACT

Recent works have shown that SNP heritability, which is dominated by low-effect common variants, may not be the most relevant quantity for localizing high-effect/critical disease genes. Here, we introduce methods to estimate the proportion of phenotypic variance explained by a given assignment of SNPs to a single gene ("gene-level heritability"). We partition gene-level heritability by minor allele frequency (MAF) to find genes whose gene-level heritability is explained exclusively by "low-frequency/rare" variants (0.5% ≤ MAF < 1%). Applying our method to ∼16K protein-coding genes and 25 quantitative traits in the UK Biobank (N = 290K "White British"), we find that, on average across traits, ∼2.5% of nonzero-heritability genes have a rare-variant component and only ∼0.8% (327 gene-trait pairs) have heritability exclusively from rare variants. Of these 327 gene-trait pairs, 114 (35%) were not detected by existing gene-level association testing methods. The additional genes we identify are significantly enriched for known disease genes, and we find several examples of genes that have been previously implicated in phenotypically related Mendelian disorders. Notably, the rare-variant component of gene-level heritability exhibits trends different from those of common-variant gene-level heritability. For example, while total gene-level heritability increases with gene length, the rare-variant component is significantly larger among shorter genes; the cumulative distributions of gene-level heritability also vary across traits and reveal differences in the relative contributions of rare/common variants to overall gene-level polygenicity. While nonzero gene-level heritability does not imply causality, if interpreted in the correct context, gene-level heritability can reveal useful insights into complex-trait genetic architecture.


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Gene Frequency/genetics , Genome-Wide Association Study/methods , Humans , Multifactorial Inheritance/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics
3.
Am J Hum Genet ; 108(1): 36-48, 2021 01 07.
Article in English | MEDLINE | ID: mdl-33352115

ABSTRACT

Identifying and interpreting pleiotropic loci is essential to understanding the shared etiology among diseases and complex traits. A common approach to mapping pleiotropic loci is to meta-analyze GWAS summary statistics across multiple traits. However, this strategy does not account for the complex genetic architectures of traits, such as genetic correlations and heritabilities. Furthermore, the interpretation is challenging because phenotypes often have different characteristics and units. We propose PLEIO (Pleiotropic Locus Exploration and Interpretation using Optimal test), a summary-statistic-based framework to map and interpret pleiotropic loci in a joint analysis of multiple diseases and complex traits. Our method maximizes power by systematically accounting for genetic correlations and heritabilities of the traits in the association test. Any set of related phenotypes, binary or quantitative traits with different units, can be combined seamlessly. In addition, our framework offers interpretation and visualization tools to help downstream analyses. Using our method, we combined 18 traits related to cardiovascular disease and identified 13 pleiotropic loci, which showed four different patterns of associations.


Subject(s)
Genetic Pleiotropy/genetics , Genome-Wide Association Study/methods , Cardiovascular Diseases/genetics , Genetic Predisposition to Disease/genetics , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics
4.
Am J Hum Genet ; 106(6): 805-817, 2020 06 04.
Article in English | MEDLINE | ID: mdl-32442408

ABSTRACT

Despite strong transethnic genetic correlations reported in the literature for many complex traits, the non-transferability of polygenic risk scores across populations suggests the presence of population-specific components of genetic architecture. We propose an approach that models GWAS summary data for one trait in two populations to estimate genome-wide proportions of population-specific/shared causal SNPs. In simulations across various genetic architectures, we show that our approach yields approximately unbiased estimates with in-sample LD and slight upward-bias with out-of-sample LD. We analyze nine complex traits in individuals of East Asian and European ancestry, restricting to common SNPs (MAF > 5%), and find that most common causal SNPs are shared by both populations. Using the genome-wide estimates as priors in an empirical Bayes framework, we perform fine-mapping and observe that high-posterior SNPs (for both the population-specific and shared causal configurations) have highly correlated effects in East Asians and Europeans. In population-specific GWAS risk regions, we observe a 2.8× enrichment of shared high-posterior SNPs, suggesting that population-specific GWAS risk regions harbor shared causal SNPs that are undetected in the other GWASs due to differences in LD, allele frequencies, and/or sample size. Finally, we report enrichments of shared high-posterior SNPs in 53 tissue-specific functional categories and find evidence that SNP-heritability enrichments are driven largely by many low-effect common SNPs.


Subject(s)
Ethnicity/genetics , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Bayes Theorem , Europe/ethnology , Asia, Eastern/ethnology , Gene Frequency , Humans , Linkage Disequilibrium , Organ Specificity/genetics
5.
Am J Hum Genet ; 103(4): 535-552, 2018 10 04.
Article in English | MEDLINE | ID: mdl-30290150

ABSTRACT

Although recent studies provide evidence for a common genetic basis between complex traits and Mendelian disorders, a thorough quantification of their overlap in a phenotype-specific manner remains elusive. Here, we have quantified the overlap of genes identified through large-scale genome-wide association studies (GWASs) for 62 complex traits and diseases with genes containing mutations known to cause 20 broad categories of Mendelian disorders. We identified a significant enrichment of genes linked to phenotypically matched Mendelian disorders in GWAS gene sets; of the total 1,240 comparisons, a higher proportion of phenotypically matched or related pairs (n = 50 of 92 [54%]) than phenotypically unmatched pairs (n = 27 of 1,148 [2%]) demonstrated significant overlap, confirming a phenotype-specific enrichment pattern. Further, we observed elevated GWAS effect sizes near genes linked to phenotypically matched Mendelian disorders. Finally, we report examples of GWAS variants localized at the transcription start site or physically interacting with the promoters of genes linked to phenotypically matched Mendelian disorders. Our results are consistent with the hypothesis that genes that are disrupted in Mendelian disorders are dysregulated by non-coding variants in complex traits and demonstrate how leveraging findings from related Mendelian disorders and functional genomic datasets can prioritize genes that are putatively dysregulated by local and distal non-coding GWAS variants.
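The matched-versus-unmatched comparison in this abstract can be checked with a few lines of arithmetic. The sketch below reuses only the counts quoted in the abstract; the odds ratio is an illustrative summary added here, not a statistic reported by the authors, and the function name is hypothetical.

```python
def overlap_summary(sig_matched, n_matched, sig_unmatched, n_unmatched):
    """Summarize how often matched vs. unmatched pairs show significant overlap."""
    prop_matched = sig_matched / n_matched
    prop_unmatched = sig_unmatched / n_unmatched
    # Odds of significant overlap within each group, then their ratio.
    odds_matched = sig_matched / (n_matched - sig_matched)
    odds_unmatched = sig_unmatched / (n_unmatched - sig_unmatched)
    return prop_matched, prop_unmatched, odds_matched / odds_unmatched


# Counts quoted in the abstract: 50 of 92 matched or related pairs,
# 27 of 1,148 unmatched pairs (92 + 1,148 = 1,240 comparisons in total).
prop_matched, prop_unmatched, odds_ratio = overlap_summary(50, 92, 27, 1148)
print(f"matched: {prop_matched:.1%}, unmatched: {prop_unmatched:.1%}")
print(f"illustrative odds ratio: {odds_ratio:.1f}")
```

The two proportions round to the 54% and 2% given in the abstract, which confirms that the counts and percentages are internally consistent.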


Subject(s)
Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Female , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Humans , Male , Phenotype , Promoter Regions, Genetic/genetics , Transcription Initiation Site/physiology
6.
Eur Respir J ; 58(4)2021 10.
Article in English | MEDLINE | ID: mdl-33766948

ABSTRACT

BACKGROUND: Lung function is a heritable complex phenotype with obesity being one of its important risk factors. However, knowledge of their shared genetic basis is limited. Most genome-wide association studies (GWASs) for lung function have been based on European populations, limiting the generalisability across populations. Large-scale lung function GWASs in other populations are lacking. METHODS: We included 100 285 subjects from the China Kadoorie Biobank (CKB). To identify novel loci for lung function, single-trait GWAS analyses were performed on forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC in the CKB. We then performed genome-wide cross-trait analysis between lung function and obesity traits (body mass index (BMI), BMI-adjusted waist-to-hip ratio and BMI-adjusted waist circumference) to investigate the shared genetic effects in the CKB. Finally, polygenic risk scores (PRSs) of lung function were developed in the CKB and their interaction with BMI's association on lung function were examined. We also conducted cross-trait analysis in parallel with the CKB using up to 457 756 subjects from the UK Biobank (UKB) for replication and investigation of ancestry-specific effects. RESULTS: We identified nine genome-wide significant novel loci for FEV1, six for FVC and three for FEV1/FVC in the CKB. FEV1 and FVC showed significant negative genetic correlation with obesity traits in both the CKB and UKB. Genetic loci shared between lung function and obesity traits highlighted important biological pathways, including cell proliferation, embryo, skeletal and tissue development, and regulation of gene expression. Mendelian randomisation analysis suggested significant negative causal effects of BMI on FEV1 and on FVC in both the CKB and UKB. Lung function PRSs significantly modified the effect of change in BMI on change in lung function during an average follow-up of 8 years. 
CONCLUSION: This large-scale GWAS of lung function identified novel loci and shared genetic aetiology between lung function and obesity. Change in BMI might affect change in lung function differently according to a subject's polygenic background. These findings may open new avenues for the development of molecular-targeted therapies for obesity and lung function improvement.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Body Mass Index , China , Forced Expiratory Volume , Humans , Lung , Obesity/genetics
7.
J Allergy Clin Immunol ; 145(2): 537-549, 2020 02.
Article in English | MEDLINE | ID: mdl-31669095

ABSTRACT

BACKGROUND: Clinical and epidemiologic studies have shown that obesity is associated with asthma and that these associations differ by asthma subtype. Little is known about the shared genetic components between obesity and asthma. OBJECTIVE: We sought to identify shared genetic associations between obesity-related traits and asthma subtypes in adults. METHODS: A cross-trait genome-wide association study (GWAS) was performed using 457,822 subjects of European ancestry from the UK Biobank. Experimental evidence to support the role of genes significantly associated with both obesity-related traits and asthma through a GWAS was sought by using results from obese versus lean mouse RNA sequencing and RT-PCR experiments. RESULTS: We found a substantial positive genetic correlation between body mass index and later-onset asthma defined by asthma age of onset at 16 years or greater (Rg = 0.25, P = 9.56 × 10⁻²²). Mendelian randomization analysis provided strong evidence in support of body mass index causally increasing asthma risk. Cross-trait meta-analysis identified 34 shared loci among 3 obesity-related traits and 2 asthma subtypes. GWAS functional analyses identified potential causal relationships between the shared loci and Genotype-Tissue Expression (GTEx) quantitative trait loci and shared immune- and cell differentiation-related pathways between obesity and asthma. Finally, RNA sequencing data from lungs of obese versus control mice found that 2 genes (acyl-coenzyme A oxidase-like [ACOXL] and myosin light chain 6 [MYL6]) from the cross-trait meta-analysis were differentially expressed, and these findings were validated by using RT-PCR in an independent set of mice. CONCLUSIONS: Our work identified shared genetic components between obesity-related traits and specific asthma subtypes, reinforcing the hypothesis that obesity causally increases the risk of asthma and identifying molecular pathways that might underlie both obesity and asthma.
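The abstract does not say which Mendelian randomization estimator was used. As a rough illustration of the general idea, the sketch below implements the widely used fixed-effect inverse-variance-weighted (IVW) estimator, which is an assumption here and may differ from the authors' analysis. The function name and the per-variant numbers are made up for illustration only.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect of exposure on outcome.

    Each list holds one entry per genetic instrument: the variant's effect on
    the exposure, its effect on the outcome, and the standard error of the
    outcome effect.
    """
    rows = list(zip(beta_exposure, beta_outcome, se_outcome))
    numerator = sum(bx * by / se**2 for bx, by, se in rows)
    denominator = sum(bx**2 / se**2 for bx, _, se in rows)
    estimate = numerator / denominator
    standard_error = (1.0 / denominator) ** 0.5
    return estimate, standard_error


# Hypothetical summary statistics for three instruments (not real data).
toy_bmi_effects = [0.10, 0.20, 0.15]
toy_asthma_effects = [0.05, 0.10, 0.075]
toy_asthma_se = [0.01, 0.02, 0.015]

estimate, standard_error = ivw_estimate(toy_bmi_effects, toy_asthma_effects, toy_asthma_se)
print(f"IVW estimate: {estimate:.3f} (SE {standard_error:.3f})")
```

In the toy input every outcome effect is exactly half the exposure effect, so the weighted estimate recovers a slope of one half regardless of the weights.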


Subject(s)
Asthma/genetics , Genetic Predisposition to Disease/genetics , Obesity/genetics , Adult , Animals , Biological Specimen Banks , Body Mass Index , Female , Genome-Wide Association Study , Humans , Male , Mice , United Kingdom
8.
Am J Hum Genet ; 101(5): 737-751, 2017 Nov 02.
Article in English | MEDLINE | ID: mdl-29100087

ABSTRACT

Although genetic correlations between complex traits provide valuable insights into epidemiological and etiological studies, a precise quantification of which genomic regions disproportionately contribute to the genome-wide correlation is currently lacking. Here, we introduce ρ-HESS, a technique to quantify the correlation between pairs of traits due to genetic variation at a small region in the genome. Our approach requires GWAS summary data only and makes no distributional assumption on the causal variant effect sizes while accounting for linkage disequilibrium (LD) and overlapping GWAS samples. We analyzed large-scale GWAS summary data across 36 quantitative traits, and identified 25 genomic regions that contribute significantly to the genetic correlation among these traits. Notably, we find 6 genomic regions that contribute to the genetic correlation of 10 pairs of traits that show negligible genome-wide correlation, further showcasing the power of local genetic correlation analyses. Finally, we report the distribution of local genetic correlations across the genome for 55 pairs of traits that show putative causal relationships.


Subject(s)
Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Quantitative Trait Loci/genetics , Genome-Wide Association Study/methods , Genomics/methods , Humans , Linkage Disequilibrium/genetics , Models, Genetic , Phenotype
9.
Am J Hum Genet ; 100(3): 473-487, 2017 Mar 02.
Article in English | MEDLINE | ID: mdl-28238358

ABSTRACT

Although genome-wide association studies (GWASs) have identified thousands of risk loci for many complex traits and diseases, the causal variants and genes at these loci remain largely unknown. Here, we introduce a method for estimating the local genetic correlation between gene expression and a complex trait and utilize it to estimate the genetic correlation due to predicted expression between pairs of traits. We integrated gene expression measurements from 45 expression panels with summary GWAS data to perform 30 multi-tissue transcriptome-wide association studies (TWASs). We identified 1,196 genes whose expression is associated with these traits; of these, 168 reside more than 0.5 Mb away from any previously reported GWAS significant variant. We then used our approach to find 43 pairs of traits with significant genetic correlation at the level of predicted expression; of these, eight were not found through genetic correlation at the SNP level. Finally, we used bi-directional regression to find evidence that BMI causally influences triglyceride levels and that triglyceride levels causally influence low-density lipoprotein. Together, our results provide insight into the role of gene expression in the susceptibility of complex traits and diseases.


Subject(s)
Gene Expression Profiling , Genome-Wide Association Study , Phenotype , Transcriptome , Body Mass Index , Genetic Predisposition to Disease , Humans , Linkage Disequilibrium , Lipoproteins, LDL/blood , Models, Theoretical , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Sequence Analysis, RNA , Triglycerides/blood
10.
Bioinformatics ; 35(22): 4837-4839, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31173064

ABSTRACT

MOTIVATION: Multi-trait analyses using public summary statistics from genome-wide association studies (GWASs) are becoming increasingly popular. A constraint of multi-trait methods is that they require complete summary data for all traits. Although methods for the imputation of summary statistics exist, they lack precision for genetic variants with small effect size. This is benign for univariate analyses where only variants with large effect size are selected a posteriori. However, it can lead to strong p-value inflation in multi-trait testing. Here we present a new approach that improves the existing imputation methods and reaches a precision suitable for multi-trait analyses. RESULTS: We fine-tuned parameters to obtain a very high accuracy imputation from summary statistics. We demonstrate this accuracy for variants of all effect sizes on real data of 28 GWAS. We implemented the resulting methodology in a Python package specially designed to efficiently impute multiple GWAS in parallel. AVAILABILITY AND IMPLEMENTATION: The Python package is available at https://gitlab.pasteur.fr/statistical-genetics/raiss; its accompanying documentation is accessible at http://statistical-genetics.pages.pasteur.fr/raiss/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Software , Genotype , Phenotype , Polymorphism, Single Nucleotide
11.
Am J Hum Genet ; 99(1): 139-53, 2016 07 07.
Article in English | MEDLINE | ID: mdl-27346688

ABSTRACT

Variance-component methods that estimate the aggregate contribution of large sets of variants to the heritability of complex traits have yielded important insights into the genetic architecture of common diseases. Here, we introduce methods that estimate the total trait variance explained by the typed variants at a single locus in the genome (local SNP heritability) from genome-wide association study (GWAS) summary data while accounting for linkage disequilibrium among variants. We applied our estimator to ultra-large-scale GWAS summary data of 30 common traits and diseases to gain insights into their local genetic architecture. First, we found that common SNPs have a high contribution to the heritability of all studied traits. Second, we identified traits for which the majority of the SNP heritability can be confined to a small percentage of the genome. Third, we identified GWAS risk loci where the entire locus explains significantly more variance in the trait than the GWAS reported variants. Finally, we identified loci that explain a significant amount of heritability across multiple traits.


Subject(s)
Genome-Wide Association Study , Models, Genetic , Phenotype , Datasets as Topic , Genetic Predisposition to Disease/genetics , Humans , Linkage Disequilibrium/genetics , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics
12.
Eur Respir J ; 54(6)2019 12.
Article in English | MEDLINE | ID: mdl-31619474

ABSTRACT

Epidemiological studies demonstrate an association between asthma and mental health disorders, although little is known about the shared genetics and causality of this association. Thus, we aimed to investigate shared genetics and the causal link between asthma and mental health disorders. We conducted a large-scale genome-wide cross-trait association study to investigate genetic overlap between asthma from the UK Biobank and eight mental health disorders from the Psychiatric Genomics Consortium: attention deficit hyperactivity disorder (ADHD), anxiety disorder (ANX), autism spectrum disorder, bipolar disorder, eating disorder, major depressive disorder (MDD), post-traumatic stress disorder and schizophrenia (sample size 9537-394 283). In the single-trait genome-wide association analysis, we replicated 130 previously reported loci and discovered 31 novel independent loci that are associated with asthma. We identified that ADHD, ANX and MDD have a strong genetic correlation with asthma at the genome-wide level. Cross-trait meta-analysis identified seven loci jointly associated with asthma and ADHD, one locus with asthma and ANX, and 10 loci with asthma and MDD. Functional analysis revealed that the identified variants regulated gene expression in major tissues belonging to the exocrine/endocrine, digestive, respiratory and haemic/immune systems. Mendelian randomisation analyses suggested that ADHD and MDD (including 6.7% sample overlap with asthma) might increase the risk of asthma. This large-scale genome-wide cross-trait analysis identified shared genetics and potential causal links between asthma and three mental health disorders (ADHD, ANX and MDD). Such shared genetics implicate potential new biological functions that are in common among them.


Subject(s)
Anxiety Disorders/genetics , Asthma/genetics , Attention Deficit Disorder with Hyperactivity/genetics , Depressive Disorder/genetics , Adult , Child , Genome-Wide Association Study , Humans , United Kingdom
13.
Bioinformatics ; 34(13): i195-i201, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949958

ABSTRACT

Motivation: A large proportion of risk regions identified by genome-wide association studies (GWAS) are shared across multiple diseases and traits. Understanding whether this clustering is due to sharing of causal variants or chance colocalization can provide insights into shared etiology of complex traits and diseases. Results: In this work, we propose a flexible, unifying framework to quantify the overlap between a pair of traits called UNITY (Unifying Non-Infinitesimal Trait analYsis). We formulate a Bayesian generative model that relates the overlap between pairs of traits to GWAS summary statistic data under a non-infinitesimal genetic architecture underlying each trait. We propose a Metropolis-Hastings sampler to compute the posterior density of the genetic overlap parameters in this model. We validate our method through comprehensive simulations and analyze summary statistics from height and body mass index GWAS to show that it produces estimates consistent with the known genetic makeup of both traits. Availability and implementation: The UNITY software is made freely available to the research community at: https://github.com/bogdanlab/UNITY. Supplementary information: Supplementary data are available at Bioinformatics online.
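For readers unfamiliar with the sampling step named in this abstract, the following is a generic random-walk Metropolis-Hastings sketch. It is not the UNITY model: the target here is a toy Beta(3, 7)-shaped density over a single proportion, chosen only because its mean (0.3) is known, and all function names, the step size, and the burn-in length are assumptions made for illustration.

```python
import math
import random


def metropolis_hastings(log_target, start, n_samples, step=0.1, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Symmetric proposal, so the acceptance ratio is the target ratio.
        # 1 - random() lies in (0, 1], which keeps the logarithm finite.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


def log_toy_target(p):
    """Unnormalized log density of Beta(3, 7); -inf outside (0, 1)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return 2.0 * math.log(p) + 6.0 * math.log(1.0 - p)


draws = metropolis_hastings(log_toy_target, start=0.5, n_samples=20000)
kept = draws[2000:]  # discard an initial burn-in stretch
posterior_mean = sum(kept) / len(kept)
print(f"estimated posterior mean: {posterior_mean:.3f}")
```

Because only ratios of the target density enter the acceptance step, the normalizing constant of the posterior never needs to be computed, which is what makes this family of samplers practical for models like the one described.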


Subject(s)
Genome-Wide Association Study/methods , Models, Statistical , Polymorphism, Single Nucleotide , Software , Bayes Theorem , Humans , Linkage Disequilibrium , Models, Genetic
14.
Bioinformatics ; 34(15): 2538-2545, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29579179

ABSTRACT

Motivation: Most genetic variants implicated in complex diseases by genome-wide association studies (GWAS) are non-coding, making it challenging to understand the causative genes involved in disease. Integrating external information such as quantitative trait locus (QTL) mapping of molecular traits (e.g. expression, methylation) is a powerful approach to identify the subset of GWAS signals explained by regulatory effects. In particular, expression QTLs (eQTLs) help pinpoint the responsible gene among the GWAS regions that harbor many genes, while methylation QTLs (mQTLs) help identify the epigenetic mechanisms that impact gene expression which in turn affect disease risk. In this work, we propose multiple-trait-coloc (moloc), a Bayesian statistical framework that integrates GWAS summary data with multiple molecular QTL data to identify regulatory effects at GWAS risk loci. Results: We applied moloc to schizophrenia (SCZ) and eQTL/mQTL data derived from human brain tissue and identified 52 candidate genes that influence SCZ through methylation. Our method can be applied to any GWAS and relevant functional data to help prioritize disease associated genes. Availability and implementation: moloc is available for download as an R package (https://github.com/clagiamba/moloc). We also developed a web site to visualize the biological findings (icahn.mssm.edu/moloc). The browser allows searches by gene, methylation probe and scenario of interest. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Chromosome Mapping/methods , Epigenesis, Genetic , Genomics/methods , Quantitative Trait Loci , Software , Transcriptome , Bayes Theorem , Brain/metabolism , DNA Methylation , Epigenomics/methods , Gene Expression Profiling/methods , Genome-Wide Association Study/methods , Humans , Schizophrenia/genetics
15.
Bioinformatics ; 31(21): 3514-21, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26139633

ABSTRACT

MOTIVATION: Haplotype models enjoy a wide range of applications in population inference and disease gene discovery. The hidden Markov models traditionally used for haplotypes are hindered by the dubious assumption that dependencies occur only between consecutive pairs of variants. In this article, we apply the multivariate Bernoulli (MVB) distribution to model haplotype data. The MVB distribution relies on interactions among all sets of variants, thus allowing for the detection and exploitation of long-range and higher-order interactions. We discuss penalized estimation and present an efficient algorithm for fitting sparse versions of the MVB distribution to haplotype data. Finally, we showcase the benefits of the MVB model in predicting DNaseI hypersensitivity (DH) status, an epigenetic mark describing chromatin accessibility, from population-scale haplotype data. RESULTS: We fit the MVB model to real data from 59 individuals on whom both haplotypes and DH status in lymphoblastoid cell lines are publicly available. The model allows prediction of DH status from genetic data (prediction R² = 0.12 in cross-validations). Comparisons of prediction under the MVB model with prediction under linear regression (best linear unbiased prediction) and logistic regression demonstrate that the MVB model achieves about 10% higher prediction R² than the two competing methods in empirical data. AVAILABILITY AND IMPLEMENTATION: Software implementing the method described can be downloaded at http://bogdan.bioinformatics.ucla.edu/software/. CONTACT: shihuwenbo@ucla.edu or pasaniuc@ucla.edu.


Subject(s)
Deoxyribonuclease I , Haplotypes , Models, Statistical , Algorithms , Cell Line , Humans , Linear Models , Logistic Models , Multivariate Analysis , Software
16.
Bioinformatics ; 30(20): 2906-14, 2014 Oct 15.
Article in English | MEDLINE | ID: mdl-24990607

ABSTRACT

MOTIVATION: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. RESULTS: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of [Formula: see text] association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. 
We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. AVAILABILITY AND IMPLEMENTATION: Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. CONTACT: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.
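The core idea of Gaussian imputation from summary statistics can be sketched as a conditional expectation: if z-scores are treated as multivariate normal with covariance given by the LD matrix, the z-score at an untyped SNP is predicted as a weighted sum of the typed z-scores. The code below is a minimal pure-Python illustration under that assumption. The ridge term stands in for the adjustment for limited reference panel size mentioned in the abstract (the exact form used by the authors is not given there), and all names and toy numbers are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def impute_z(ld_typed, ld_untyped_typed, z_typed, ridge=0.1):
    """Impute one untyped z-score from typed z-scores and reference LD.

    ld_typed: LD (correlation) matrix among typed SNPs.
    ld_untyped_typed: LD between the untyped SNP and each typed SNP.
    ridge: value added to the diagonal to stabilize the inversion (assumed).
    Returns the imputed z-score and a predicted-accuracy measure.
    """
    n = len(z_typed)
    regularized = [
        [ld_typed[i][j] + (ridge if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # The LD matrix is symmetric, so solving against the cross-LD vector
    # yields the imputation weights directly.
    weights = solve(regularized, ld_untyped_typed)
    z_imputed = sum(w * z for w, z in zip(weights, z_typed))
    r2_pred = sum(w * r for w, r in zip(weights, ld_untyped_typed))
    return z_imputed, r2_pred


# Toy example: two typed SNPs in moderate LD with each other and the target.
toy_ld = [[1.0, 0.5], [0.5, 1.0]]
toy_cross_ld = [0.5, 0.5]
toy_z = [3.0, 3.0]
z_hat, r2 = impute_z(toy_ld, toy_cross_ld, toy_z, ridge=0.0)
print(f"imputed z = {z_hat:.2f}, predicted r2 = {r2:.2f}")
```

The second return value shrinks toward zero when the untyped SNP is poorly tagged, which gives one way to see why imputed statistics recover only part of the effective sample size.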


Subject(s)
Biostatistics/methods , Genome-Wide Association Study/methods , Algorithms , Case-Control Studies , Cohort Studies , Genotype , Humans , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide , Software , Time Factors
17.
medRxiv ; 2024 May 13.
Article in English | MEDLINE | ID: mdl-38798542

ABSTRACT

Leveraging data from multiple ancestries can greatly improve fine-mapping power due to differences in linkage disequilibrium and allele frequencies. We propose MultiSuSiE, an extension of the sum of single effects model (SuSiE) to multiple ancestries that allows causal effect sizes to vary across ancestries based on a multivariate normal prior informed by empirical data. We evaluated MultiSuSiE via simulations and analyses of 14 quantitative traits leveraging whole-genome sequencing data in 47k African-ancestry and 94k European-ancestry individuals from All of Us. In simulations, MultiSuSiE applied to Afr47k+Eur47k was well-calibrated and attained higher power than SuSiE applied to Eur94k; interestingly, higher causal variant PIPs in Afr47k compared to Eur47k were entirely explained by differences in the extent of LD quantified by LD 4th moments. Compared to very recently proposed multi-ancestry fine-mapping methods, MultiSuSiE attained higher power and/or much lower computational costs, making the analysis of large-scale All of Us data feasible. In real trait analyses, MultiSuSiE applied to Afr47k+Eur94k identified 579 fine-mapped variants with PIP > 0.5, and MultiSuSiE applied to Afr47k+Eur47k identified 44% more fine-mapped variants with PIP > 0.5 than SuSiE applied to Eur94k. We validated MultiSuSiE results for real traits via functional enrichment of fine-mapped variants. We highlight several examples where MultiSuSiE implicates well-studied or biologically plausible fine-mapped variants that were not implicated by other methods.
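The posterior inclusion probabilities (PIPs) used above to count fine-mapped variants can be illustrated in the single-ancestry SuSiE setting, where each of L single effects has a posterior distribution over SNPs and a SNP's PIP is the probability that at least one effect lands on it. The sketch below shows only that combination step; it does not implement the multi-ancestry prior, and the function name `pips_from_alpha` and the toy numbers are assumptions for illustration.

```python
import numpy as np

def pips_from_alpha(alpha):
    """Combine single-effect posteriors into per-SNP PIPs.

    alpha : (L, p) array; row l is the posterior probability, over p SNPs,
            that single effect l is carried by each SNP.
    """
    # A SNP is excluded only if every single effect misses it.
    return 1.0 - np.prod(1.0 - alpha, axis=0)

# Toy example with L = 2 single effects and p = 3 SNPs (made-up values).
alpha = np.array([[0.9, 0.1, 0.0],
                  [0.0, 0.5, 0.5]])
pip = pips_from_alpha(alpha)
n_fine_mapped = int(np.sum(pip > 0.5))  # same PIP > 0.5 threshold as in the abstract
```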

18.
Res Sq ; 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38168385

ABSTRACT

The genetic architecture of human diseases and complex traits has been extensively studied, but little is known about the relationship of causal disease effect sizes between proximal SNPs, which have largely been assumed to be independent. We introduce a new method, LD SNP-pair effect correlation regression (LDSPEC), to estimate the correlation of causal disease effect sizes of derived alleles between proximal SNPs, depending on their allele frequencies, LD, and functional annotations; LDSPEC produced robust estimates in simulations across various genetic architectures. We applied LDSPEC to 70 diseases and complex traits from the UK Biobank (average N=306K), meta-analyzing results across diseases/traits. We detected significantly nonzero effect correlations for proximal SNP pairs (e.g., -0.37±0.09 for low-frequency positive-LD 0-100bp SNP pairs) that decayed with distance (e.g., -0.07±0.01 for low-frequency positive-LD 1-10kb), varied with allele frequency (e.g., -0.15±0.04 for common positive-LD 0-100bp), and varied with LD between SNPs (e.g., +0.12±0.05 for common negative-LD 0-100bp); because we consider derived alleles, positive-LD and negative-LD SNP pairs may yield very different results. We further determined that SNP pairs with shared functions had stronger effect correlations that spanned longer genomic distances, e.g., -0.37±0.08 for low-frequency positive-LD same-gene promoter SNP pairs (average genomic distance of 47kb, due to alternative splicing) and -0.32±0.04 for low-frequency positive-LD H3K27ac 0-1kb SNP pairs. Consequently, SNP-heritability estimates were substantially smaller than estimates of the sum of causal effect size variances across all SNPs (ratio of 0.87±0.02 across diseases/traits), particularly for certain functional annotations (e.g., 0.78±0.01 for common Super enhancer SNPs), even though these quantities are widely assumed to be equal.
We recapitulated our findings via forward simulations with an evolutionary model involving stabilizing selection, implicating the action of linkage masking, whereby haplotypes containing linked SNPs with opposite effects on disease have reduced effects on fitness and escape negative selection.
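A short derivation may help show why SNP-heritability can fall below the sum of causal effect size variances. The notation is not from the abstract and is assumed for illustration: genotypes X_j are standardized, causal effects β_j are zero-mean random variables independent of genotypes, and r_ij is the LD (correlation) between SNPs i and j.

```latex
h^2 \;=\; \operatorname{Var}\!\Big(\sum_j X_j \beta_j\Big)
    \;=\; \sum_j \mathbb{E}\big[\beta_j^2\big]
    \;+\; \sum_{i \neq j} r_{ij}\, \mathbb{E}\big[\beta_i \beta_j\big]
```

Under these assumptions the two quantities are equal only when the cross term vanishes. Negative effect correlations at positive-LD pairs, and positive effect correlations at negative-LD pairs, both make r_ij E[β_i β_j] negative, which is consistent with the reported ratio below 1.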

19.
medRxiv ; 2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38106023

ABSTRACT

The genetic architecture of human diseases and complex traits has been extensively studied, but little is known about the relationship of causal disease effect sizes between proximal SNPs, which have largely been assumed to be independent. We introduce a new method, LD SNP-pair effect correlation regression (LDSPEC), to estimate the correlation of causal disease effect sizes of derived alleles between proximal SNPs, depending on their allele frequencies, LD, and functional annotations; LDSPEC produced robust estimates in simulations across various genetic architectures. We applied LDSPEC to 70 diseases and complex traits from the UK Biobank (average N=306K), meta-analyzing results across diseases/traits. We detected significantly nonzero effect correlations for proximal SNP pairs (e.g., -0.37±0.09 for low-frequency positive-LD 0-100bp SNP pairs) that decayed with distance (e.g., -0.07±0.01 for low-frequency positive-LD 1-10kb), varied with allele frequency (e.g., -0.15±0.04 for common positive-LD 0-100bp), and varied with LD between SNPs (e.g., +0.12±0.05 for common negative-LD 0-100bp); because we consider derived alleles, positive-LD and negative-LD SNP pairs may yield very different results. We further determined that SNP pairs with shared functions had stronger effect correlations that spanned longer genomic distances, e.g., -0.37±0.08 for low-frequency positive-LD same-gene promoter SNP pairs (average genomic distance of 47kb, due to alternative splicing) and -0.32±0.04 for low-frequency positive-LD H3K27ac 0-1kb SNP pairs. Consequently, SNP-heritability estimates were substantially smaller than estimates of the sum of causal effect size variances across all SNPs (ratio of 0.87±0.02 across diseases/traits), particularly for certain functional annotations (e.g., 0.78±0.01 for common Super enhancer SNPs), even though these quantities are widely assumed to be equal.
We recapitulated our findings via forward simulations with an evolutionary model involving stabilizing selection, implicating the action of linkage masking, whereby haplotypes containing linked SNPs with opposite effects on disease have reduced effects on fitness and escape negative selection.

20.
Elife ; 11, 2022 12 14.
Article in English | MEDLINE | ID: mdl-36515579

ABSTRACT

The genetic basis of most traits is highly polygenic and dominated by non-coding alleles. It is widely assumed that such alleles exert small regulatory effects on the expression of cis-linked genes. However, despite the availability of gene expression and epigenomic datasets, few variant-to-gene links have emerged. It is unclear whether these sparse results are due to limitations in available data and methods, or to deficiencies in the underlying assumed model. To better distinguish between these possibilities, we identified 220 gene-trait pairs in which protein-coding variants influence a complex trait or its Mendelian cognate. Despite the presence of expression quantitative trait loci near most GWAS associations, by applying a gene-based approach we found limited evidence that the baseline expression of trait-related genes explains GWAS associations, whether using colocalization methods (8% of genes implicated), transcriptome-wide association (2% of genes implicated), or a combination of regulatory annotations and distance (4% of genes implicated). These results contradict the hypothesis that most complex trait-associated variants coincide with homeostatic expression QTLs, suggesting that better models are needed. The field must confront this deficit and pursue this 'missing regulation.'


Subject(s)
Genome-Wide Association Study, Quantitative Trait Loci, Humans, Genome-Wide Association Study/methods, Phenotype, Multifactorial Inheritance/genetics, Epigenomics, Polymorphism, Single Nucleotide, Genetic Predisposition to Disease