ABSTRACT
Linking variants from genome-wide association studies (GWAS) to underlying mechanisms of disease remains a challenge [1-3]. For some diseases, a successful strategy has been to look for cases in which multiple GWAS loci contain genes that act in the same biological pathway [1-6]. However, our knowledge of which genes act in which pathways is incomplete, particularly for cell-type-specific pathways or understudied genes. Here we introduce a method to connect GWAS variants to functions. This method links variants to genes using epigenomics data, links genes to pathways de novo using Perturb-seq and integrates these data to identify convergence of GWAS loci onto pathways. We apply this approach to study the role of endothelial cells in genetic risk for coronary artery disease (CAD), and discover 43 CAD GWAS signals that converge on the cerebral cavernous malformation (CCM) signalling pathway. Two regulators of this pathway, CCM2 and TLNRD1, are each linked to a CAD risk variant, regulate other CAD risk genes and affect atheroprotective processes in endothelial cells. These results suggest a model whereby CAD risk is driven in part by the convergence of causal genes onto a particular transcriptional pathway in endothelial cells. They highlight shared genes between common and rare vascular diseases (CAD and CCM), and identify TLNRD1 as a new, previously uncharacterized member of the CCM signalling pathway. This approach will be widely useful for linking variants to functions for other common polygenic diseases.
Subjects
Coronary Artery Disease, Endothelial Cells, Genome-Wide Association Study, Cavernous Hemangioma of the Central Nervous System, Humans, Coronary Artery Disease/genetics, Coronary Artery Disease/pathology, Endothelial Cells/metabolism, Endothelial Cells/pathology, Genetic Predisposition to Disease/genetics, Cavernous Hemangioma of the Central Nervous System/genetics, Cavernous Hemangioma of the Central Nervous System/pathology, Single Nucleotide Polymorphism, Epigenomics, Signal Transduction/genetics, Multifactorial Inheritance
ABSTRACT
Regulatory relationships between transcription factors (TFs) and their target genes lie at the heart of cellular identity and function; however, uncovering these relationships is often labor-intensive and requires perturbations. Here, we propose a principled framework to systematically infer gene regulation for all TFs simultaneously in cells at steady state by leveraging the intrinsic variation in the transcriptional abundance across single cells. Through modeling and simulations, we characterize how transcriptional bursts of a TF gene are propagated to its target genes, including the expected ranges of time delay and magnitude of maximum covariation. We distinguish these temporal trends from the time-invariant covariation arising from cell states, and we delineate the experimental and technical requirements for leveraging these small but meaningful cofluctuations in the presence of measurement noise. While current technology does not yet allow adequate power for definitively detecting regulatory relationships for all TFs simultaneously in cells at steady state, we investigate a small-scale dataset to inform future experimental design. This study supports the potential value of mapping regulatory connections through stochastic variation, and it motivates further technological development to achieve its full potential.
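The idea of a time-delayed covariation between a TF and its target can be illustrated with a lagged correlation. The following is only a toy sketch on made-up time traces: the function names and the example values are hypothetical, and real single-cell measurements are snapshots in which the lag is not directly observed, so this is not the framework proposed in the study.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)

def lagged_correlation(tf, target, max_lag):
    """Correlation between tf[t] and target[t + lag] for each lag in 0..max_lag."""
    return {
        lag: pearson(tf[:len(tf) - lag], target[lag:])
        for lag in range(max_lag + 1)
    }

# Hypothetical traces: the target repeats the TF signal two time steps later.
tf_trace = [0, 5, 1, 0, 0, 4, 0, 1, 6, 0, 0, 2]
target_trace = [0, 0] + tf_trace[:-2]

correlations = lagged_correlation(tf_trace, target_trace, max_lag=4)
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

In this toy case the correlation peaks at the lag that was built into the target trace. With measurement noise the peak would be lower and broader, which is the power limitation the abstract describes.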
Subjects
Gene Expression Regulation, Biological Models, Transcription Factors, Computer Simulation, Gene Regulatory Networks, Transcription Factors/genetics, Transcription Factors/metabolism
ABSTRACT
The use of external controls in genome-wide association studies (GWAS) can significantly increase the size and diversity of the control sample, enabling high-resolution ancestry matching and enhancing the power to detect association signals. However, the aggregation of controls from multiple sources is challenging due to batch effects, difficulty in identifying genotyping errors and the use of different genotyping platforms. These obstacles have impeded the use of external controls in GWAS and can lead to spurious results if not carefully addressed. We propose a unified data harmonization pipeline that includes an iterative approach to quality control and imputation, implemented before and after merging cohorts and arrays. We apply this harmonization pipeline to aggregate 27,517 European control samples from 16 collections within dbGaP. We leverage these harmonized controls to conduct a GWAS of Crohn's disease. We demonstrate a boost in power over using the cohort samples alone, and show that our procedure results in summary statistics free of any significant batch effects. This harmonization pipeline for aggregating genotype data from multiple sources can also serve other applications where individual-level genotypes, rather than summary statistics, are required.
Subjects
Genome-Wide Association Study, Single Nucleotide Polymorphism, Cohort Studies, Genotype, Humans, Single Nucleotide Polymorphism/genetics, Quality Control
ABSTRACT
Genome-wide association studies have revealed that the genetic architecture of most complex traits is characterized by a large number of distinct effects scattered across the genome. Functional enrichment analyses of these results suggest that the associations for any given complex trait are not purely random. Thus, we set out to leverage the genetic association results from many traits with a view to identifying the set of modules, or latent factors, that mediate these associations. The identification of such modules may aid in disease classification as well as the elucidation of complex disease mechanisms. We propose a method, Genetic Unmixing by Independent Decomposition (GUIDE), to estimate a set of statistically independent latent factors that best express the patterns of association across many traits. The resulting latent factors not only have desirable mathematical properties, such as sparsity and a higher variance explained (for both traits and variants), but are also able to single out and prioritize key biological features or pathophysiological mechanisms underlying a given trait or disease. Moreover, we show that these latent factors can index biological pathways as well as epidemiological and environmental influences that compose the genetic architecture of complex traits.
ABSTRACT
The phenotypic impact of compound heterozygous (CH) variation has not been investigated at the population scale. We phased rare variants (MAF ~0.001%) in the UK Biobank (UKBB) exome-sequencing data to characterize recessive effects in 175,587 individuals across 311 common diseases. A total of 6.5% of individuals carry putatively damaging CH variants, 90% of which are only identifiable upon phasing rare variants (MAF < 0.38%). We identify six recessive gene-trait associations (p < 1.68 × 10⁻⁷) after accounting for relatedness, polygenicity, nearby common variants, and rare variant burden. Of these, just one is discovered when considering homozygosity alone. Using longitudinal health records, we additionally identify and replicate a novel association between bi-allelic variation in ATP2C2 and an earlier age at onset of chronic obstructive pulmonary disease (COPD) (p < 3.58 × 10⁻⁸). Genetic phase contributes to disease risk for gene-trait pairs: ATP2C2-COPD (p = 0.000238), FLG-asthma (p = 0.00205), and USH2A-visual impairment (p = 0.0084). We demonstrate the power of phasing large-scale genetic cohorts to discover phenome-wide consequences of compound heterozygosity.
Subjects
Biological Specimen Banks, Exome, Heterozygote, Phenotype, Humans, United Kingdom/epidemiology, Exome/genetics, Genetic Predisposition to Disease, Chronic Obstructive Pulmonary Disease/genetics, Female, Male, Filaggrin Proteins, Genome-Wide Association Study, Asthma/genetics, UK Biobank
ABSTRACT
Linkage disequilibrium (LD) is the correlation among nearby genetic variants. In genetic association studies, LD is often modeled using large correlation matrices, but this approach is inefficient, especially in ancestrally diverse studies. In the present study, we introduce LD graphical models (LDGMs), which are an extremely sparse and efficient representation of LD. LDGMs are derived from genome-wide genealogies; statistical relationships among alleles in the LDGM correspond to genealogical relationships among haplotypes. We published LDGMs and ancestry-specific LDGM precision matrices for 18 million common variants (minor allele frequency >1%) in five ancestry groups, validated their accuracy and demonstrated order-of-magnitude improvements in runtime for commonly used LD matrix computations. We implemented an extremely fast multiancestry polygenic prediction method, BLUPx-ldgm, which performs better than a similar method based on the reference LD correlation matrix. LDGMs will enable sophisticated methods that scale to ancestrally diverse genetic association data across millions of variants and individuals.
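For orientation, the pairwise quantity that a dense LD correlation matrix stores can be computed as below. This is a minimal sketch of the standard r² statistic on hypothetical 0/1 haplotype vectors, not an implementation of LDGMs or of any published software; the function name and the example sites are assumptions made for illustration.

```python
from statistics import mean

def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic sites.

    hap_a, hap_b: 0/1 alleles carried by the same haplotypes at two sites.
    """
    p_a, p_b = mean(hap_a), mean(hap_b)
    p_ab = mean(a * b for a, b in zip(hap_a, hap_b))
    d = p_ab - p_a * p_b  # coefficient of disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0

# Hypothetical alleles at three sites across eight haplotypes.
site1 = [1, 1, 0, 0, 1, 0, 1, 0]
site2 = [1, 1, 0, 0, 1, 0, 0, 0]
site3 = [1, 0, 1, 0, 1, 0, 1, 0]

print(ld_r2(site1, site2), ld_r2(site1, site3))
```

A full matrix of such values has one entry per pair of variants, so its size grows with the square of the number of variants. That growth is the inefficiency the abstract contrasts with a sparse representation.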
Subjects
Linkage Disequilibrium, Humans, Alleles, Gene Frequency/genetics, Genetic Association Studies, Haplotypes/genetics
ABSTRACT
Exome-sequencing association studies have successfully linked rare protein-coding variation to risk of thousands of diseases. However, the relationship between rare deleterious compound heterozygous (CH) variation and its phenotypic impact has not been fully investigated. Here, we leverage advances in statistical phasing to accurately phase rare variants (MAF ~ 0.001%) in exome sequencing data from 175,587 UK Biobank (UKBB) participants, which we then systematically annotate to identify putatively deleterious CH coding variation. We show that 6.5% of individuals carry such damaging variants in the CH state, with 90% of variants occurring at MAF < 0.34%. Using a logistic mixed model framework, systematically accounting for relatedness, polygenic risk, nearby common variants, and rare variant burden, we investigate recessive effects in common complex diseases. We find six exome-wide significant (P < 1.68 × 10⁻⁷) and 17 nominally significant (P < 5.25 × 10⁻⁵) gene-trait associations. Among these, only four would have been identified without accounting for CH variation in the gene. We further incorporate age-at-diagnosis information from primary care electronic health records, to show that genetic phase influences lifetime risk of disease across 20 gene-trait combinations (FDR < 5%). Using a permutation approach, we find evidence for genetic phase contributing to disease susceptibility for a collection of gene-trait pairs, including FLG-asthma (P = 0.00205) and USH2A-visual impairment (P = 0.0084). Taken together, we demonstrate the utility of phasing large-scale genetic sequencing cohorts for robust identification of the phenome-wide consequences of compound heterozygosity.
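Why phase matters for compound heterozygosity can be shown with a small classifier. This is a minimal sketch, assuming phased genotypes are already available as pairs of 0/1 alleles per damaging variant within one gene; the function name, the category labels, and the input layout are hypothetical and do not reproduce the authors' annotation pipeline.

```python
def classify_gene_genotype(phased_genotypes):
    """Classify one person's damaging variants within a single gene.

    phased_genotypes: list of (hap1_allele, hap2_allele) pairs, one per variant,
    where 1 marks the damaging allele and 0 the reference allele.
    """
    if any(h1 == 1 and h2 == 1 for h1, h2 in phased_genotypes):
        return "homozygous"
    hit_hap1 = any(h1 == 1 for h1, _ in phased_genotypes)
    hit_hap2 = any(h2 == 1 for _, h2 in phased_genotypes)
    n_het = sum(1 for h1, h2 in phased_genotypes if h1 != h2)
    if hit_hap1 and hit_hap2:
        return "compound heterozygous"  # hits in trans: both gene copies affected
    if n_het >= 2:
        return "cis"  # several hits on one haplotype: one copy left intact
    if n_het == 1:
        return "heterozygous"
    return "no damaging variant"

# Two carriers with identical unphased genotypes (heterozygous at both sites):
print(classify_gene_genotype([(1, 0), (0, 1)]))  # alleles on different haplotypes
print(classify_gene_genotype([(1, 0), (1, 0)]))  # alleles on the same haplotype
```

Both example carriers look the same before phasing, yet only the first has both copies of the gene affected. This is why considering homozygosity alone misses most of the bi-allelic carriers the abstract describes.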
ABSTRACT
Classical statistical genetics theory defines dominance as any deviation from a purely additive, or dosage, effect of a genotype on a trait, which is known as the dominance deviation. Dominance is well documented in plant and animal breeding. Outside of rare monogenic traits, however, evidence in humans is limited. We systematically examined common genetic variation across 1060 traits in a large population cohort (UK Biobank, N = 361,194 samples analyzed) for evidence of dominance effects. We then developed a computationally efficient method to rapidly assess the aggregate contribution of dominance deviations to heritability. Lastly, observing that dominance associations are inherently less correlated between sites at a genomic locus than their additive counterparts, we explored whether they may be leveraged to identify causal variants more confidently.
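The dominance deviation can be made concrete with a genotype coding that is uncorrelated with the additive dosage. The sketch below uses one standard orthogonal parameterization from quantitative genetics under Hardy-Weinberg equilibrium (HWE). The abstract does not state which coding the authors used, so the coding and the function names here are assumptions made for illustration.

```python
def hwe_frequencies(p):
    """Genotype frequencies for 0, 1, 2 copies of an allele with frequency p under HWE."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]

def dominance_coding(p):
    """Dominance covariate for 0, 1, 2 copies of the counted allele.

    The values are chosen so that, under HWE, the covariate has mean zero
    and zero covariance with the additive dosage (0, 1, 2).
    """
    q = 1.0 - p
    return [-2 * p * p, 2 * p * q, -2 * q * q]

def mean_and_cov_with_dosage(p):
    """Mean of the dominance covariate and its covariance with dosage under HWE."""
    freqs = hwe_frequencies(p)
    dom = dominance_coding(p)
    dosage = [0, 1, 2]
    mean_dom = sum(f * d for f, d in zip(freqs, dom))
    mean_dosage = sum(f * x for f, x in zip(freqs, dosage))
    cov = sum(
        f * (x - mean_dosage) * (d - mean_dom)
        for f, x, d in zip(freqs, dosage, dom)
    )
    return mean_dom, cov

for freq in (0.1, 0.3, 0.5):
    print(freq, mean_and_cov_with_dosage(freq))
```

Because the two covariates are uncorrelated under HWE, an effect estimated on the dominance covariate reflects departure from the additive model rather than a re-description of the additive effect.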
Subjects
Biological Specimen Banks, Dominant Genes, Genetic Variation, Multifactorial Inheritance, Animals, Humans, Breeding, Genotype, Genetic Models, Phenotype, Single Nucleotide Polymorphism, United Kingdom
ABSTRACT
Genetic predictions of height differ among human populations and these differences have been interpreted as evidence of polygenic adaptation. These differences were first detected using SNPs genome-wide significantly associated with height, and shown to grow stronger when large numbers of sub-significant SNPs were included, leading to excitement about the prospect of analyzing large fractions of the genome to detect polygenic adaptation for multiple traits. Previous studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the analyses in the UK Biobank, a much more homogeneously designed study. We show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population stratification. More generally, our results imply that typical constructions of polygenic scores are sensitive to population stratification and that population-level differences should be interpreted with caution. Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).
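How a small stratification bias can accumulate across many SNPs can be illustrated with a toy calculation. This sketch assumes a deliberately simple model that is not taken from the paper: every true effect is zero, and each estimated effect carries a bias proportional to the allele-frequency difference between two populations. The function names and parameter values are hypothetical.

```python
def mean_score(effect_sizes, allele_freqs):
    """Expected polygenic score in a population: sum of beta * 2 * allele frequency."""
    return sum(b * 2 * f for b, f in zip(effect_sizes, allele_freqs))

def score_gap_from_bias(n_snps, freq_shift=0.01, bias_scale=0.1):
    """Gap in mean score between two populations when every true effect is zero.

    Toy assumption: residual stratification adds to each estimated effect a bias
    proportional to the allele-frequency difference between the populations.
    """
    freqs_a, freqs_b, betas = [], [], []
    for i in range(n_snps):
        sign = 1 if i % 2 == 0 else -1  # alternate which population has the higher frequency
        fa = 0.5 + sign * freq_shift
        fb = 0.5 - sign * freq_shift
        freqs_a.append(fa)
        freqs_b.append(fb)
        betas.append(bias_scale * (fa - fb))  # bias only; the true effect is 0
    return mean_score(betas, freqs_a) - mean_score(betas, freqs_b)

print(score_gap_from_bias(100), score_gap_from_bias(10_000))
```

Under these assumptions each SNP contributes a tiny gap of the same sign, so the total grows in proportion to the number of SNPs included. This is one way to see why scores built from large numbers of sub-significant SNPs are especially exposed to uncorrected stratification.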
Subjects
Body Height, Computational Biology/methods, Genome-Wide Association Study/methods, Multifactorial Inheritance, Biological Adaptation, Biostatistics, Factual Databases, Humans, Phylogeny, Single Nucleotide Polymorphism, United Kingdom
ABSTRACT
There are established associations between advanced paternal age and offspring risk for psychiatric and developmental disorders. These are commonly attributed to genetic mutations, especially de novo single nucleotide variants (dnSNVs), that accumulate with increasing paternal age. However, the actual magnitude of risk from such mutations in the male germline is unknown. Quantifying this risk would clarify the clinical significance of delayed paternity. Using parent-child trio whole-exome-sequencing data, we estimate the relationship between paternal-age-related dnSNVs and risk for five disorders: autism spectrum disorder (ASD), congenital heart disease, neurodevelopmental disorders with epilepsy, intellectual disability and schizophrenia (SCZ). Using Danish registry data, we investigate whether epidemiologic associations between each disorder and older fatherhood are consistent with the estimated role of dnSNVs. We find that paternal-age-related dnSNVs confer a small amount of risk for these disorders. For ASD and SCZ, epidemiologic associations with delayed paternity reflect factors that may not increase with age.
Subjects
Genetic Testing, Genetic Models, Paternal Age, Adult, Age Factors, Autism Spectrum Disorder/epidemiology, Autism Spectrum Disorder/genetics, Child, Denmark/epidemiology, Epilepsy/epidemiology, Epilepsy/genetics, Female, Congenital Heart Defects/epidemiology, Congenital Heart Defects/genetics, Humans, Incidence, Intellectual Disability/epidemiology, Intellectual Disability/genetics, Male, Middle Aged, Mutation, Single Nucleotide Polymorphism, Prevalence, Registries/statistics & numerical data, Risk Assessment/methods, Schizophrenia/epidemiology, Schizophrenia/genetics, Exome Sequencing
ABSTRACT
Functional characterization of the noncoding genome is essential for biological understanding of gene regulation and disease. Here, we introduce the computational framework PINES (Phenotype-Informed Noncoding Element Scoring), which predicts the functional impact of noncoding variants by integrating epigenetic annotations in a phenotype-dependent manner. PINES enables analyses to be customized towards genomic annotations from cell types of the highest relevance given the phenotype of interest. We illustrate that PINES identifies functional noncoding variation more accurately than methods that do not use phenotype-weighted knowledge, while at the same time being flexible and easy to use via a dedicated web portal.
Subjects
Algorithms, Intergenic DNA/genetics, Genetic Variation, Genetic Enhancer Elements/genetics, Genetic Epigenesis, Genetic Loci, Genetic Predisposition to Disease, Humans, Inflammatory Bowel Diseases/genetics, Molecular Sequence Annotation, Parkinson Disease/genetics, Phenotype, Risk Factors
ABSTRACT
Disruptive, damaging ultra-rare variants in highly constrained genes are enriched in individuals with neurodevelopmental disorders. In the general population, this class of variants was associated with a decrease in years of education (YOE). This effect was stronger among highly brain-expressed genes and explained more YOE variance than pathogenic copy number variation but less than common variants. Disruptive, damaging ultra-rare variants in highly constrained genes influence the determinants of YOE in the general population.