Results 1 - 17 of 17
1.
Cell Genom ; 2(7), 2022 Jul 13.
Article in English | MEDLINE | ID: mdl-35873673

ABSTRACT

We assess contributions to autoimmune disease of genes whose regulation is driven by enhancer regions (enhancer-related) and genes that regulate other genes in trans (candidate master-regulator). We link these genes to SNPs using several SNP-to-gene (S2G) strategies and apply heritability analyses to draw three conclusions about 11 autoimmune/blood-related diseases/traits. First, several characterizations of enhancer-related genes using functional genomics data are informative for autoimmune disease heritability after conditioning on a broad set of regulatory annotations. Second, candidate master-regulator genes defined using trans-eQTL in blood are also conditionally informative for autoimmune disease heritability. Third, integrating enhancer-related and master-regulator gene sets with protein-protein interaction (PPI) network information magnified their disease signal. The resulting PPI-enhancer gene score produced >2-fold stronger heritability signal and >2-fold stronger enrichment for drug targets, compared with the recently proposed enhancer domain score. In each case, functionally informed S2G strategies produced 4.1- to 13-fold stronger disease signals than conventional window-based strategies.

2.
Nat Genet ; 52(12): 1355-1363, 2020 12.
Article in English | MEDLINE | ID: mdl-33199916

ABSTRACT

Fine-mapping aims to identify causal variants impacting complex traits. We propose PolyFun, a computationally scalable framework to improve fine-mapping accuracy by leveraging functional annotations across the entire genome, not just genome-wide-significant loci, to specify prior probabilities for fine-mapping methods such as SuSiE or FINEMAP. In simulations, PolyFun + SuSiE and PolyFun + FINEMAP were well calibrated and identified >20% more variants with a posterior causal probability >0.95 than identified in their nonfunctionally informed counterparts. In analyses of 49 UK Biobank traits (average n = 318,000), PolyFun + SuSiE identified 3,025 fine-mapped variant-trait pairs with posterior causal probability >0.95, a >32% improvement versus SuSiE. We used posterior mean per-SNP heritabilities from PolyFun + SuSiE to perform polygenic localization, constructing minimal sets of common SNPs causally explaining 50% of common SNP heritability; these sets ranged in size from 28 (hair color) to 3,400 (height) to 2 million (number of children). In conclusion, PolyFun prioritizes variants for functional follow-up and provides insights into complex trait architectures.


Subject(s)
Chromosome Mapping/methods , Computational Biology/methods , Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Genome, Human/genetics , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics
3.
Nat Commun ; 11(1): 4703, 2020 09 17.
Article in English | MEDLINE | ID: mdl-32943643

ABSTRACT

Deep learning models have shown great promise in predicting regulatory effects from DNA sequence, but their informativeness for human complex diseases is not fully understood. Here, we evaluate genome-wide SNP annotations from two previous deep learning models, DeepSEA and Basenji, by applying stratified LD score regression to 41 diseases and traits (average N = 320K), conditioning on a broad set of coding, conserved and regulatory annotations. We aggregated annotations across all (respectively blood or brain) tissues/cell-types in meta-analyses across all (respectively 11 blood or 8 brain) traits. The annotations were highly enriched for disease heritability, but produced only limited conditionally significant results: non-tissue-specific and brain-specific Basenji-H3K4me3 for all traits and brain traits respectively. We conclude that deep learning models have yet to achieve their full potential to provide considerable unique information for complex disease, and that their conditional informativeness for disease cannot be inferred from their accuracy in predicting regulatory annotations.


Subject(s)
Deep Learning , Disease/genetics , Molecular Sequence Annotation , Alleles , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study , Histones/genetics , Humans , Linkage Disequilibrium , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide
4.
Hum Mol Genet ; 29(7): 1057-1067, 2020 05 08.
Article in English | MEDLINE | ID: mdl-31595288

ABSTRACT

Regulatory variation plays a major role in complex disease, and cell type-specific binding of transcription factors (TF) is critical to gene regulation. However, assessing the contribution of genetic variation in TF-binding sites to disease heritability is challenging, as binding is often cell type-specific and annotations from directly measured TF binding are not currently available for most cell type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF-binding annotations constructed by intersecting sequence-based TF-binding predictions with cell type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context and the limitation that sequence-based predictions are generally not cell type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified linkage disequilibrium (LD) score regression with the baseline-LD model (which is not cell type-specific) plus the new annotations. We determined that 100 bp windows around MotifMap sequence-based TF-binding predictions intersected with a union of six cell type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6× vs. 7.3×, P = 9 × 10⁻¹⁴ for difference) and a 20% increase in cell type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10⁻¹¹ for difference). Our results show that TF-binding annotations explain substantial disease heritability and can help refine genome-wide association signals.


Subject(s)
Chromatin/genetics , Genetic Diseases, Inborn/genetics , Molecular Sequence Annotation , Transcription Factors/genetics , Binding Sites/genetics , Computational Biology , Gene Expression Regulation/genetics , Genetic Diseases, Inborn/classification , Genetic Diseases, Inborn/pathology , Humans , Linkage Disequilibrium/genetics , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Protein Binding/genetics
6.
Nat Commun ; 10(1): 4054, 2019 09 06.
Article in English | MEDLINE | ID: mdl-31492842

ABSTRACT

Transposable elements (TE) comprise roughly half of the human genome. Though initially derided as junk DNA, they have been widely hypothesized to contribute to the evolution of gene regulation. However, the contribution of TE to the genetic architecture of diseases remains unknown. Here, we analyze data from 41 independent diseases and complex traits to draw three conclusions. First, TE are uniquely informative for disease heritability. Despite overall depletion for heritability (54% of SNPs, 39 ± 2% of heritability), TE explain substantially more heritability than expected based on their depletion for known functional annotations. This implies that TE acquire function in ways that differ from known functional annotations. Second, older TE contribute more to disease heritability, consistent with acquiring biological function. Third, Short Interspersed Nuclear Elements (SINE) are far more enriched for blood traits than for other traits. Our results can help elucidate the biological roles that TE play in the genetic architecture of diseases.


Subject(s)
DNA Transposable Elements/genetics , Disease/genetics , Gene Expression Regulation , Genome, Human/genetics , Inheritance Patterns/genetics , Retroelements/genetics , Algorithms , Autoimmune Diseases/blood , Autoimmune Diseases/genetics , Brain Diseases/blood , Brain Diseases/genetics , Evolution, Molecular , Humans , Polymorphism, Single Nucleotide , Quantitative Trait Loci/genetics , Short Interspersed Nucleotide Elements/genetics
7.
Am J Hum Genet ; 104(5): 896-913, 2019 05 02.
Article in English | MEDLINE | ID: mdl-31051114

ABSTRACT

Recent studies have highlighted the role of gene networks in disease biology. To formally assess this, we constructed a broad set of pathway, network, and pathway+network annotations and applied stratified LD score regression to 42 diseases and complex traits (average N = 323K) to identify enriched annotations. First, we analyzed 18,119 biological pathways. We identified 156 pathway-trait pairs whose disease enrichment was statistically significant (FDR < 5%) after conditioning on all genes and 75 known functional annotations (from the baseline-LD model), a stringent step that greatly reduced the number of pathways detected; most significant pathway-trait pairs were previously unreported. Next, for each of four published gene networks, we constructed probabilistic annotations based on network connectivity. For each gene network, the network connectivity annotation was strongly significantly enriched. Surprisingly, the enrichments were fully explained by excess overlap between network annotations and regulatory annotations from the baseline-LD model, validating the informativeness of the baseline-LD model and emphasizing the importance of accounting for regulatory annotations in gene network analyses. Finally, for each of the 156 enriched pathway-trait pairs, for each of the four gene networks, we constructed pathway+network annotations by annotating genes with high network connectivity to the input pathway. For each gene network, these pathway+network annotations were strongly significantly enriched for the corresponding traits. Once again, the enrichments were largely explained by the baseline-LD model. In conclusion, gene network connectivity is highly informative for disease architectures, but the information in gene networks may be subsumed by regulatory annotations, emphasizing the importance of accounting for known annotations.


Subject(s)
Computational Biology/methods , Gene Regulatory Networks , Genes/genetics , Genetic Diseases, Inborn/genetics , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable , Humans , Molecular Sequence Annotation , Phenotype , Software
8.
Am J Hum Genet ; 104(5): 879-895, 2019 05 02.
Article in English | MEDLINE | ID: mdl-31006511

ABSTRACT

Despite significant progress in annotating the genome with experimental methods, much of the regulatory noncoding genome remains poorly defined. Here we assert that regulatory elements may be characterized by leveraging local epigenomic signatures where specific transcription factors (TFs) are bound. To link these two features, we introduce IMPACT, a genome annotation strategy that identifies regulatory elements defined by cell-state-specific TF binding profiles, learned from 515 chromatin and sequence annotations. We validate IMPACT using multiple compelling applications. First, IMPACT distinguishes between bound and unbound TF motif sites with high accuracy (average AUPRC 0.81, SE 0.07; across 8 tested TFs) and outperforms state-of-the-art TF binding prediction methods, MocapG, MocapS, and Virtual ChIP-seq. Second, in eight tested cell types, RNA polymerase II IMPACT annotations capture more cis-eQTL variation than sequence-based annotations, such as promoters and TSS windows (25% average increase in enrichment). Third, integration with rheumatoid arthritis (RA) summary statistics from European (N = 38,242) and East Asian (N = 22,515) populations revealed that the top 5% of CD4+ Treg IMPACT regulatory elements capture 85.7% of RA h², the most comprehensive explanation for RA h² to date. In comparison, the average RA h² captured by compared CD4+ T histone marks is 42.3% and by CD4+ T specifically expressed gene sets is 36.4%. Lastly, we find that IMPACT may be used in many different cell types to identify complex trait associated regulatory elements.


Subject(s)
Arthritis, Rheumatoid/metabolism , Epigenome , Epigenomics/methods , Genome, Human , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Arthritis, Rheumatoid/genetics , Chromatin/genetics , Chromatin/metabolism , Computational Biology/methods , Histones/genetics , Histones/metabolism , Humans , Promoter Regions, Genetic , Protein Binding , Transcription Factors/genetics
9.
Am J Hum Genet ; 104(4): 611-624, 2019 04 04.
Article in English | MEDLINE | ID: mdl-30905396

ABSTRACT

Regulatory elements, e.g., enhancers and promoters, have been widely reported to be enriched for disease and complex trait heritability. We investigated how this enrichment varies with the age of the underlying genome sequence, the conservation of regulatory function across species, and the target gene of the regulatory element. We estimated heritability enrichment by applying stratified LD score regression to summary statistics from 41 independent diseases and complex traits (average N = 320K) and meta-analyzing results across traits. Enrichment of human putative enhancers and promoters was larger in elements with older sequence age, assessed via alignment with other species irrespective of conserved functionality: putative enhancer elements with ancient sequence age (older than the split between marsupial and placental mammals) were 8.8× enriched (versus 2.5× for all putative enhancers; p = 3e-14), and promoter elements with ancient sequence age were 13.5× enriched (versus 5.1× for all promoters; p = 5e-16). Enrichment of human putative enhancers and promoters was also larger in elements whose regulatory function was conserved across species, e.g., human putative enhancers that were enhancers in ≥5 of 9 other mammals were 4.6× enriched (p = 5e-12 versus all putative enhancers). Enrichment of human promoters was larger in promoters of loss-of-function intolerant genes: 12.0× enrichment (p = 8e-15 versus all promoters). The mean value of several measures of negative selection within these genomic annotations mirrored all of these findings. Notably, the annotations with these excess heritability enrichments were jointly significant conditional on each other and on our baseline-LD model, which includes a broad set of coding, conserved, regulatory, and LD-related annotations.


Subject(s)
Enhancer Elements, Genetic , Genetic Diseases, Inborn/genetics , Promoter Regions, Genetic , Animals , Conserved Sequence , Genome-Wide Association Study , Genomics , Humans , Linkage Disequilibrium , Mammals/genetics , Marsupialia/genetics , Phenotype , Polymorphism, Single Nucleotide , Species Specificity
10.
Nat Genet ; 50(10): 1483-1493, 2018 10.
Article in English | MEDLINE | ID: mdl-30177862

ABSTRACT

Biological interpretation of genome-wide association study data frequently involves assessing whether SNPs linked to a biological process, for example, binding of a transcription factor, show unsigned enrichment for disease signal. However, signed annotations quantifying whether each SNP allele promotes or hinders the biological process can enable stronger statements about disease mechanism. We introduce a method, signed linkage disequilibrium profile regression, for detecting genome-wide directional effects of signed functional annotations on disease risk. We validate the method via simulations and application to molecular quantitative trait loci in blood, recovering known transcriptional regulators. We apply the method to expression quantitative trait loci in 48 Genotype-Tissue Expression tissues, identifying 651 transcription factor-tissue associations including 30 with robust evidence of tissue specificity. We apply the method to 46 diseases and complex traits (average n = 290 K), identifying 77 annotation-trait associations representing 12 independent transcription factor-trait associations, and characterize the underlying transcriptional programs using gene-set enrichment analyses. Our results implicate new causal disease genes and new disease mechanisms.


Subject(s)
Disease/genetics , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Quantitative Trait Loci , Transcription Factors/metabolism , Binding Sites/genetics , Blood Cells/metabolism , Blood Cells/pathology , Blood Chemical Analysis , Gene Expression Regulation , Genetic Predisposition to Disease , Humans , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide , Protein Binding , Risk Factors
11.
Nat Genet ; 50(7): 1041-1047, 2018 07.
Article in English | MEDLINE | ID: mdl-29942083

ABSTRACT

There is increasing evidence that many risk loci found using genome-wide association studies are molecular quantitative trait loci (QTLs). Here we introduce a new set of functional annotations based on causal posterior probabilities of fine-mapped molecular cis-QTLs, using data from the Genotype-Tissue Expression (GTEx) and BLUEPRINT consortia. We show that these annotations are more strongly enriched for heritability (5.84× for eQTLs; P = 1.19 × 10⁻³¹) across 41 diseases and complex traits than annotations containing all significant molecular QTLs (1.80× for expression (e)QTLs). eQTL annotations obtained by meta-analyzing all GTEx tissues generally performed best, whereas tissue-specific eQTL annotations produced stronger enrichments for blood- and brain-related diseases and traits. eQTL annotations restricted to loss-of-function intolerant genes were even more enriched for heritability (17.06×; P = 1.20 × 10⁻³⁵). All molecular QTLs except splicing QTLs remained significantly enriched in joint analysis, indicating that each of these annotations is uniquely informative for disease and complex trait architectures.


Subject(s)
Disease/genetics , Multifactorial Inheritance , Quantitative Trait Loci , Genome-Wide Association Study/methods , Humans , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable
12.
Science ; 352(6285): 600-4, 2016 Apr 29.
Article in English | MEDLINE | ID: mdl-27126046

ABSTRACT

Noncoding variants play a central role in the genetics of complex traits, but we still lack a full understanding of the molecular pathways through which they act. We quantified the contribution of cis-acting genetic effects at all major stages of gene regulation from chromatin to proteins, in Yoruba lymphoblastoid cell lines (LCLs). About 65% of expression quantitative trait loci (eQTLs) have primary effects on chromatin, whereas the remaining eQTLs are enriched in transcribed regions. Using a novel method, we also detected 2893 splicing QTLs, most of which have little or no effect on gene-level expression. These splicing QTLs are major contributors to complex traits, roughly on a par with variants that affect gene expression levels. Our study provides a comprehensive view of the mechanisms linking genetic variation to variation in human gene regulation.


Subject(s)
Gene Expression Regulation , Genetic Variation , Immune System Diseases/genetics , Quantitative Trait Loci , RNA Splicing/genetics , Cell Line , Chromatin/metabolism , Genome-Wide Association Study , Humans , Lymphocytes/immunology , Phenotype , Polymorphism, Single Nucleotide
13.
Nat Methods ; 12(11): 1061-3, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26366987

ABSTRACT

Allele-specific sequencing reads provide a powerful signal for identifying molecular quantitative trait loci (QTLs), but they are challenging to analyze and are prone to technical artifacts. Here we describe WASP, a suite of tools for unbiased allele-specific read mapping and discovery of molecular QTLs. Using simulated reads, RNA-seq reads and chromatin immunoprecipitation sequencing (ChIP-seq) reads, we demonstrate that WASP has a low error rate and is far more powerful than existing QTL-mapping approaches.


Subject(s)
Computational Biology/methods , Quantitative Trait Loci , Sequence Analysis, RNA/methods , Alleles , Artifacts , Chromatin Immunoprecipitation , Genome , Genotype , Haplotypes , Heterozygote , Humans , Likelihood Functions , Reproducibility of Results , Sequence Analysis, DNA , Software
14.
Mol Ecol ; 24(17): 4392-405, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26198179

ABSTRACT

Lemurs, the living primates most distantly related to humans, demonstrate incredible diversity in behaviour, life history patterns and adaptive traits. Although many lemur species are endangered within their native Madagascar, there is no high-quality genome assembly from this taxon, limiting population and conservation genetic studies. One critically endangered lemur is the blue-eyed black lemur Eulemur flavifrons. This species is fixed for blue irises, a convergent trait that evolved at least four times in primates and was subject to positive selection in humans, where 5' regulatory variation of OCA2 explains most of the brown/blue eye colour differences. We built a de novo genome assembly for E. flavifrons, providing the most complete lemur genome to date, and a high confidence consensus sequence for close sister species E. macaco, the (brown-eyed) black lemur. From diversity and divergence patterns across the genomes, we estimated a recent split time of the two species (160 Kya) and temporal fluctuations in effective population sizes that accord with known environmental changes. By looking for regions of unusually low diversity, we identified potential signals of directional selection in E. flavifrons at MITF, a melanocyte development gene that regulates OCA2 and has previously been associated with variation in human iris colour, as well as at several other genes involved in melanin biosynthesis in mammals. Our study thus illustrates how whole-genome sequencing of a few individuals can illuminate the demographic and selection history of nonmodel species.


Subject(s)
Biological Evolution , Eye Color/genetics , Lemur/genetics , Membrane Transport Proteins/genetics , Animals , Genetics, Population , Genome , Madagascar , Pigmentation/genetics , Population Density , Selection, Genetic
15.
PLoS Genet ; 10(9): e1004663, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25233095

ABSTRACT

DNA methylation is an important epigenetic regulator of gene expression. Recent studies have revealed widespread associations between genetic variation and methylation levels. However, the mechanistic links between genetic variation and methylation remain unclear. To begin addressing this gap, we collected methylation data at ∼300,000 loci in lymphoblastoid cell lines (LCLs) from 64 HapMap Yoruba individuals, and genome-wide bisulfite sequence data in ten of these individuals. We identified (at an FDR of 10%) 13,915 cis methylation QTLs (meQTLs), i.e., CpG sites in which changes in DNA methylation are associated with genetic variation at proximal loci. We found that meQTLs are frequently associated with changes in methylation at multiple CpGs across regions of up to 3 kb. Interestingly, meQTLs are also frequently associated with variation in other properties of gene regulation, including histone modifications, DNase I accessibility, chromatin accessibility, and expression levels of nearby genes. These observations suggest that genetic variants may lead to coordinated molecular changes in all of these regulatory phenotypes. One plausible driver of coordinated changes in different regulatory mechanisms is variation in transcription factor (TF) binding. Indeed, we found that SNPs that change predicted TF binding affinities are significantly enriched for associations with DNA methylation at nearby CpGs.


Subject(s)
DNA Methylation , Gene Expression Regulation , Histones/metabolism , Quantitative Trait Loci , Transcription Factors/metabolism , Binding Sites , Cell Line, Transformed , Computational Biology , Genome-Wide Association Study , Genomics/methods , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide , Protein Binding
16.
Science ; 342(6159): 747-9, 2013 Nov 08.
Article in English | MEDLINE | ID: mdl-24136359

ABSTRACT

Histone modifications are important markers of function and chromatin state, yet the DNA sequence elements that direct them to specific genomic locations are poorly understood. Here, we identify hundreds of quantitative trait loci, genome-wide, that affect histone modification or RNA polymerase II (Pol II) occupancy in Yoruba lymphoblastoid cell lines (LCLs). In many cases, the same variant is associated with quantitative changes in multiple histone marks and Pol II, as well as in deoxyribonuclease I sensitivity and nucleosome positioning. Transcription factor binding site polymorphisms are correlated overall with differences in local histone modification, and we identify specific transcription factors whose binding leads to histone modification in LCLs. Furthermore, variants that affect chromatin at distal regulatory sites frequently also direct changes in chromatin and gene expression at associated promoters.


Subject(s)
Gene Expression Regulation , Genetic Variation , Histones/metabolism , Protein Processing, Post-Translational/genetics , RNA Polymerase II/metabolism , Transcription Factors/metabolism , Binding Sites/genetics , Cell Line, Tumor , Cells/metabolism , Chromatin/chemistry , Chromatin/genetics , Chromatin/metabolism , Genome, Human , Histones/chemistry , Histones/genetics , Humans , Polymorphism, Genetic , Promoter Regions, Genetic , Quantitative Trait Loci , RNA Polymerase II/chemistry , Transcription Factors/genetics
17.
BMC Public Health ; 12: 449, 2012 Jun 18.
Article in English | MEDLINE | ID: mdl-22713694

ABSTRACT

BACKGROUND: Around the globe, school closures were used sporadically to mitigate the 2009 H1N1 influenza pandemic. However, such closures can detrimentally impact economic and social life. METHODS: Here, we couple a decision analytic approach with a mathematical model of influenza transmission to estimate the impact of school closures in terms of epidemiological and cost effectiveness. Our method assumes that the transmissibility and the severity of the disease are uncertain, and evaluates several closure and reopening strategies that cover a range of thresholds in school-aged prevalence (SAP) and closure durations. RESULTS: Assuming a willingness to pay per quality adjusted life-year (QALY) threshold equal to the US per capita GDP ($46,000), we found that the cost effectiveness of these strategies is highly dependent on the severity and on the willingness to pay per QALY. For severe pandemics, the preferred strategy couples the earliest closure trigger (0.5% SAP) with the longest duration closure (24 weeks) considered. For milder pandemics, the preferred strategies also involve the earliest closure trigger, but with shorter durations (12 weeks for low transmission rates and variable length for high transmission rates). CONCLUSIONS: These findings highlight the importance of obtaining early estimates of pandemic severity and provide guidance to public health decision-makers for effectively tailoring school closure strategies in response to a newly emergent influenza pandemic.


Subject(s)
Decision Support Techniques , Health Policy/economics , Influenza A Virus, H1N1 Subtype , Influenza, Human/epidemiology , Pandemics/prevention & control , Schools/organization & administration , Adolescent , Child , Child, Preschool , Computer Simulation , Cost-Benefit Analysis , Humans , Influenza, Human/economics , Models, Economic , Models, Theoretical , Pandemics/economics , Schools/economics , Texas/epidemiology , Young Adult