ABSTRACT
Identifying the causes of similarities and differences in genetic disease prevalence among humans is central to understanding disease etiology. While present-day humans are not strongly differentiated, vast amounts of genomic data now make it possible to study subtle patterns of genetic variation. This allows us to trace our genomic history thousands of years into the past and to assess its implications for the distribution of disease-associated variants today. Genomic analyses have shown that demographic processes shaped the distribution and frequency of disease-associated variants over time. Furthermore, local adaptation to new environmental conditions, including pathogens, has generated strong patterns of differentiation at particular loci. Researchers are also beginning to uncover the genetic architecture of complex diseases, which are affected by many variants of small effect. The field of population genomics thus holds great potential for providing further insights into the evolution of human disease.
Subject(s)
Genetic Diseases, Inborn/epidemiology , Genetic Diseases, Inborn/etiology , Metagenomics/methods , Adaptation, Physiological/genetics , Alleles , Evolution, Molecular , Gene Frequency/genetics , Genetic Drift , Genetic Variation/genetics , Genetics, Population/methods , Genomics/methods , Humans , Metagenomics/trends , Models, Genetic , Phylogeny
ABSTRACT
Large biobank samples provide an opportunity to integrate broad phenotyping, familial records, and molecular genetics data to study complex traits and diseases. We introduce Pearson-Aitken Family Genetic Risk Scores (PA-FGRS), a method for estimating disease liability from patterns of diagnoses in extended, age-censored genealogical records. We then apply the method to study a paradigmatic complex disorder, major depressive disorder (MDD), using the iPSYCH2015 case-cohort study of 30,949 MDD cases, 39,655 random population controls, and more than 2 million relatives. We show that combining PA-FGRS liabilities estimated from family records with molecular genotypes of probands improves three lines of inquiry. Incorporating PA-FGRS liabilities improves classification of MDD over and above polygenic scores, identifies robust genetic contributions to clinical heterogeneity in MDD associated with comorbidity, recurrence, and severity and can improve the power of genome-wide association studies. Our method is flexible and easy to use, and our study approaches are generalizable to other datasets and other complex traits and diseases.
ABSTRACT
For more than half a century, Denmark has maintained population-wide demographic, health care, and socioeconomic registers that provide detailed information on the interaction between all residents and the extensive national social services system. We leverage this resource to reconstruct the genealogy of the entire nation based on all individuals legally residing in Denmark since 1968. We cross-reference 6,691,426 individuals with nationwide health care registers to estimate heritability and genetic correlations of 10 broad diagnostic categories involving all major organs and systems. Heritability estimates for mental disorders were consistently the highest across demographic cohorts (average h2 = 0.406, 95% CI = [0.403, 0.408]), whereas estimates for cancers were the lowest (average h2 = 0.130, 95% CI = [0.125, 0.134]). The average genetic correlation of each of the 10 diagnostic categories with the other nine was highest for gastrointestinal conditions (average rg = 0.567, 95% CI = [0.566, 0.567]) and lowest for urogenital conditions (average rg = 0.386, 95% CI = [0.385, 0.388]). Mental, pulmonary, gastrointestinal, and neurological conditions had similar genetic correlation profiles.
Subject(s)
Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease/genetics , Denmark , Health Services Research/methods , Humans , Mental Disorders/diagnosis , Mental Disorders/genetics
ABSTRACT
Febrile seizures represent the most common type of pathological brain activity in young children and are influenced by genetic, environmental and developmental factors. In a minority of cases, febrile seizures precede later development of epilepsy. We conducted a genome-wide association study of febrile seizures in 7635 cases and 83 966 controls identifying and replicating seven new loci, all with P < 5 × 10⁻¹⁰. Variants at two loci were functionally related to altered expression of the fever response genes PTGER3 and IL10, and four other loci harboured genes (BSN, ERC2, GABRG2, HERC1) influencing neuronal excitability by regulating neurotransmitter release and binding, vesicular transport or membrane trafficking at the synapse. Four previously reported loci (SCN1A, SCN2A, ANO3 and 12q21.33) were all confirmed. Collectively, the seven novel and four previously reported loci explained 2.8% of the variance in liability to febrile seizures, and the single nucleotide polymorphism heritability based on all common autosomal single nucleotide polymorphisms was 10.8%. GABRG2, SCN1A and SCN2A are well-established epilepsy genes and, overall, we found positive genetic correlations with epilepsies (rg = 0.39, P = 1.68 × 10⁻⁴). Further, we found that higher polygenic risk scores for febrile seizures were associated with epilepsy and with history of hospital admission for febrile seizures. Finally, we found that polygenic risk of febrile seizures was lower in febrile seizure patients with neuropsychiatric disease compared to febrile seizure patients in a general population sample. In conclusion, this largest genetic investigation of febrile seizures to date implicates central fever response genes as well as genes affecting neuronal excitability, including several known epilepsy genes. Further functional and genetic studies based on these findings will provide important insights into the complex pathophysiological processes of seizures with and without fever.
Subject(s)
Epilepsy , Seizures, Febrile , Anoctamins/genetics , Child , Child, Preschool , Epilepsy/genetics , Fever/complications , Fever/genetics , Genome-Wide Association Study , Humans , NAV1.1 Voltage-Gated Sodium Channel/genetics , Seizures, Febrile/genetics
ABSTRACT
Circulating inflammatory markers are essential to human health and disease, and they are often dysregulated or malfunctioning in cancers as well as in cardiovascular, metabolic, immunologic and neuropsychiatric disorders. However, the genetic contribution to the physiological variation of levels of circulating inflammatory markers is largely unknown. Here we report the results of a genome-wide genetic study of blood concentration of ten cytokines, including the hitherto unexplored calcium-binding protein (S100B). The study leverages a unique sample of neonatal blood spots from 9,459 Danish subjects from the iPSYCH initiative. We estimate the SNP-heritability of marker levels as ranging from essentially zero for Erythropoietin (EPO) up to 73% for S100B. We identify and replicate 16 associated genomic regions (p < 5 × 10⁻⁹), of which four are novel. We show that the associated variants map to enhancer elements, suggesting a possible transcriptional effect of genomic variants on the cytokine levels. The identification of the genetic architecture underlying the basic levels of cytokines is likely to prompt studies investigating the relationship between cytokines and complex disease. Our results also suggest that the genetic architecture of cytokines is stable from neonatal to adult life.
Subject(s)
Cytokines/genetics , Inflammation/diagnosis , Quantitative Trait Loci , Biomarkers/blood , Cohort Studies , Cytokines/blood , Cytokines/immunology , Denmark , Enhancer Elements, Genetic/genetics , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Infant, Newborn , Inflammation/blood , Inflammation/immunology , Male , Polymorphism, Single Nucleotide , S100 Calcium Binding Protein beta Subunit/blood , S100 Calcium Binding Protein beta Subunit/genetics , S100 Calcium Binding Protein beta Subunit/immunology
ABSTRACT
MOTIVATION: Network diffusion and label propagation are fundamental tools in computational biology, with applications like gene-disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, due to concerns that network topology can bias diffusion scores. This raises the question of the statistical properties, and the possible bias, of such diffusion processes in each of their applications. In this work, we characterized some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein-protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels. RESULTS: Diffusion scores starting from binary labels were affected by the label codification and exhibited a problem-dependent topological bias that could be removed by the statistical normalization. Parametric and non-parametric normalization addressed both points by being codification-independent and by equalizing the bias. We identified and quantified two sources of bias (mean value and variance) that yielded performance differences when normalizing the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Although none of the proposed scores systematically outperformed the others, normalization was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities.
AVAILABILITY: The code is publicly available at https://github.com/b2slab/diffuBench and the data underlying this article are available at https://github.com/b2slab/retroData. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
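The permutation-based normalization described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the random-walk-with-restart kernel, the toy graph, the function names `diffuse` and `z_scores`, and the parameter values are hypothetical and are not taken from the diffuBench code. The sketch propagates binary labels, then compares each node's raw score with the mean and standard deviation of its scores under permuted labels.

```python
import random
from statistics import mean, pstdev


def diffuse(adj, labels, alpha=0.5, steps=50):
    """Propagate labels with a random walk with restart (illustrative kernel)."""
    scores = dict(labels)
    for _ in range(steps):
        new = {}
        for v in adj:
            # Each neighbour u passes an equal share of its score to v.
            spread = sum(scores[u] / len(adj[u]) for u in adj[v])
            new[v] = (1 - alpha) * labels[v] + alpha * spread
        scores = new
    return scores


def z_scores(adj, labels, n_perm=200, seed=0):
    """Normalize raw scores against a label-permutation null, node by node."""
    rng = random.Random(seed)
    nodes = list(adj)
    raw = diffuse(adj, labels)
    values = [labels[v] for v in nodes]
    null = {v: [] for v in nodes}
    for _ in range(n_perm):
        rng.shuffle(values)
        permuted = diffuse(adj, dict(zip(nodes, values)))
        for v in nodes:
            null[v].append(permuted[v])
    out = {}
    for v in nodes:
        sd = pstdev(null[v])
        out[v] = (raw[v] - mean(null[v])) / sd if sd > 0 else 0.0
    return out


# Hypothetical toy network: a hub with four neighbours and one pendant node.
adj = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["hub", "e"],
    "e": ["d"],
}
labels = {v: 0.0 for v in adj}
labels["a"] = 1.0  # a single positive label

raw = diffuse(adj, labels)
z = z_scores(adj, labels)
for v in adj:
    print(f"{v:>3}  raw={raw[v]:.3f}  z={z[v]:+.2f}")
```

Comparing the `raw` and `z` columns shows how a well-connected node can receive a high raw score partly because of its position in the network, which is the kind of topological bias the normalization is meant to equalize.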
Subject(s)
Computational Biology , Protein Interaction Maps , Diffusion , Prospective Studies , Proteins/genetics
ABSTRACT
BACKGROUND: Previous studies have indicated the bidirectionality between autoimmune and mental disorders. However, genetic studies underpinning the co-occurrence of the two disorders have been lacking. In this study, we examined the potential genetic contribution to the association between autoimmune and mental disorders and investigated the genetic basis of overall autoimmune disease. METHODS: We used diagnostic information from patients with seven autoimmune diseases and six mental disorders from the Danish population-based case-cohort sample (iPSYCH2012). We explored the epidemiological association using survival analysis and modelled the effect of polygenic risk scores (PRSs) on autoimmune and mental diseases. Genetic factors were investigated using GWAS and imputed HLA alleles in the iPSYCH cohort. RESULTS: Of 64,039 individuals, a total of 43,902 (68.6%) were diagnosed with mental disorders and 1383 (2.2%) with autoimmune diseases. There was a significant comorbidity between the two disease classes (P = 2.67 × 10⁻⁷, OR = 1.38, 95% CI = 1.22-1.56), with an overall bidirectional association, wherein individuals with autoimmune diseases had an increased risk of subsequent mental disorders (HR = 1.13, 95% CI: 1.07-1.21, P = 7.95 × 10⁻⁵) and vice versa (HR = 1.27, 95% CI = 1.16-1.39, P = 8.77 × 10⁻¹⁵). Adding PRSs to these adjustment models did not have an impact on the associations. PRSs for autoimmune diseases were only slightly associated with increased risk of mental disorders (HR = 1.01, 95% CI: 1.00-1.02, p = 0.038), whereas PRSs for mental disorders were not associated with autoimmune diseases overall. Our GWAS highlighted 12 loci on chromosome 6 (minimum P = 2.74 × 10⁻³⁶, OR = 1.80, 95% CI: 1.64-1.96), which were implicated in gene regulation through bioinformatic functional analyses, thereby identifying new candidate genes for overall autoimmune disease.
Moreover, we observed 20 human leukocyte antigen (HLA) alleles strongly associated, either positively or negatively, with overall autoimmune disease, but we did not find significant evidence of their associations with overall mental disorders. A GWAS of a comorbid diagnosis of an autoimmune disease and a mental disorder identified a genome-wide significant locus on chromosome 7 as well (P = 1.43 × 10⁻⁸, OR = 10.65, 95% CI = 3.21-35.36). CONCLUSIONS: Our findings confirm the overall comorbidity and bidirectionality between autoimmune diseases and mental disorders and identify HLA genes which are significantly associated with overall autoimmune disease. Additionally, we identified several new candidate genes for overall autoimmune disease and ranked them based on their association with the investigated diseases.
Subject(s)
Autoimmune Diseases , Mental Disorders , Psychotic Disorders , Autoimmune Diseases/epidemiology , Autoimmune Diseases/genetics , Comorbidity , Denmark/epidemiology , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Mental Disorders/epidemiology , Mental Disorders/genetics , Polymorphism, Single Nucleotide
ABSTRACT
Changes in the mean and variance of gene expression with age have consequences for healthy aging and disease development. Age-dependent changes in phenotypic variance have been associated with a decline in regulatory functions leading to an increase in disease risk. Here, we investigate age-related mean and variance changes in gene expression measured by RNA-seq of fat, skin, whole blood and derived lymphoblastoid cell lines (LCLs) from 855 adult female twins. We see evidence of up to 60% of age effects on transcription levels shared across tissues, and 47% of those on splicing. Using gene expression variance and discordance between genetically identical MZ twin pairs, we identify 137 genes with age-related changes in variance and 42 genes with age-related discordance between co-twins, implying the latter are driven by environmental effects. We identify four eQTLs whose effect on expression is age-dependent (FDR 5%). Combined, these results show a complicated mix of environmental and genetically driven changes in expression with age. Using the twin structure in our data, we show that additive genetic effects explain considerably more of the variance in gene expression than aging, but less than other environmental factors, potentially explaining why reliable expression-derived biomarkers for healthy aging have proved elusive compared with those derived from methylation.
Subject(s)
Gene Expression/genetics , Adult , Aged , Aged, 80 and over , Biomarkers/analysis , Cell Line , Cohort Studies , Exons/genetics , Female , Humans , Middle Aged , RNA Splicing/genetics , Twins, Monozygotic/genetics
ABSTRACT
Gastrointestinal infections can be life threatening, but not much is known about the host's genetic contribution to susceptibility to gastrointestinal infections or the latter's association with psychiatric disorders. We utilized iPSYCH, a genotyped population-based sample of individuals born between 1981 and 2005 comprising 65,534 unrelated Danish individuals (45,889 diagnosed with mental disorders and 19,645 controls from a random population sample) in which all individuals were linked utilizing nationwide population-based registers to estimate the genetic contribution to susceptibility to gastrointestinal infections, identify genetic variants associated with gastrointestinal infections, and examine the link between gastrointestinal infections and psychiatric and neurodevelopmental disorders. The SNP heritability of susceptibility to gastrointestinal infections ranged from 3.7% to 6.4% on the liability scale. Significant correlations were found between gastrointestinal infections and the combined group of mental disorders (OR = 2.09; 95% CI: 1.82-2.4, P = 1.87 × 10⁻²⁵). Correlations with autism spectrum disorder, attention deficit hyperactivity disorder, and depression were also significant. We identified a genome-wide significant locus associated with susceptibility to gastrointestinal infections (OR = 1.13; 95% CI: 1.08-1.18, P = 2.9 × 10⁻⁸), where the top SNP was an eQTL for the ABO gene. The risk allele was associated with reduced ABO expression, providing, for the first time, genetic evidence to support previous studies linking the O blood group to gastrointestinal infections. This study also highlights the importance of integrative work in genetics, psychiatry, infection, and epidemiology on the road to translational medicine.
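Reporting heritability "on the liability scale" usually involves rescaling an estimate obtained on the observed (0/1) scale. The sketch below shows the standard transformation for an ascertained case-control sample, h²_liab = h²_obs · K²(1−K)² / (z² · P(1−P)), where K is the population prevalence, P the proportion of cases in the sample, and z the standard normal density at the liability threshold. Whether the study used exactly this procedure is not stated in the abstract; the function name and all input values are hypothetical.

```python
from statistics import NormalDist


def liability_h2(h2_obs, prevalence, case_fraction):
    """Rescale an observed-scale heritability estimate to the liability scale.

    prevalence    -- K, lifetime prevalence in the population
    case_fraction -- P, proportion of cases in the analysed sample
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - prevalence)  # liability threshold t
    z = std_normal.pdf(threshold)                   # normal density at t
    k, p = prevalence, case_fraction
    return h2_obs * (k * (1 - k)) ** 2 / (z ** 2 * p * (1 - p))


# Hypothetical inputs, chosen only to show the call; not values from the study.
example = liability_h2(h2_obs=0.02, prevalence=0.05, case_fraction=0.10)
print(f"liability-scale h2 = {example:.4f}")
```

Because the result depends on the assumed prevalence K, a range of plausible prevalences yields a range of liability-scale estimates rather than a single value.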
Subject(s)
Gastrointestinal Diseases/epidemiology , Genetic Markers , Genetic Predisposition to Disease , Mental Disorders/physiopathology , Neurodevelopmental Disorders/physiopathology , Case-Control Studies , Cohort Studies , Denmark/epidemiology , Female , Gastrointestinal Diseases/genetics , Gastrointestinal Diseases/microbiology , Genome-Wide Association Study , Genotype , Humans , Incidence , Male , Polymorphism, Single Nucleotide , Quantitative Trait Loci
ABSTRACT
Gene expression is dependent on genetic and environmental factors. In the last decade, a large body of research has significantly improved our understanding of the genetic architecture of gene expression. However, it remains unclear whether genetic effects on gene expression remain stable over time. Here, we show, using longitudinal whole-blood gene expression data from a twin cohort, that the genetic architecture of a subset of genes is unstable over time. In addition, we identified 2213 genes differentially expressed across time points that we linked with aging within and across studies. Interestingly, we discovered that most differentially expressed genes were affected by a subset of 77 putative causal genes. Finally, we observed that putative causal genes and down-regulated genes were affected by a loss of genetic control between time points. Taken together, our data suggest that instability in the genetic architecture of a subset of genes could lead to widespread effects on the transcriptome with an aging signature.
Subject(s)
Aging/genetics , Gene Expression Regulation, Developmental , Transcriptome , Aged , Female , Humans , Middle Aged , Twins, Dizygotic/genetics , Twins, Monozygotic/genetics
ABSTRACT
Studies attempting to functionally interpret complex-disease susceptibility loci by GWAS and eQTL integration have predominantly employed microarrays to quantify gene-expression. RNA-Seq has the potential to discover a more comprehensive set of eQTLs and illuminate the underlying molecular consequence. We examine the functional outcome of 39 variants associated with Systemic Lupus Erythematosus (SLE) through the integration of GWAS and eQTL data from the TwinsUK microarray and RNA-Seq cohort in lymphoblastoid cell lines. We use conditional analysis and a Bayesian colocalisation method to provide evidence of a shared causal-variant, then compare the ability of each quantification type to detect disease relevant eQTLs and eGenes. We discovered the greatest frequency of candidate-causal eQTLs using exon-level RNA-Seq, and identified novel SLE susceptibility genes (e.g. NADSYN1 and TCF7) that were concealed using microarrays, including four non-coding RNAs. Many of these eQTLs were found to influence the expression of several genes, supporting the notion that risk haplotypes may harbour multiple functional effects. Novel SLE associated splicing events were identified in the T-reg restricted transcription factor, IKZF2, and other candidate genes (e.g. WDFY4) through asQTL mapping using the Geuvadis cohort. We have significantly increased our understanding of the genetic control of gene-expression in SLE by maximising the leverage of RNA-Seq and performing integrative GWAS-eQTL analysis against gene, exon, and splice-junction quantifications. We conclude that to better understand the true functional consequence of regulatory variants, quantification by RNA-Seq should be performed at the exon-level as a minimum, and run in parallel with gene and splice-junction level quantification.
Subject(s)
Genetic Predisposition to Disease , Lupus Erythematosus, Systemic/genetics , Quantitative Trait Loci/genetics , RNA, Untranslated/genetics , Alternative Splicing/genetics , Carbon-Nitrogen Ligases with Glutamine as Amide-N-Donor/biosynthesis , Carbon-Nitrogen Ligases with Glutamine as Amide-N-Donor/genetics , Chromosome Mapping , Female , Gene Expression Regulation , Genome-Wide Association Study , Haplotypes , Humans , Lupus Erythematosus, Systemic/pathology , Male , Polymorphism, Single Nucleotide , T Cell Transcription Factor 1/biosynthesis , T Cell Transcription Factor 1/genetics
ABSTRACT
Obesity is a global epidemic that is causally associated with a range of diseases, including type 2 diabetes and cardiovascular disease, at the population-level. However, there is marked heterogeneity in obesity-related outcomes among individuals. This might reflect genotype-dependent responses to adiposity. Given that adiposity, measured by BMI, is associated with widespread changes in gene expression and regulatory variants mediate the majority of known complex trait loci, we sought to identify gene-by-BMI (G × BMI) interactions on the regulation of gene expression in a multi-tissue RNA-sequencing (RNA-seq) dataset from the TwinsUK cohort (n = 856). At a false discovery rate of 5%, we identified 16 cis G × BMI interactions (top cis interaction: CHURC1, rs7143432, p = 2.0 × 10⁻¹²) and one variant regulating 53 genes in trans (top trans interaction: ZNF423, rs3851570, p = 8.2 × 10⁻¹³), all in adipose tissue. The interactions were adipose-specific and enriched for variants overlapping adipocyte enhancers, and regulated genes were enriched for metabolic and inflammatory processes. We replicated a subset of the interactions in an independent adipose RNA-seq dataset (deCODE genetics, n = 754). We also confirmed the interactions with an alternate measure of obesity, dual-energy X-ray absorptiometry (DXA)-derived visceral-fat-volume measurements, in a subset of TwinsUK individuals (n = 682). The identified G × BMI regulatory effects demonstrate the dynamic nature of gene regulation and reveal a functional mechanism underlying the heterogeneous response to obesity. Additionally, we have provided a web browser allowing interactive exploration of the dataset, including of association between expression, BMI, and G × BMI regulatory effects in four tissues.
Subject(s)
Adiposity/genetics , Transcriptome/genetics , Absorptiometry, Photon , Body Mass Index , Cohort Studies , DNA-Binding Proteins/genetics , Datasets as Topic , Female , Humans , Intra-Abdominal Fat/anatomy & histology , Intra-Abdominal Fat/metabolism , Male , Middle Aged , Obesity/genetics , Obesity/metabolism , Organ Specificity , Polymorphism, Single Nucleotide/genetics , Proteins , Sequence Analysis, RNA , Twins/genetics , United Kingdom
ABSTRACT
Summary: Label propagation and diffusion over biological networks are a common mathematical formalism in computational biology for giving context to molecular entities and prioritizing novel candidates in the area of study. Several choices in designing the diffusion process, involving the graph kernel, the score definitions and the presence of a posterior statistical normalization, have an impact on the results. This manuscript describes diffuStats, an R package that provides a collection of graph kernels and diffusion scores, as well as a parallel permutation analysis for the normalized scores, and that eases the computation of the scores and their benchmarking for an optimal choice. Availability and implementation: The R package diffuStats is publicly available in Bioconductor, https://bioconductor.org, under the GPL-3 license. Contact: sergi.picart@upc.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Subject(s)
Computational Biology/methods , Software , Metabolic Networks and Pathways , Protein Interaction Maps , Yeasts/metabolism
ABSTRACT
Thrombotic diseases are among the leading causes of morbidity and mortality in the world. To add insights into the genetic regulation of thrombotic disease, we conducted a genome-wide association study (GWAS) of 6135 self-reported blood clot events and 252 827 controls of European ancestry belonging to the 23andMe cohort of research participants. Eight loci exceeded genome-wide significance. Among the genome-wide significant results, our study replicated previously known venous thromboembolism (VTE) loci near the F5, FGA-FGG, F11, F2, PROCR and ABO genes, and the more recently discovered locus near SLC44A2. In addition, our study reports for the first time a genome-wide significant association between rs114209171, located upstream of the F8 structural gene, and thrombosis risk. Analyses of expression profiles and expression quantitative trait loci across different tissues suggested SLC44A2, ILF3 and AP1M2 as the three most plausible candidate genes for the chromosome 19 locus, our only genome-wide significant thrombosis-related locus that does not harbor likely coagulation-related genes. In addition, we present data showing that this locus also acts as a novel risk factor for stroke and coronary artery disease (CAD). In conclusion, our study reveals novel common genetic risk factors for VTE, stroke and CAD and provides evidence that self-reported data on blood clots used in a GWAS yield results that are comparable with those obtained using clinically diagnosed VTE. This observation opens up the potential for larger meta-analyses, which will enable elucidation of the genetics of thrombotic diseases, and serves as an example for the genetic study of other diseases.
Subject(s)
Genetic Loci/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics , Thrombosis/genetics , Adaptor Protein Complex 1/genetics , Adaptor Protein Complex mu Subunits/genetics , Adolescent , Adult , Biomarkers/metabolism , Case-Control Studies , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Membrane Glycoproteins/genetics , Membrane Transport Proteins/genetics , Middle Aged , Nuclear Factor 90 Proteins/genetics , Risk Factors , Self Report , Young Adult
ABSTRACT
Understanding how genetic variation affects distinct cellular phenotypes, such as gene expression levels, alternative splicing and DNA methylation levels, is essential for better understanding of complex diseases and traits. Furthermore, how inter-individual variation of DNA methylation is associated to gene expression is just starting to be studied. In this study, we use the GenCord cohort of 204 newborn Europeans' lymphoblastoid cell lines, T-cells and fibroblasts derived from umbilical cords. The samples were previously genotyped for 2.5 million SNPs, mRNA-sequenced, and assayed for methylation levels in 482,421 CpG sites. We observe that methylation sites associated to expression levels are enriched in enhancers, gene bodies and CpG island shores. We show that while the correlation between DNA methylation and gene expression can be positive or negative, it is very consistent across cell-types. However, this epigenetic association to gene expression appears more tissue-specific than the genetic effects on gene expression or DNA methylation (observed in both sharing estimations based on P-values and effect size correlations between cell-types). This predominance of genetic effects can also be reflected by the observation that allele specific expression differences between individuals dominate over tissue-specific effects. Additionally, we discover genetic effects on alternative splicing and interestingly, a large amount of DNA methylation correlating to alternative splicing, both in a tissue-specific manner. The locations of the SNPs and methylation sites involved in these associations highlight the participation of promoter proximal and distant regulatory regions on alternative splicing. Overall, our results provide high-resolution analyses showing how genome sequence variation has a broad effect on cellular phenotypes across cell-types, whereas epigenetic factors provide a secondary layer of variation that is more tissue-specific. 
Furthermore, the details of how this tissue-specificity may vary across inter-relations of molecular traits, and where these are occurring, can yield further insights into gene regulation and cellular biology as a whole.
Subject(s)
Alternative Splicing/genetics , DNA Methylation/genetics , Epigenesis, Genetic , Gene Expression Regulation/genetics , Genetic Variation , Alleles , CpG Islands , Humans , Infant, Newborn , Organ Specificity , Polymorphism, Single Nucleotide/genetics , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid/genetics
ABSTRACT
Gene expression levels can be subject to selection. We hypothesized that the age of gene origin is associated with expression constraints, given that it affects the level of gene integration into the functional cellular environment. By studying the genetic variation affecting gene expression levels (cis expression quantitative trait loci [cis-eQTLs]) and protein levels (cis protein QTLs [cis-pQTLs]), we determined that young, primate-specific genes are enriched in cis-eQTLs and cis-pQTLs. Compared to cis-eQTLs of old genes originating before the zebrafish divergence, cis-eQTLs of young genes have a higher effect size, are located closer to the transcription start site, are more significant, and tend to influence genes in multiple tissues and populations. These results suggest that the expression constraint of each gene increases throughout its lifespan. We also detected a positive correlation between expression constraints (approximated by cis-eQTL properties) and coding constraints (approximated by Ka/Ks) and observed that this correlation might be driven by gene age. To uncover factors associated with the increase in gene-age-related expression constraints, we demonstrated that gene connectivity, gene involvement in complex regulatory networks, gene haploinsufficiency, and the strength of posttranscriptional regulation increase with gene age. We also observed an increase in heritability of gene expression levels with age, implying a reduction of the environmental component. In summary, we show that gene age shapes key gene properties during evolution and is therefore an important component of genome function.
Subject(s)
Gene Expression Regulation , Genetic Variation , Genome/genetics , Proteins/genetics , Quantitative Trait Loci/genetics , Age Factors , Cell Line , Female , Fetal Blood , Fibroblasts , Gene Expression Profiling , Humans , Infant, Newborn , Logistic Models , Male , Organ Specificity , Polymorphism, Single Nucleotide , Proteins/metabolism , Transcription Initiation Site , Umbilical Cord
ABSTRACT
MOTIVATION: In order to discover quantitative trait loci, multi-dimensional genomic datasets combining DNA-seq and ChIP-/RNA-seq require methods that rapidly correlate tens of thousands of molecular phenotypes with millions of genetic variants while appropriately controlling for multiple testing. RESULTS: We have developed FastQTL, a method that implements a popular cis-QTL mapping strategy in a user- and cluster-friendly tool. FastQTL also proposes an efficient permutation procedure to control for multiple testing. The outcome of permutations is modeled using beta distributions trained from a few permutations and from which adjusted P-values can be estimated at any level of significance with little computational cost. The Geuvadis and GTEx pilot datasets can now be easily analyzed an order of magnitude faster than with previous approaches. AVAILABILITY AND IMPLEMENTATION: Source code, binaries and comprehensive documentation of FastQTL are freely available to download at http://fastqtl.sourceforge.net/. CONTACT: emmanouil.dermitzakis@unige.ch or olivier.delaneau@unige.ch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
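The idea of modeling permutation outcomes with a beta distribution can be sketched as follows. This is an illustration under stated assumptions, not FastQTL's code: the null is simulated as the minimum of independent uniform P-values, the beta parameters are fitted by the method of moments (a simplification chosen here for brevity), and all names (`fit_beta_moments`, `beta_cdf`, `adjusted_p`) are hypothetical. The adjusted P-value is the fitted beta CDF evaluated at the observed best nominal P-value, so it can be computed at any significance level from a modest number of permutations.

```python
import math
import random
from statistics import mean, pvariance


def fit_beta_moments(xs):
    """Method-of-moments estimates of Beta(a, b) parameters."""
    m, v = mean(xs), pvariance(xs)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def _betacf(a, b, x, max_iter=200, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for aa in (
            m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    front = math.exp(
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + a * math.log(x) + b * math.log(1 - x)
    )
    if x < (a + 1) / (a + b + 2):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1 - x) / b


def adjusted_p(p_obs, perm_min_pvalues):
    """Adjusted P-value: fitted beta CDF at the observed best nominal P-value."""
    a, b = fit_beta_moments(perm_min_pvalues)
    return beta_cdf(p_obs, a, b)


# Simulated null: best nominal P-value among n_tests independent variants,
# repeated over n_perm permutations (both numbers are arbitrary choices).
rng = random.Random(1)
n_tests, n_perm = 20, 2000
perm_min = [min(rng.random() for _ in range(n_tests)) for _ in range(n_perm)]

a_hat, b_hat = fit_beta_moments(perm_min)
p_adj = adjusted_p(0.01, perm_min)
print(f"fitted Beta(a={a_hat:.2f}, b={b_hat:.2f}); adjusted P = {p_adj:.3f}")
```

In this simulated setting the minimum of `n_tests` independent uniform P-values follows Beta(1, n_tests), so the fitted parameters can be compared with that known answer; with correlated variants the fitted `b` would instead reflect an effective number of independent tests.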
Subject(s)
Quantitative Trait Loci , Genomics , Phenotype , Software , Statistical Distributions
ABSTRACT
The open source environment R is one of the most widely used software environments for statistical computing. It provides a variety of applications, including statistical genetics. Most of the powerful tools for quantitative genetic analyses are stand-alone free programs developed by researchers in academia. SOLAR is one of the standard software programs for linkage and association mapping of quantitative trait loci (QTLs) in pedigrees of arbitrary size and complexity. solarius allows the user to exploit the variance component methods implemented in SOLAR. It automates such routine operations as formatting pedigree and phenotype data. It also parses the model output and contains summary and plotting functions for exploration of the results. In addition, solarius enables parallel computing of the linkage and association analyses, which makes the calculation of genome-wide scans more efficient. AVAILABILITY AND IMPLEMENTATION: solarius is available on CRAN and on GitHub https://github.com/ugcd/solarius CONTACT: aziyatdinov@santpau.cat.
Subject(s)
Genetic Linkage , Quantitative Trait Loci , Software , Analysis of Variance , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Models, Statistical , Multivariate Analysis , Pedigree
ABSTRACT
Gene expression is a heritable cellular phenotype that defines the function of a cell and can lead to disease when misregulated. In order to detect genetic variations affecting gene expression, we performed association analysis, in cis and in trans, of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) with gene expression measured in 869 lymphoblastoid cell lines of the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort. We discovered that 3,534 genes (false discovery rate (FDR) = 5%) are affected by an expression quantitative trait locus (eQTL) in cis and 48 genes are affected in trans. We observed that CNVs are more likely to be eQTLs than SNPs. In addition, we found that variants associated with complex traits and diseases are enriched for trans-eQTLs and that trans-eQTLs are enriched for cis-eQTLs. As a variant affecting both a gene in cis and a gene in trans suggests that the cis gene is functionally linked to the trans gene expression, we looked specifically for trans effects of cis-eQTLs. We discovered that 26 cis-eQTLs are associated with 92 genes in trans, with the cis-eQTLs of the transcription factors BATF3 and HMX2 affecting the most genes. We then explored whether variation in the expression level of the cis genes causally affected the expression level of the trans genes, and discovered several causal relationships between variation in the expression level of the cis gene and variation in the expression level of the trans gene. This analysis shows that a large sample size allows the discovery of secondary effects of human variations on gene expression that can be used to construct short directed gene regulatory networks.
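As a rough illustration of what a single variant-to-gene association test involves, the sketch below regresses expression on genotype dosage by ordinary least squares. The abstract does not state which statistical model was used, so the linear model, the function name, and the toy data are all assumptions made for illustration only.

```python
import math


def snp_expression_association(genotypes, expression):
    """Least-squares slope and Pearson correlation of expression on dosage.

    genotypes: allele dosages (0, 1 or 2) per individual (hypothetical input).
    expression: expression level of one gene in the same individuals.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sgg = sum((g - mean_g) ** 2 for g in genotypes)
    see = sum((e - mean_e) ** 2 for e in expression)
    sge = sum(
        (g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression)
    )
    slope = sge / sgg                 # expression change per allele copy
    r = sge / math.sqrt(sgg * see)    # strength of the linear association
    return slope, r


# Toy data (invented): six individuals, one SNP, one gene.
dosages = [0, 0, 1, 1, 2, 2]
levels = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, r = snp_expression_association(dosages, levels)
```

In a genome-wide setting this test would be repeated for every variant-gene pair, restricted to nearby variants for cis analysis and extended to distant ones for trans analysis, with a multiple-testing correction such as the FDR threshold quoted above.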
Subject(s)
Gene Expression Regulation/genetics , Gene Regulatory Networks , Genetic Association Studies , Quantitative Trait Loci/genetics , Cell Line, Tumor , DNA Copy Number Variations/genetics , Genome, Human , Genomics , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Epigenetic modifications such as DNA methylation play a key role in gene regulation and disease susceptibility. However, little is known about the genome-wide frequency, localization, and function of methylation variation and how it is regulated by genetic and environmental factors. We utilized the Multiple Tissue Human Expression Resource (MuTHER) and generated Illumina 450K adipose methylome data from 648 twins. We found that individual CpGs had low variance and that variability was suppressed in promoters. We noted that DNA methylation variation was highly heritable (median h² = 0.34) and that shared environmental effects correlated with metabolic phenotype-associated CpGs. Analysis of methylation quantitative-trait loci (metQTL) revealed that 28% of CpGs were associated with nearby SNPs, and when overlapping them with adipose expression quantitative-trait loci (eQTL) from the same individuals, we found that 6% of the loci played a role in regulating both gene expression and DNA methylation. These associations were bidirectional, but there were pronounced negative associations for promoter CpGs. Integration of metQTL with adipose reference epigenomes and disease associations revealed significant enrichment of metQTL overlapping metabolic-trait or disease loci in enhancers (the strongest effects were for high-density lipoprotein cholesterol and body mass index [BMI]). We followed up with the BMI SNP rs713586, a cg01884057 metQTL that overlaps an enhancer upstream of ADCY3, and used bisulphite sequencing to refine this region. Our results showed widespread population invariability yet sequence dependence of adipose DNA methylation, and showed that incorporating maps of regulatory elements aids in linking CpG variation to gene regulation and disease risk in a tissue-dependent manner.