ABSTRACT
By analyzing 15,000 samples from 348 mammalian species, we derive DNA methylation (DNAm) predictors of maximum life span (R = 0.89), gestation time (R = 0.96), and age at sexual maturity (R = 0.85). Our maximum life-span predictor indicates a potential innate longevity advantage for females over males in 17 mammalian species including humans. The DNAm maximum life-span predictions are not affected by caloric restriction or partial reprogramming. Genetic disruptions in the somatotropic axis such as growth hormone receptors have an impact on DNAm maximum life span only in select tissues. Cancer mortality rates show no correlation with our epigenetic estimates of life-history traits. The DNAm maximum life-span predictor does not detect variation in life span between individuals of the same species, such as between the breeds of dogs. Maximum life span is determined in part by an epigenetic signature that is an intrinsic species property and is distinct from the signatures that relate to individual mortality risk.
Subject(s)
DNA Methylation; Epigenesis, Genetic; Longevity; Mammals; Animals; Longevity/genetics; Mammals/genetics; Female; Humans; Male; Life History Traits; Species Specificity
ABSTRACT
Development of embryonic stem cells (ESCs) into neurons requires intricate regulation of transcription, splicing, and translation, but how these processes interconnect is not understood. We found that polypyrimidine tract binding protein 1 (PTBP1) controls splicing of DPF2, a subunit of BRG1/BRM-associated factor (BAF) chromatin remodeling complexes. Dpf2 exon 7 splicing is inhibited by PTBP1 to produce the DPF2-S isoform early in development. During neuronal differentiation, loss of PTBP1 allows exon 7 inclusion and DPF2-L expression. Different cellular phenotypes and gene expression programs were induced by these alternative DPF2 isoforms. We identified chromatin binding sites enriched for each DPF2 isoform, as well as sites bound by both. In ESCs, DPF2-S preferential sites were bound by pluripotency factors. In neuronal progenitors, DPF2-S sites were bound by nuclear factor I (NFI), while DPF2-L sites were bound by CCCTC-binding factor (CTCF). DPF2-S sites exhibited enhancer modifications, while DPF2-L sites showed promoter modifications. Thus, alternative splicing redirects BAF complex targeting to impact chromatin organization during neuronal development.
Subject(s)
Alternative Splicing; Cell Differentiation; Chromatin; Heterogeneous-Nuclear Ribonucleoproteins; Neurons; Polypyrimidine Tract-Binding Protein; Transcription Factors; Alternative Splicing/genetics; Polypyrimidine Tract-Binding Protein/metabolism; Polypyrimidine Tract-Binding Protein/genetics; Animals; Cell Differentiation/genetics; Chromatin/metabolism; Mice; Neurons/metabolism; Neurons/cytology; Transcription Factors/metabolism; Transcription Factors/genetics; Heterogeneous-Nuclear Ribonucleoproteins/metabolism; Heterogeneous-Nuclear Ribonucleoproteins/genetics; DNA-Binding Proteins/metabolism; DNA-Binding Proteins/genetics; Transcription, Genetic; Embryonic Stem Cells/metabolism; Embryonic Stem Cells/cytology; Exons/genetics; Humans; Cell Self Renewal/genetics
ABSTRACT
Knowing the genes involved in quantitative traits provides an entry point to understanding the biological bases of behavior, but there are very few examples where the pathway from genetic locus to behavioral change is known. To explore the role of specific genes in fear behavior, we mapped three fear-related traits, tested fourteen genes at six quantitative trait loci (QTLs) by quantitative complementation, and identified six genes. Four genes, Lsamp, Ptprd, Nptx2, and Sh3gl, have known roles in synapse function; the fifth, Psip1, was not previously implicated in behavior; and the sixth is a long non-coding RNA, 4933413L06Rik, of unknown function. Variation in transcriptome and epigenetic modalities occurred preferentially in excitatory neurons, suggesting that genetic variation is more permissible in excitatory than inhibitory neuronal circuits. Our results relieve a bottleneck in using genetic mapping of QTLs to uncover biology underlying behavior and prompt a reconsideration of expected relationships between genetic and functional variation.
Subject(s)
Fear; Quantitative Trait Loci; Animals; Female; Male; Mice; Behavior, Animal/physiology; Chromosome Mapping; Fear/physiology; Mice, Inbred C57BL; Genetic Complementation Test
ABSTRACT
Whole-genome sequencing (WGS) data is facilitating genome-wide identification of rare noncoding variants, but elucidating their roles in disease remains challenging. Towards this end, we first revisit a reported significant brain-related association signal of autism spectrum disorder (ASD) detected from de novo noncoding variants and attributed to deep learning, and show that local GC content can capture similar association signals. We further show that the association signal appears driven by variants from male proband-female sibling pairs that are upstream of assigned genes. We then develop Expression Neighborhood Sequence Association Study (ENSAS), which utilizes gene expression correlations and sequence information, to more systematically identify phenotype-associated variant sets. Applying ENSAS to the same set of de novo variants, we identify gene expression-based neighborhoods showing significant ASD association signal, enriched for synapse-related gene ontology terms. For these top neighborhoods, we also identify chromatin state annotations of variants that are predictive of the proband-sibling local GC content differences. Our work provides new insights into associations of non-coding de novo mutations in ASD and presents an analytical framework applicable to other phenotypes.
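As a rough illustration of the "local GC content" feature mentioned in this abstract, the sketch below computes the fraction of G and C bases in a window centred on a variant position. The function name, the window size, and the toy sequence are assumptions for illustration only; the abstract does not state how the authors defined the window.

```python
def local_gc_content(sequence, position, flank=5):
    """Fraction of G/C bases in a window of +/- `flank` bases around `position`.

    `position` is a 0-based index into `sequence`. The window is clipped at
    the sequence ends. The default flank is an arbitrary, hypothetical choice.
    """
    if not 0 <= position < len(sequence):
        raise ValueError("position is outside the sequence")
    start = max(0, position - flank)
    end = min(len(sequence), position + flank + 1)
    window = sequence[start:end].upper()
    gc_count = sum(base in "GC" for base in window)
    return gc_count / len(window)


# Toy sequence (made up): a GC-rich stretch followed by an AT-rich stretch.
toy_sequence = "ATGCGCGTATAT"
gc_rich = local_gc_content(toy_sequence, position=4, flank=2)  # window "GCGCG"
at_rich = local_gc_content(toy_sequence, position=9, flank=2)  # window "TATAT"
print(gc_rich, at_rich)
```

Comparing such values between proband and sibling variants would be one simple way to look for a GC-content difference of the kind the abstract describes.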
ABSTRACT
Knowing the genes involved in quantitative traits provides a critical entry point to understanding the biological bases of behavior, but there are very few examples where the pathway from genetic locus to behavioral change is known. Here we address a key step towards that goal by deploying a test that directly queries whether a gene mediates the effect of a quantitative trait locus (QTL). To explore the role of specific genes in fear behavior, we mapped three fear-related traits, tested fourteen genes at six QTLs, and identified six genes. Four genes, Lsamp, Ptprd, Nptx2 and Sh3gl, have known roles in synapse function; the fifth gene, Psip1, is a transcriptional co-activator not previously implicated in behavior; the sixth is a long non-coding RNA 4933413L06Rik with no known function. Single nucleus transcriptomic and epigenetic analyses implicated excitatory neurons as likely mediating the genetic effects. Surprisingly, variation in transcriptome and epigenetic modalities between inbred strains occurred preferentially in excitatory neurons, suggesting that genetic variation is more permissible in excitatory than inhibitory neuronal circuits. Our results open a bottleneck in using genetic mapping of QTLs to find novel biology underlying behavior and prompt a reconsideration of expected relationships between genetic and functional variation.
ABSTRACT
DNA methylation data offers valuable insights into various aspects of mammalian biology. The recent introduction and large-scale application of the mammalian methylation array has significantly expanded the availability of such data across conserved sites in many mammalian species. In our study, we consider 13,245 samples profiled on this array encompassing 348 species and 59 tissues from 746 species-tissue combinations. While having some coverage of many different species and tissue types, this data captures only 3.6% of potential species-tissue combinations. To address this gap, we developed CMImpute (Cross-species Methylation Imputation), a method based on a Conditional Variational Autoencoder, to impute DNA methylation for non-profiled species-tissue combinations. In cross-validation, we demonstrate that CMImpute achieves a strong correlation with actual observed values, surpassing several baseline methods. Using CMImpute we imputed methylation data for 19,786 new species-tissue combinations. We believe that both CMImpute and our imputed data resource will be useful for DNA methylation analyses across a wide range of mammalian species.
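The coverage figure and the number of imputed combinations quoted in this abstract follow directly from its counts. The short check below reproduces the arithmetic; the variable names are ours, and all numbers are taken from the abstract.

```python
# Counts taken from the abstract above.
n_species = 348
n_tissues = 59
n_profiled = 746  # species-tissue combinations with profiled samples

n_possible = n_species * n_tissues   # every species paired with every tissue
coverage = n_profiled / n_possible   # fraction of combinations profiled
n_missing = n_possible - n_profiled  # combinations left to impute

print(n_possible)         # 20532
print(f"{coverage:.1%}")  # 3.6%
print(n_missing)          # 19786
```

The 19,786 missing combinations match the number of new species-tissue combinations the abstract reports imputing.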
ABSTRACT
Relating genetic variants to behavior remains a fundamental challenge. To assess the utility of DNA methylation marks in discovering causative variants, we examined their relationship to genetic variation by generating single-nucleus methylomes from the hippocampus of eight inbred mouse strains. At CpG sequence densities under 40 CpG/Kb, cells compensate for loss of methylated sites by methylating additional sites to maintain methylation levels. At higher CpG sequence densities, the exact location of a methylated site becomes more important, suggesting that variants affecting methylation will have a greater effect when occurring in higher CpG densities than in lower. We found this to be true for a variant's effect on transcript abundance, indicating that candidate variants can be prioritized based on CpG sequence density. Our findings imply that DNA methylation influences the likelihood that mutations occur at specific sites in the genome, supporting the view that the distribution of mutations is not random.
ABSTRACT
Various computational approaches have been developed to annotate epigenomes on a per-position basis by modeling combinatorial and spatial patterns within epigenomic data. However, such annotations are less suitable for gene-based analyses. We present ChromGene, a method based on a mixture of learned hidden Markov models, to annotate genes based on multiple epigenomic maps across the gene body and flanks. We provide ChromGene assignments for over 100 cell and tissue types. We characterize the mixture components in terms of gene expression, constraint, and other gene annotations. The ChromGene method and annotations will provide a useful resource for gene-based epigenomic analyses.
Subject(s)
Epigenome; Epigenomics; Histocompatibility Testing; Learning; Molecular Sequence Annotation
ABSTRACT
We introduce ChromActivity, a computational framework for predicting and annotating regulatory activity across the genome through integration of multiple epigenomic maps and various functional characterization datasets. ChromActivity generates genome-wide predictions of regulatory activity associated with each functional characterization dataset across many cell types based on available epigenomic data. For each cell type, it then produces (1) ChromScoreHMM genome annotations based on the combinatorial and spatial patterns within these predictions and (2) ChromScore tracks of overall predicted regulatory activity. ChromActivity provides a resource for analyzing and interpreting the human regulatory genome across diverse cell types.
ABSTRACT
Age and sex have a profound effect on cytosine methylation levels in humans and many other species. Here we analyzed DNA methylation profiles of 2400 tissues derived from 37 primate species including 11 haplorhine species (baboons, marmosets, vervets, rhesus macaque, chimpanzees, gorillas, orangutan, humans) and 26 strepsirrhine species (suborders Lemuriformes and Lorisiformes). From these, we present pan-primate epigenetic clocks that are highly accurate for all primates, including humans (age correlation R = 0.98). We also carried out in-depth analysis of baboon DNA methylation profiles and generated five epigenetic clocks for baboons (Olive-yellow baboon hybrid), one of which, the pan-tissue epigenetic clock, was trained on seven tissue types (fetal cerebral cortex, adult cerebral cortex, cerebellum, adipose, heart, liver, and skeletal muscle) with ages ranging from late fetal life to 22.8 years of age. Using the primate data, we characterize the effect of age and sex on individual cytosines in highly conserved regions. We identify 11 sex-related CpGs on autosomes near genes (POU3F2, CDYL, MYCL, FBXL4, ZC3H10, ZXDC, RRAS, FAM217A, RBM39, GRIA2, UHRF2). Little overlap is observed between age- and sex-related CpGs. Overall, this study advances our understanding of conserved age- and sex-related epigenetic changes in primates, and provides biomarkers of aging for all primates.
Subject(s)
DNA Methylation; Epigenesis, Genetic; Humans; Animals; Macaca mulatta/genetics; Aging/genetics; Papio; Ubiquitin-Protein Ligases; Carrier Proteins
ABSTRACT
A large-scale application of the "stacked modeling" approach for chromatin state discovery previously provided a single "universal" chromatin state annotation of the human genome based jointly on data from many cell and tissue types. Here, we produce an analogous chromatin state annotation for mouse based on 901 datasets assaying 14 chromatin marks in 26 cell or tissue types. To characterize each chromatin state, we relate the states to external annotations and compare them to analogously defined human states. We expect the universal chromatin state annotation for mouse to be a useful resource for studying this key model organism's genome.
Subject(s)
Chromatin; Genome, Human; Animals; Mice; Chromatin/genetics; Molecular Sequence Annotation
ABSTRACT
Epigenetic mechanisms guiding articular cartilage regeneration and age-related diseases such as osteoarthritis (OA) are poorly understood. STAT3 is a critical age-patterned transcription factor highly active in fetal and OA chondrocytes, but the context-specific role of STAT3 in regulating the epigenome of cartilage cells remains elusive. In this study, DNA methylation profiling was performed across human chondrocyte ontogeny to build an epigenetic clock and establish an association between CpG methylation and human chondrocyte age. Exposure of adult chondrocytes to a small molecule STAT3 agonist decreased DNA methylation, while genetic ablation of STAT3 in fetal chondrocytes induced global hypermethylation. CUT&RUN assay and subsequent transcriptional validation revealed DNA methyltransferase 3 beta (DNMT3B) as one of the putative STAT3 targets in chondrocyte development and OA. Functional assessment of human OA chondrocytes showed the acquisition of a progenitor-like immature phenotype by a significant subset of cells. Finally, conditional deletion of Stat3 in cartilage cells increased DNMT3B expression in articular chondrocytes in the knee joint in vivo and resulted in more prominent OA progression in a post-traumatic OA (PTOA) mouse model induced by destabilization of the medial meniscus (DMM). Taken together, these data reveal a novel role for STAT3 in regulating DNA methylation in cartilage development and disease. Our findings also suggest that elevated levels of active STAT3 in OA chondrocytes may indicate an intrinsic attempt of the tissue to regenerate by promoting a progenitor-like phenotype. However, it is likely that chronic activation of this pathway, induced by IL-6 cytokines, is detrimental and leads to tissue degeneration.
Subject(s)
Cartilage, Articular; Osteoarthritis; Mice; Animals; Humans; Chondrocytes/metabolism; Cells, Cultured; Osteoarthritis/genetics; Osteoarthritis/metabolism; Cartilage, Articular/metabolism; Epigenesis, Genetic; DNA Methylation/genetics; STAT3 Transcription Factor/genetics; STAT3 Transcription Factor/metabolism
ABSTRACT
MOTIVATION: Genome-wide maps of epigenetic modifications are powerful resources for non-coding genome annotation. Maps of multiple epigenetic marks have been integrated into cell or tissue type-specific chromatin state annotations for many cell or tissue types. With the increasing availability of multiple chromatin state maps for biologically similar samples, there is a need for methods that can effectively summarize the information about chromatin state annotations within groups of samples and identify differences across groups of samples at a high resolution. RESULTS: We developed CSREP, which takes as input chromatin state annotations for a group of samples. CSREP then probabilistically estimates the state at each genomic position and derives a representative chromatin state map for the group. CSREP uses an ensemble of multi-class logistic regression classifiers that predict the chromatin state assignment of each sample given the state maps from all other samples. The difference in CSREP's probability assignments for two groups can be used to identify genomic locations with differential chromatin state assignments. Using groups of chromatin state maps of a diverse set of cell and tissue types, we demonstrate the advantages of using CSREP to summarize chromatin state maps and identify biologically relevant differences between groups at a high resolution. AVAILABILITY AND IMPLEMENTATION: The CSREP source code and generated data are available at http://github.com/ernstlab/csrep. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Chromatin; Genomics; Chromatin/genetics; Genomics/methods; Genome; Software; Chromosome Mapping
ABSTRACT
Infinium methylation arrays are not available for the vast majority of non-human mammals. Moreover, even if species-specific arrays were available, probe differences between them would confound cross-species comparisons. To address these challenges, we developed the mammalian methylation array, a single custom array that measures up to 36k CpGs per species that are well conserved across many mammalian species. We designed a set of probes that can tolerate specific cross-species mutations. We annotate the array in over 200 species and report CpG island status and chromatin states in select species. Calibration experiments demonstrate the high fidelity in humans, rats, and mice. The mammalian methylation array has several strengths: it applies to all mammalian species even those that have not yet been sequenced, it provides deep coverage of conserved cytosines facilitating the development of epigenetic biomarkers, and it increases the probability that biological insights gained in one species will translate to others.
Subject(s)
Conserved Sequence; DNA Methylation; Mammals/genetics; Mammals/metabolism; Protein Processing, Post-Translational/genetics; Protein Processing, Post-Translational/physiology; Animals; Biomarkers; CpG Islands; Epigenesis, Genetic; Humans; Mice; Mutation; Rats; Transcriptome
ABSTRACT
BACKGROUND: Genome-wide maps of chromatin marks such as histone modifications and open chromatin sites provide valuable information for annotating the non-coding genome, including identifying regulatory elements. Computational approaches such as ChromHMM have been applied to discover and annotate chromatin states defined by combinatorial and spatial patterns of chromatin marks within the same cell type. An alternative "stacked modeling" approach was previously suggested, where chromatin states are defined jointly from datasets of multiple cell types to produce a single universal genome annotation based on all datasets. Despite its potential benefits for applications that are not specific to one cell type, such an approach was previously applied only for small-scale specialized purposes. Large-scale applications of stacked modeling have previously posed scalability challenges. RESULTS: Using a version of ChromHMM enhanced for large-scale applications, we apply the stacked modeling approach to produce a universal chromatin state annotation of the human genome using over 1000 datasets from more than 100 cell types, with the learned model denoted as the full-stack model. The full-stack model states show distinct enrichments for external genomic annotations, which we use in characterizing each state. Compared to per-cell-type annotations, the full-stack annotations directly differentiate constitutive from cell type-specific activity and are more predictive of locations of external genomic annotations. CONCLUSIONS: The full-stack ChromHMM model provides a universal chromatin state annotation of the genome and a unified global view of over 1000 datasets. We expect this to be a useful resource that complements existing per-cell-type annotations for studying the non-coding human genome.
Subject(s)
Epigenomics; Genome, Human; Chromatin/genetics; Genomics; Humans; Molecular Sequence Annotation
ABSTRACT
DNA methylation-based biomarkers of aging have been developed for many mammals but not yet for the vervet monkey (Chlorocebus sabaeus), which is a valuable non-human primate model for biomedical studies. We generated novel DNA methylation data from vervet cerebral cortex, blood, and liver using highly conserved mammalian CpGs represented on a custom array (HorvathMammalMethylChip40). We present six DNA methylation-based estimators of age: a vervet multi-tissue epigenetic clock and tissue-specific clocks for brain cortex, blood, and liver. In addition, we developed two dual-species clocks (human-vervet clocks) for measuring chronological age and relative age, respectively. Relative age was defined as the ratio of chronological age to maximum lifespan to address the species differences in maximum lifespan. The high accuracy of the human-vervet clocks demonstrates that epigenetic aging processes are evolutionarily conserved in primates. When applying these vervet clocks to tissue samples from another primate species, rhesus macaque, we observed high age correlations but strong offsets. We characterized CpGs that correlate significantly with age in the vervet. CpG probes that gain methylation with age across tissues were located near the targets of Polycomb proteins SUZ12 and EED and genes possessing the trimethylated H3K27 mark in their promoters. The epigenetic clocks are expected to be useful for anti-aging studies in vervets.
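The relative-age definition in this abstract (chronological age divided by maximum lifespan) can be written as a one-line function. The sketch below is a minimal illustration; the function name and the ages and lifespans in the example are hypothetical values chosen for round numbers, not figures from the study.

```python
def relative_age(chronological_age, max_lifespan):
    """Relative age = chronological age / species maximum lifespan.

    Both arguments must be in the same unit (for example, years).
    """
    if max_lifespan <= 0:
        raise ValueError("max_lifespan must be positive")
    return chronological_age / max_lifespan


# Hypothetical example: two species with very different maximum lifespans.
# The numbers are illustrative placeholders, not measured values.
short_lived = relative_age(chronological_age=10.0, max_lifespan=40.0)
long_lived = relative_age(chronological_age=30.0, max_lifespan=120.0)
print(short_lived, long_lived)  # both animals are a quarter of the way through
```

Expressing age on this scale puts animals from species with different lifespans on a common 0-to-1 axis, which is the motivation the abstract gives for the second dual-species clock.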
Subject(s)
Epigenesis, Genetic; Epigenomics; Animals; Chlorocebus aethiops; DNA Methylation; Longevity; Macaca mulatta/genetics; Mammals
ABSTRACT
[This corrects the article DOI: 10.1093/nargab/lqaa104.].
ABSTRACT
In recent years, methods were proposed for assigning feature importance scores to measure the contribution of individual features. While in some cases the goal is to understand a specific model, in many cases the goal is to understand the contribution of certain properties (features) to a real-world phenomenon. Thus, a distinction has been made between feature importance scores that explain a model and scores that explain the data. When explaining the data, machine learning models are used as proxies in settings where conducting many real-world experiments is expensive or prohibited. While existing feature importance scores show great success in explaining models, we demonstrate their limitations when explaining the data, especially in the presence of correlations between features. Therefore, we develop a set of axioms to capture properties expected from a feature importance score when explaining data and prove that there exists only one score that satisfies all of them, the Marginal Contribution Feature Importance (MCI). We analyze the theoretical properties of this score function and demonstrate its merits empirically.
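To make the idea of a marginal-contribution score concrete, the sketch below assumes that MCI for a feature is the largest gain in an evaluation function obtained by adding that feature to any subset of the other features. That definition is our reading of the MCI method and is not spelled out in the abstract; the evaluation function `toy_value` and the feature names are invented for illustration. The brute-force search is exponential in the number of features, so it only suits tiny examples.

```python
from itertools import combinations


def mci_scores(features, value):
    """Brute-force marginal-contribution score for each feature.

    `value` maps a frozenset of features to a number (for example, the
    predictive performance of a model trained on those features).
    Assumed definition: score(f) = max over subsets S of the other
    features of value(S | {f}) - value(S).
    """
    scores = {}
    for f in features:
        others = [g for g in features if g != f]
        best = float("-inf")
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                s = frozenset(subset)
                best = max(best, value(s | {f}) - value(s))
        scores[f] = best
    return scores


def toy_value(subset):
    """Invented evaluation function: 'a' and 'b' carry the same signal
    (perfectly correlated), while 'c' carries a smaller independent one."""
    shared = 1.0 if ("a" in subset or "b" in subset) else 0.0
    independent = 0.5 if "c" in subset else 0.0
    return shared + independent


scores = mci_scores(["a", "b", "c"], toy_value)
print(scores)
```

In this toy setting the two perfectly correlated features each receive the full score of their shared signal rather than splitting it, which is the kind of behavior under feature correlation that the abstract is concerned with.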
ABSTRACT
Methylation levels at specific CpG positions in the genome have been used to develop accurate estimators of chronological age in humans, mice, and other species. Although epigenetic clocks are generally species-specific, the principles underpinning them appear to be conserved at least across the mammalian class. This is exemplified by the successful development of epigenetic clocks for mice and several other mammalian species. Here, we describe epigenetic clocks for the rhesus macaque (Macaca mulatta), the most widely used nonhuman primate in biological research. Using a custom methylation array (HorvathMammalMethylChip40), we profiled n = 281 tissue samples (blood, skin, adipose, kidney, liver, lung, muscle, and cerebral cortex). From these data, we generated five epigenetic clocks for macaques. These clocks differ with regard to applicability to different tissue types (pan-tissue, blood, skin), species (macaque only or both humans and macaques), and measure of age (chronological age versus relative age). Additionally, the age-based human-macaque clock exhibits a high age correlation (R = 0.89) with the vervet monkey (Chlorocebus sabaeus), another Old World species. Four CpGs within the KLF14 promoter were consistently altered with age in four tissues (adipose, blood, cerebral cortex, skin). Future studies will be needed to evaluate whether these epigenetic clocks predict age-related conditions in the rhesus macaque.
Subject(s)
DNA Methylation; Epigenesis, Genetic; Macaca mulatta; Aging; Animals; Chlorocebus aethiops; Epigenomics; Macaca mulatta/genetics; Promoter Regions, Genetic
ABSTRACT
Given the global impact and severity of COVID-19, there is a pressing need for a better understanding of the SARS-CoV-2 genome and mutations. Multi-strain sequence alignments of coronaviruses (CoV) provide important information for interpreting the genome and its variation. We apply a comparative genomics method, ConsHMM, to the multi-strain alignments of CoV to annotate every base of the SARS-CoV-2 genome with conservation states based on sequence alignment patterns among CoV. The learned conservation states show distinct enrichment patterns for genes, protein domains, and other regions of interest. Certain states are strongly enriched or depleted of SARS-CoV-2 mutations, which can be used to predict potentially consequential mutations. We expect the conservation states to be a resource for interpreting the SARS-CoV-2 genome and mutations.