ABSTRACT
Innate lymphocytes encompass a diverse array of phenotypic identities with specialized functions. DNA methylation and hydroxymethylation are essential for epigenetic fidelity and fate commitment. The landscapes of these modifications are unknown in innate lymphocytes. Here, we characterized the whole-genome distribution of methyl-CpG and 5-hydroxymethylcytosine (5hmC) in mouse innate lymphoid cell 3 (ILC3), ILC2 and natural killer (NK) cells. We identified differentially methylated regions (DMRs) and differentially hydroxymethylated regions (DHMRs) between ILC and NK cell subsets and correlated them with transcriptional signatures. We associated lineage-determining transcription factors (LDTFs) with demethylation and demonstrated unique patterns of DNA methylation/hydroxymethylation in relationship to open chromatin regions (OCRs), histone modifications and TF-binding sites. We further identified an association between hydroxymethylation and NK cell superenhancers (SEs). Using mice lacking the DNA hydroxymethylase TET2, we showed the requirement for TET2 in optimal production of hallmark cytokines by ILC3s and interleukin-17A (IL-17A) by inflammatory ILC2s. These findings provide a powerful resource for studying innate lymphocyte epigenetic regulation and decode the regulatory logic governing their identity.
Subjects
DNA Methylation, Innate Immunity, Animals, Chromatin/genetics, Genetic Epigenesis, Innate Immunity/genetics, Natural Killer Cells, Lymphocytes, Mice
ABSTRACT
Divergence of cis-regulatory elements drives species-specific traits1, but how this manifests in the evolution of the neocortex at the molecular and cellular level remains unclear. Here we investigated the gene regulatory programs in the primary motor cortex of human, macaque, marmoset and mouse using single-cell multiomics assays, generating gene expression, chromatin accessibility, DNA methylome and chromosomal conformation profiles from a total of over 200,000 cells. From these data, we show evidence that divergence of transcription factor expression corresponds to species-specific epigenome landscapes. We find that conserved and divergent gene regulatory features are reflected in the evolution of the three-dimensional genome. Transposable elements contribute to nearly 80% of the human-specific candidate cis-regulatory elements in cortical cells. Through machine learning, we develop sequence-based predictors of candidate cis-regulatory elements in different species and demonstrate that the genomic regulatory syntax is highly preserved from rodents to primates. Finally, we show that epigenetic conservation combined with sequence similarity helps to uncover functional cis-regulatory elements and enhances our ability to interpret genetic variants contributing to neurological disease and traits.
Subjects
Conserved Sequence, Molecular Evolution, Gene Expression Regulation, Gene Regulatory Networks, Mammals, Neocortex, Animals, Humans, Mice, Callithrix/genetics, Chromatin/genetics, Chromatin/metabolism, Conserved Sequence/genetics, DNA Methylation, DNA Transposable Elements/genetics, Epigenome, Gene Expression Regulation/genetics, Macaca/genetics, Mammals/genetics, Motor Cortex/cytology, Motor Cortex/metabolism, Multiomics, Neocortex/cytology, Neocortex/metabolism, Nucleic Acid Regulatory Sequences/genetics, Single-Cell Analysis, Transcription Factors/metabolism, Genetic Variation/genetics
ABSTRACT
The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.
Subjects
Human Genome, Genomics, Human Genome/genetics, Haplotypes/genetics, High-Throughput Nucleotide Sequencing, Humans, DNA Sequence Analysis
ABSTRACT
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals3,4. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome5. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity6. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Subjects
Chromosome Mapping, Diploidy, Human Genome, Genomics, Humans, Chromosome Mapping/standards, Human Genome/genetics, Haplotypes/genetics, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/standards, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards, Reference Standards, Genomics/methods, Genomics/standards, Human Chromosomes/genetics, Genetic Variation/genetics
ABSTRACT
Genome browsers have become an intuitive and critical tool to visualize and analyze genomic features and data. Conventional genome browsers display data/annotations on a single reference genome/assembly; there are also genomic alignment viewer/browsers that help users visualize alignment, mismatch, and rearrangement between syntenic regions. However, there is a growing need for a comparative epigenome browser that can display genomic and epigenomic data sets across different species and enable users to compare them between syntenic regions. Here, we present the WashU Comparative Epigenome Browser. It allows users to load functional genomic data sets/annotations mapped to different genomes and display them over syntenic regions simultaneously. The browser also displays genetic differences between the genomes from single-nucleotide variants (SNVs) to structural variants (SVs) to visualize the association between epigenomic differences and genetic differences. Instead of anchoring all data sets to the reference genome coordinates, it creates independent coordinates of different genome assemblies to faithfully present features and data mapped to different genomes. It uses a simple, intuitive genome-align track to illustrate the syntenic relationship between different species. It extends the widely used WashU Epigenome Browser infrastructure and can be expanded to support multiple species. This new browser function will greatly facilitate comparative genomic/epigenomic research, as well as support the recent growing needs to directly compare and benchmark the T2T CHM13 assembly and other human genome assemblies.
Subjects
Epigenome, Epigenomics, Humans, Software, Genomics, Human Genome, Genetic Databases, Internet
ABSTRACT
The zebrafish (Danio rerio) has been widely used in the study of human disease and development, and about 70% of the protein-coding genes are conserved between the two species1. However, studies in zebrafish remain constrained by the sparse annotation of functional control elements in the zebrafish genome. Here we performed RNA sequencing, assay for transposase-accessible chromatin using sequencing (ATAC-seq), chromatin immunoprecipitation with sequencing, whole-genome bisulfite sequencing, and chromosome conformation capture (Hi-C) experiments in up to eleven adult and two embryonic tissues to generate a comprehensive map of transcriptomes, cis-regulatory elements, heterochromatin, methylomes and 3D genome organization in the zebrafish Tübingen reference strain. A comparison of zebrafish, human and mouse regulatory elements enabled the identification of both evolutionarily conserved and species-specific regulatory sequences and networks. We observed enrichment of evolutionary breakpoints at topologically associating domain boundaries, which were correlated with strong histone H3 lysine 4 trimethylation (H3K4me3) and CCCTC-binding factor (CTCF) signals. We performed single-cell ATAC-seq in zebrafish brain, which delineated 25 different clusters of cell types. By combining long-read DNA sequencing and Hi-C, we assembled the sex-determining chromosome 4 de novo. Overall, our work provides an additional epigenomic anchor for the functional annotation of vertebrate genomes and the study of evolutionarily conserved elements of 3D genome organization.
Subjects
Genome/genetics, Three-Dimensional Imaging, Molecular Imaging, Nucleic Acid Regulatory Sequences/genetics, Zebrafish/genetics, Animals, Brain/metabolism, Conserved Sequence/genetics, DNA Methylation, Genetic Enhancer Elements/genetics, Genetic Epigenesis, Molecular Evolution, Female, Gene Expression Profiling, Gene Regulatory Networks/genetics, Heterochromatin/chemistry, Heterochromatin/genetics, Heterochromatin/metabolism, Humans, Male, Mice, Organ Specificity, Genetic Promoter Regions/genetics, Single-Cell Analysis, Species Specificity
ABSTRACT
MOTIVATION: With single-cell DNA methylation studies yielding vast datasets, existing data formats struggle with the unique challenges of storage and efficient operations, highlighting a need for improved solutions. RESULTS: BAllC (Binary All Cytosines) emerges as a tailored format for methylation data, addressing these challenges. BAllCools, its complementary software toolkit, enhances parsing, indexing, and querying capabilities, promising superior operational speeds and reduced storage needs. AVAILABILITY AND IMPLEMENTATION: https://github.com/jksr/ballcools.
Subjects
DNA Methylation, Single-Cell Analysis, Software, Single-Cell Analysis/methods, Humans, Computational Biology/methods
ABSTRACT
MOTIVATION: Unraveling the transcriptional programs that control how cells divide, differentiate, and respond to their environments requires a precise understanding of transcription factors' (TFs) DNA-binding activities. Calling cards (CC) technology uses transposons to capture transient TF binding events at one instant in time and then read them out at a later time. This methodology can also be used to simultaneously measure TF binding and mRNA expression from single-cell CC and to record and integrate TF binding events across time in any cell type of interest without the need for purification. Despite these advantages, there has been a lack of dedicated bioinformatics tools for the detailed analysis of CC data. RESULTS: We introduce Pycallingcards, a comprehensive Python module specifically designed for the analysis of single-cell and bulk CC data across multiple species. Pycallingcards introduces two innovative peak callers, CCcaller and MACCs, enhancing the accuracy and speed of pinpointing TF binding sites from CC data. Pycallingcards offers a fully integrated environment for data visualization, motif finding, and comparative analysis with RNA-seq and ChIP-seq datasets. To illustrate its practical application, we have reanalyzed previously published mouse cortex and glioblastoma datasets. This analysis revealed novel cell-type-specific binding sites and potential sex-linked TF regulators, furthering our understanding of TF binding and gene expression relationships. Thus, Pycallingcards, with its user-friendly design and seamless interface with the Python data science ecosystem, stands as a critical tool for advancing the analysis of TF functions via CC data. AVAILABILITY AND IMPLEMENTATION: Pycallingcards can be accessed on the GitHub repository: https://github.com/The-Mitra-Lab/pycallingcards.
Subjects
Ecosystem, Transcription Factors, Animals, Mice, Chromatin Immunoprecipitation, Transcription Factors/metabolism, Binding Sites, Protein Binding, DNA Sequence Analysis
ABSTRACT
Structural variation (SV), including insertions and deletions (indels), is a primary mechanism of genome evolution. However, the mechanism by which SV contributes to epigenome evolution is poorly understood. In this study, we characterized the association between lineage-specific indels and epigenome differences between human and chimpanzee to investigate how SVs might have shaped the epigenetic landscape. By intersecting medium-to-large human-chimpanzee indels (20 bp-50 kb) with putative promoters and enhancers in cranial neural crest cells (CNCCs) and repressed regions in induced pluripotent stem cells (iPSCs), we found that 12% of indels overlap putative regulatory and repressed regions (RRRs), and 15% of these indels are associated with lineage-biased RRRs. Indel-associated putative enhancer and repressive regions are approximately 1.3 times and approximately three times as likely to be lineage-biased, respectively, as those not associated with indels. We found a twofold enrichment of medium-sized indels (20-50 bp) in CpG island (CGI)-containing promoters relative to that expected by chance. Lastly, from human-specific transposable element insertions, we identified putative regulatory elements, including NR2F1-bound putative CNCC enhancers derived from SVAs and putative iPSC promoters derived from LTR5s. Our results show that different types of indels are associated with specific epigenomic diversity between human and chimpanzee.
ABSTRACT
WashU Epigenome Browser (https://epigenomegateway.wustl.edu/browser/) is a web-based genomic data exploration tool that provides visualization, integration, and analysis of epigenomic datasets. The newly renovated user interface and functions have enabled researchers to engage with the browser and genomic data more efficiently and effectively since 2018. Here, we introduce a new integrated panel design in the browser that allows users to interact with 1D (genomic features), 2D (such as Hi-C), 3D (genome structure), and 4D (time series) data in a single web page. The browser can display three-dimensional chromatin structures with the 3D viewer module. The 4D tracks, called 'Dynamic' tracks, animatedly display time-series data, allowing for a more striking visual impact to identify the gene or genomic region candidates as a function of time. Genomic data, such as annotation features, numerical values, and chromatin interaction data can all be viewed in the dynamic track mode. Imaging data from microscopy experiments can also be displayed in the browser. In addition to software development, we continue to service and expand the data hubs we host for large consortia including 4DN, Roadmap Epigenomics, TaRGET and ENCODE, among others. Our growing user/developer community developed additional track types as plugins, such as qBed and dynseq tracks, which extend the utility of the browser. The browser serves as a foundation for additional genomics platforms including the WashU Virus Genome Browser (for COVID-19 research) and the Comparative Genome Browser. The WashU Epigenome Browser can also be accessed freely through Amazon Web Services at https://epigenomegateway.org/.
Subjects
Genetic Databases, Epigenome, Web Browser, Humans, COVID-19/genetics, Human Genome, Internet, Software
ABSTRACT
SUMMARY: Transposon calling cards is a genomic assay for identifying transcription factor binding sites in both bulk and single cell experiments. Here, we describe the qBED format, an open, text-based standard for encoding and analyzing calling card data. In parallel, we introduce the qBED track on the WashU Epigenome Browser, a novel visualization that enables researchers to inspect calling card data in their genomic context. Finally, through examples, we demonstrate that qBED files can be used to visualize non-calling card datasets, such as Combined Annotation-Dependent Depletion scores and GWAS/eQTL hits, and thus may have broad utility to the genomics community. AVAILABILITY AND IMPLEMENTATION: The qBED track is available on the WashU Epigenome Browser (http://epigenomegateway.wustl.edu/browser), beginning with version 46. Source code for the WashU Epigenome Browser with qBED support is available on GitHub (http://github.com/arnavm/eg-react and http://github.com/lidaof/eg-react). A complete definition of the qBED format is available as part of the WashU Epigenome Browser documentation (https://eg.readthedocs.io/en/latest/tracks.html#qbed-track). We have also released a tutorial on how to upload qBED data to the browser (http://dx.doi.org/10.17504/protocols.io.bca8ishw).
Subjects
Genome, Software, Epigenome, Genomics, Protein Binding
ABSTRACT
Federated Learning (FL) is a privacy-preserving way to utilize the sensitive data generated by smart sensors of user devices, where a central parameter server (PS) coordinates multiple user devices to train a global model. However, relying on a centralized topology poses challenges when applying FL in a sensor network, including imbalanced communication congestion and a possible single point of failure, especially at the PS. To alleviate these problems, we devise Dynamic Average Consensus-based Federated Learning (DACFL) for implementing FL in a decentralized sensor network. Unlike existing studies that roughly replace the model aggregation with the neighbors' average, we first transform the FL model aggregation, which is the most intractable step in a decentralized topology, into a dynamic average consensus problem by treating the local training procedure as a discrete-time series. We then employ first-order dynamic average consensus (FODAC) to estimate the average model, which not only solves the model aggregation for DACFL but also ensures model consistency as much as possible. To improve performance with non-i.i.d. data, each user also takes the neighbors' average model as its next-round initialization, which prevents possible local over-fitting. We also provide a basic theoretical analysis of DACFL under the premise of i.i.d. data. The results validate the feasibility of DACFL in both time-invariant and time-varying topologies and show that DACFL outperforms existing approaches, including CDSGD and D-PSGD, in most cases. Taking the results on Fashion-MNIST as a numerical example, with i.i.d. data DACFL achieves 19~34% and 3~10% increases in average accuracy, and with non-i.i.d. data it achieves 30~50% and 0~10% increases in average accuracy, compared to CDSGD and D-PSGD, respectively.
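As an illustration of the consensus step named in this abstract, the following is a minimal sketch of the textbook first-order dynamic average consensus update on scalar signals. It is not the authors' DACFL implementation: the function name `fodac`, the fixed doubly stochastic mixing matrix `W`, and the scalar reference signals are all assumptions made for the example. In a federated setting the reference signal would presumably be each node's locally trained model parameters.

```python
# Sketch of first-order dynamic average consensus (FODAC) on scalar signals.
# Assumptions (not from the abstract): a fixed doubly stochastic mixing
# matrix W, scalar reference signals, and the textbook update
#   x_i(k+1) = sum_j W[i][j] * x_j(k) + r_i(k+1) - r_i(k),  x_i(0) = r_i(0).

def fodac(W, refs):
    """Run FODAC and return the estimate vectors x(0), x(1), ...

    W    -- n x n doubly stochastic mixing matrix (list of lists)
    refs -- refs[k][i] is node i's reference signal at step k
    """
    n = len(W)
    x = list(refs[0])            # each node starts from its own signal
    history = [x]
    for k in range(len(refs) - 1):
        x = [
            sum(W[i][j] * x[j] for j in range(n))   # mix with neighbours
            + refs[k + 1][i] - refs[k][i]           # track local signal change
            for i in range(n)
        ]
        history.append(x)
    return history


# Hypothetical 3-node network with symmetric, doubly stochastic weights.
W = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]

# Constant reference signals: every estimate approaches their average (3.0).
refs = [[1.0, 2.0, 6.0] for _ in range(60)]
final = fodac(W, refs)[-1]

# Time-varying signals: the sum of estimates tracks the sum of the signals.
moving_refs = [[float(k), 2.0 * k, 0.0] for k in range(20)]
moving_history = fodac(W, moving_refs)
```

Because `W` is doubly stochastic, the sum of the estimates always equals the sum of the current reference signals, which is what lets each node track the network-wide average without a central server.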
Subjects
Machine Learning, Privacy, Communication, Consensus, Learning
ABSTRACT
The WashU Epigenome Browser (https://epigenomegateway.wustl.edu/) provides visualization, integration and analysis tools for epigenomic datasets. Since 2010, it has provided the scientific community with data from large consortia including the Roadmap Epigenomics and the ENCODE projects. Recently, we refactored the codebase, redesigned the user interface, and developed various novel features. New features include: (i) visualization using virtual reality (VR), which has implications in biology education and the study of 3D chromatin structure; (ii) expanded public data hubs, including data from the 4DN, ENCODE, Roadmap Epigenomics, TaRGET, IHEC and TCGA consortia; (iii) a more responsive user interface; (iv) a history of interactions, which enables undo and redo; (v) a feature we call Live Browsing, which allows multiple users to collaborate remotely on the same session; (vi) the ability to visualize local tracks and data hubs. Amazon Web Services also hosts the redesign at https://epigenomegateway.org/.
Subjects
Genetic Databases, Epigenome/genetics, Software, Web Browser, Datasets as Topic, Epigenomics, Human Genome, Humans, Internet, User-Computer Interface
ABSTRACT
In the analysis of next-generation sequencing technology, massive discrete data are generated from short read counts with varying biological coverage. Conducting conditional hypothesis testing such as Fisher's Exact Test at every genomic region of interest thus leads to a heterogeneous multiple discrete testing problem. However, most existing multiple testing procedures for controlling the false discovery rate (FDR) assume that test statistics are continuous and become conservative for discrete tests. To overcome the conservativeness, in this article, we propose a novel multiple testing procedure for better FDR control on heterogeneous discrete tests. Our procedure makes decisions based on the marginal critical function (MCF) of randomized tests, which enables achieving a powerful and non-randomized multiple testing procedure. We provide upper bounds of the positive FDR (pFDR) and the positive false non-discovery rate (pFNR) corresponding to our procedure. We also prove that the set of detections made by our method contains every detection made by a naive application of the widely-used q-value method. We further demonstrate the improvement of our method over other existing multiple testing procedures by simulations and a real example of differentially methylated region (DMR) detection using whole-genome bisulfite sequencing (WGBS) data.
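To make the "heterogeneous discrete tests" problem concrete, the sketch below enumerates every attainable two-sided Fisher's Exact Test p-value for a 2x2 table with fixed margins. It only illustrates why low-coverage regions yield coarse, conservative p-values; it is not the MCF-based procedure proposed in the abstract. The function name `fisher_pvalues` and the example margins are assumptions chosen for illustration.

```python
from math import comb

def fisher_pvalues(row1, row2, col1):
    """All attainable two-sided Fisher exact p-values for fixed margins.

    row1, row2 -- row totals of the 2x2 table
    col1       -- first column total
    Returns {top-left cell count: two-sided p-value}.
    """
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of each possible top-left cell count.
    probs = {a: comb(row1, a) * comb(row2, col1 - a) / comb(n, col1)
             for a in range(lo, hi + 1)}
    # Two-sided p-value: total probability of tables no more likely than
    # the observed one (small relative tolerance to handle ties).
    return {a: sum(q for q in probs.values() if q <= p * (1 + 1e-9))
            for a, p in probs.items()}

low = fisher_pvalues(3, 3, 3)       # hypothetical low-coverage region
high = fisher_pvalues(30, 30, 30)   # hypothetical higher-coverage region
min_low = min(low.values())         # smallest p-value this region can reach
min_high = min(high.values())
```

With margins of 3, the smallest attainable p-value is 0.1, so that region can never be rejected at the 0.05 level regardless of the data, whereas the higher-coverage region has a far finer p-value grid. This difference in attainable p-values across regions is the heterogeneity that continuous-statistic FDR procedures do not account for.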
Subjects
Biometry/methods, Statistical Models, Random Allocation, Computer Simulation, DNA Methylation, High-Throughput Nucleotide Sequencing, Humans, Sulfites
ABSTRACT
BACKGROUND: While the genetics of obesity has been well defined, the epigenetics of obesity is poorly understood. Here, we used a genome-wide approach to identify genes with differences in both DNA methylation and expression associated with a high-fat diet in mice. RESULTS: We weaned genetically identical Small (SM/J) mice onto a high-fat or low-fat diet and measured their weights weekly, tested their glucose and insulin tolerance, assessed serum biomarkers, and weighed their organs at necropsy. We measured liver gene expression with RNA-seq (using 21 total libraries, each pooled with 2 mice of the same sex and diet) and DNA methylation with MRE-seq and MeDIP-seq (using 8 total libraries, each pooled with 4 mice of the same sex and diet). There were 4356 genes with expression differences associated with diet, with 184 genes exhibiting a sex-by-diet interaction. Dietary fat dysregulated several pathways, including those involved in cytokine-cytokine receptor interaction, chemokine signaling, and oxidative phosphorylation. Over 7000 genes had differentially methylated regions associated with diet, which occurred in regulatory regions more often than expected by chance. Only 5-10% of differentially methylated regions occurred in differentially expressed genes; however, this was more often than expected by chance (p = 2.2 × 10⁻⁸). CONCLUSIONS: Discovering the gene expression and methylation changes associated with a high-fat diet can help to identify new targets for epigenetic therapies and inform about the physiological changes in obesity. Here, we identified numerous genes with altered expression and methylation that are promising candidates for further study.
Subjects
DNA Methylation/genetics, High-Fat Diet, Gene Expression Regulation, Genome, Animals, Blood Glucose/metabolism, Body Weight/genetics, Cholesterol/blood, Female, Genetic Association Studies, Glucose Tolerance Test, Insulin/blood, Insulin Resistance, Leptin/blood, Male, Mice, Obesity/blood, Obesity/genetics, Triglycerides/blood
ABSTRACT
Thyroid hormone binds to nuclear receptors and regulates gene transcription. Here we report that in mice, at around the first day of life, there is a transient surge in hepatocyte type 2 deiodinase (D2) that activates the prohormone thyroxine to the active hormone triiodothyronine, modifying the expression of ~165 genes involved in broad aspects of hepatocyte function, including lipid metabolism. Hepatocyte-specific D2 inactivation (ALB-D2KO) is followed by a delay in neonatal expression of key lipid-related genes and a persistent reduction in peroxisome proliferator-activated receptor-γ expression. Notably, the absence of a neonatal D2 peak significantly modifies the baseline and long-term hepatic transcriptional response to a high-fat diet (HFD). Overall, changes in the expression of approximately 400 genes represent the HFD response in control animals toward the synthesis of fatty acids and triglycerides, whereas in ALB-D2KO animals, the response is limited to a very different set of only approximately 200 genes associated with reverse cholesterol transport and lipase activity. A whole-genome methylation profile coupled to multiple analytical platforms indicates that 10-20% of these differences can be related to the presence of differentially methylated local regions mapped to sites of active/suppressed chromatin, thus qualifying as epigenetic modifications occurring as a result of neonatal D2 inactivation. The resulting phenotype of the adult ALB-D2KO mouse is dramatic, with greatly reduced susceptibility to diet-induced steatosis, hypertriglyceridemia, and obesity.
Subjects
Disease Susceptibility/enzymology, Fatty Liver/enzymology, Developmental Gene Expression Regulation/genetics, Hepatocytes/metabolism, Iodide Peroxidase/metabolism, Obesity/enzymology, Analysis of Variance, Animals, Newborn Animals, Indirect Calorimetry, DNA Methylation, High-Fat Diet/adverse effects, Fatty Liver/etiology, Gene Expression Profiling, In Situ Hybridization, Mice, Knockout Mice, Microarray Analysis, Obesity/etiology, Triiodothyronine/blood
ABSTRACT
BACKGROUND: Uncovering mechanisms of epigenome evolution is an essential step towards understanding the evolution of different cellular phenotypes. While studies have confirmed DNA methylation as a conserved epigenetic mechanism in mammalian development, little is known about the conservation of tissue-specific genome-wide DNA methylation patterns. RESULTS: Using a comparative epigenomics approach, we identified and compared the tissue-specific DNA methylation patterns of rat against those of mouse and human across three shared tissue types. We confirmed that tissue-specific differentially methylated regions are strongly associated with tissue-specific regulatory elements. Comparisons between species revealed that at a minimum 11-37% of tissue-specific DNA methylation patterns are conserved, a phenomenon that we define as epigenetic conservation. Conserved DNA methylation is accompanied by conservation of other epigenetic marks including histone modifications. Although a significant amount of locus-specific methylation is epigenetically conserved, the majority of tissue-specific DNA methylation is not conserved across the species and tissue types that we investigated. Examination of the genetic underpinning of epigenetic conservation suggests that primary sequence conservation is a driving force behind epigenetic conservation. In contrast, evolutionary dynamics of tissue-specific DNA methylation are best explained by the maintenance or turnover of binding sites for important transcription factors. CONCLUSIONS: Our study extends the limited literature of comparative epigenomics and suggests a new paradigm for epigenetic conservation without genetic conservation through analysis of transcription factor binding sites.
Subjects
Conserved Sequence, DNA Methylation/genetics, Animals, Binding Sites, Epigenomics, Molecular Evolution, Humans, Mice, Organ Specificity, Rats, Transcription Factors/metabolism
ABSTRACT
Transposable elements (TEs) have been shown to contain functional binding sites for certain transcription factors (TFs). However, the extent to which TEs contribute to the evolution of TF binding sites is not well known. We comprehensively mapped binding sites for 26 pairs of orthologous TFs in two pairs of human and mouse cell lines (representing two cell lineages), along with epigenomic profiles, including DNA methylation and six histone modifications. Overall, we found that 20% of binding sites were embedded within TEs. This number varied across different TFs, ranging from 2% to 40%. We further identified 710 TF-TE relationships in which genomic copies of a TE subfamily contributed a significant number of binding peaks for a TF, and we found that LTR elements dominated these relationships in human. Importantly, TE-derived binding peaks were strongly associated with open and active chromatin signatures, including reduced DNA methylation and increased enhancer-associated histone marks. On average, 66% of TE-derived binding events were cell type-specific with a cell type-specific epigenetic landscape. Most of the binding sites contributed by TEs were species-specific, but we also identified binding sites conserved between human and mouse, the functional relevance of which was supported by a signature of purifying selection on DNA sequences of these TEs. Interestingly, several TFs had significantly expanded binding site landscapes only in one species, which were linked to species-specific gene functions, suggesting that TEs are an important driving force for regulatory innovation. Taken together, our data suggest that TEs have significantly and continuously shaped gene regulatory networks during mammalian evolution.