ABSTRACT
Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state regulatory potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbor distinctive transcription factor binding motifs that are similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we show that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.
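The esRP idea described above can be illustrated with a minimal sketch. It assumes that a cCRE's score in one cell type is the sum of per-state regression coefficients weighted by the fraction of the cCRE covered by each state. The state names, coefficient values, and the function name `esrp_score` are hypothetical and are not taken from the abstract.

```python
def esrp_score(state_fractions, state_coefficients):
    """Weighted sum of state coefficients for one cCRE in one cell type.

    state_fractions: fraction of the cCRE covered by each epigenetic state.
    state_coefficients: per-state contribution to expression (hypothetical
    values standing in for coefficients from a multivariate regression).
    """
    return sum(
        fraction * state_coefficients[state]
        for state, fraction in state_fractions.items()
    )


# Hypothetical coefficients and coverage for a single cCRE.
coefficients = {"active_enhancer": 2.0, "quiescent": 0.0, "polycomb": -1.0}
coverage = {"active_enhancer": 0.5, "quiescent": 0.25, "polycomb": 0.25}
score = esrp_score(coverage, coefficients)
```

Computing such a score for every cCRE in every cell type gives a matrix that could then be clustered jointly across species, as the abstract describes.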
Subject(s)
Epigenesis, Genetic , Epigenome , Species Specificity , Animals , Mice , Humans , Blood Cells/metabolism , Regulatory Sequences, Nucleic Acid , Gene Expression Regulation , Epigenomics/methods
ABSTRACT
Combinatorial patterns of epigenetic features reflect transcriptional states and functions of genomic regions. Although many epigenetic features are correlated, most existing data normalization approaches analyze each feature independently. Such strategies may distort relationships between functionally correlated epigenetic features and hinder biological interpretation. We present a novel approach named JMnorm that simultaneously normalizes multiple epigenetic features across cell types, species, and experimental conditions by leveraging information from partially correlated epigenetic features. We demonstrate that JMnorm-normalized data better preserve cross-epigenetic-feature correlations across different cell types and show greater consistency between biological replicates than data normalized by other methods. Additionally, we show that JMnorm-normalized data consistently improve the performance of various downstream analyses, including candidate cis-regulatory element clustering, cross-cell-type gene expression prediction, and detection of transcription factor binding and changes upon perturbations. These findings suggest that JMnorm effectively minimizes technical noise while preserving biologically significant relationships between epigenetic datasets. We anticipate that JMnorm will enhance integrative and comparative epigenomics.
Subject(s)
Computational Biology , Epigenomics , Epigenesis, Genetic , Epigenomics/methods , Genome , Genomics/methods , Protein Binding , Computational Biology/methods
ABSTRACT
BACKGROUND: Epigenetic modification of chromatin plays a pivotal role in regulating gene expression during cell differentiation. The scale and complexity of epigenetic data make it difficult for biologists to identify the regulatory events controlling cell differentiation. RESULTS: To reduce the complexity, we developed a package, called Snapshot, for clustering and visualizing candidate cis-regulatory elements (cCREs) based on their epigenetic signals during cell differentiation. This package first introduces a binarized indexing strategy for clustering the cCREs. It then provides a series of easily interpretable figures for visualizing the signal and epigenetic state patterns of the cCRE clusters during cell differentiation. It can also use different hierarchies of cell types to highlight the epigenetic history specific to any particular cell lineage. We demonstrate the utility of Snapshot using data from a consortium project for ValIdated Systematic IntegratiON (VISION) of epigenomic data in hematopoiesis. CONCLUSION: The Snapshot package can identify all distinct clusters of genomic locations with unique epigenetic signal patterns during cell differentiation. It outperforms other methods in terms of interpreting and reproducing the identified cCRE clusters. Snapshot is available on GitHub: https://github.com/guanjue/Snapshot.
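As a rough illustration of what a binarized indexing strategy could look like, the sketch below thresholds each cCRE's signal in every cell type to 0 or 1, joins the flags into an index string, and groups cCREs that share the same index. The threshold, the cCRE identifiers, the signal values, and the function names are hypothetical. This is not the Snapshot package's code, which may treat rare patterns and thresholds differently.

```python
from collections import defaultdict


def binarized_index(signals, threshold):
    """Return an index such as '1_1_0': one presence/absence flag per cell type."""
    return "_".join("1" if s >= threshold else "0" for s in signals)


def cluster_by_index(ccre_signals, threshold=1.0):
    """Group cCRE ids whose binarized patterns across cell types are identical."""
    clusters = defaultdict(list)
    for ccre_id, signals in ccre_signals.items():
        clusters[binarized_index(signals, threshold)].append(ccre_id)
    return dict(clusters)


# Hypothetical signals for four cCREs across three cell types.
example = {
    "cCRE_1": [5.2, 4.8, 0.1],
    "cCRE_2": [6.0, 3.9, 0.3],
    "cCRE_3": [0.2, 0.4, 7.5],
    "cCRE_4": [4.1, 0.2, 0.0],
}
clusters = cluster_by_index(example, threshold=1.0)
```

Because each cluster is named by its presence/absence pattern, the label itself states in which cell types the elements are active, which is one reason an index-based grouping is easy to interpret.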
Subject(s)
Chromatin , Epigenomics , Epigenomics/methods , Cell Differentiation/genetics , Epigenesis, Genetic , Cluster Analysis
ABSTRACT
Thousands of epigenomic data sets have been generated in the past decade, but it is difficult for researchers to effectively use all the data relevant to their projects. Systematic integrative analysis can help meet this need, and the VISION project was established for validated systematic integration of epigenomic data in hematopoiesis. Here, we systematically integrated extensive data recording epigenetic features and transcriptomes from many sources, including individual laboratories and consortia, to produce a comprehensive view of the regulatory landscape of differentiating hematopoietic cell types in mouse. By using IDEAS as our integrative and discriminative epigenome annotation system, we identified and assigned epigenetic states simultaneously along chromosomes and across cell types, precisely and comprehensively. Combining nuclease accessibility and epigenetic states produced a set of more than 200,000 candidate cis-regulatory elements (cCREs) that efficiently capture enhancers and promoters. The transitions in epigenetic states of these cCREs across cell types provided insights into mechanisms of regulation, including decreases in numbers of active cCREs during differentiation of most lineages, transitions from poised to active or inactive states, and shifts in nuclease accessibility of CTCF-bound elements. Regression modeling of epigenetic states at cCREs and gene expression produced a versatile resource to improve selection of cCREs potentially regulating target genes. These resources are available from our VISION website to aid research in genomics and hematopoiesis.
Subject(s)
Epigenesis, Genetic , Hematopoiesis/genetics , Hematopoietic Stem Cells/metabolism , Animals , Mice , Regulatory Elements, Transcriptional , Transcriptome
ABSTRACT
SUMMARY: Epigenetic modifications reflect key aspects of transcriptional regulation, and many epigenomic datasets have been generated under different biological contexts to provide insights into regulatory processes. However, the technical noise in epigenomic datasets and the many dimensions (features) examined make it challenging to effectively extract biologically meaningful inferences from these datasets. We developed a package that reduces noise while normalizing the epigenomic data by a novel normalization method, followed by integrative dimensional reduction by learning and assigning epigenetic states. This package, called S3V2-IDEAS, can be used to identify epigenetic states for multiple features, or identify discretized signal intensity levels and a master peak list across different cell types for a single feature. We illustrate the outputs and performance of S3V2-IDEAS using 137 epigenomics datasets from the VISION project that provides ValIdated Systematic IntegratiON of epigenomic data in hematopoiesis. AVAILABILITY AND IMPLEMENTATION: S3V2-IDEAS pipeline is freely available as open source software released under an MIT license at: https://github.com/guanjue/S3V2_IDEAS_ESMP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Epigenomics , Software , Epigenomics/methods , Epigenesis, Genetic , Gene Expression Regulation , Hematopoiesis
ABSTRACT
Quantitative comparison of epigenomic data across multiple cell types or experimental conditions is a promising way to understand the biological functions of epigenetic modifications. However, differences in sequencing depth and signal-to-noise ratios in the data from different experiments can hinder our ability to identify real biological variation from raw epigenomic data. Proper normalization is required prior to data analysis to gain meaningful insights. Most existing methods for data normalization standardize signals by rescaling either background regions or peak regions, assuming that the same scale factor is applicable to both background and peak regions. While such methods adjust for differences in sequencing depths, they do not address differences in the signal-to-noise ratios across different experiments. We developed a new data normalization method, called S3norm, that normalizes the sequencing depths and signal-to-noise ratios across different data sets simultaneously by a monotonic nonlinear transformation. We show empirically that the epigenomic data normalized by our method, compared to existing methods, can better capture real biological variation, such as impact on gene expression regulation.
Subject(s)
Epigenomics/methods , Sequence Analysis, DNA/methods , Gene Expression , Histone Code , RNA-Seq , Software
ABSTRACT
Members of the GATA family of transcription factors play key roles in the differentiation of specific cell lineages by regulating the expression of target genes. Three GATA factors play distinct roles in hematopoietic differentiation. In order to better understand how these GATA factors function to regulate genes throughout the genome, we are studying the epigenomic and transcriptional landscapes of hematopoietic cells in a model-driven, integrative fashion. We have formed the collaborative multi-lab VISION project to conduct ValIdated Systematic IntegratiON of epigenomic data in mouse and human hematopoiesis. The epigenomic data included nuclease accessibility in chromatin, CTCF occupancy, and histone H3 modifications for 20 cell types covering hematopoietic stem cells, multilineage progenitor cells, and mature cells across the blood cell lineages of mouse. The analysis used the Integrative and Discriminative Epigenome Annotation System (IDEAS), which learns all common combinations of features (epigenetic states) simultaneously in two dimensions: along chromosomes and across cell types. The result is a segmentation that effectively paints the regulatory landscape in readily interpretable views, revealing constitutively active or silent loci as well as the loci specifically induced or repressed in each stage and lineage. Nuclease accessible DNA segments in active chromatin states were designated candidate cis-regulatory elements in each cell type, providing one of the most comprehensive registries of candidate hematopoietic regulatory elements to date. Applications of VISION resources are illustrated for the regulation of genes encoding GATA1, GATA2, GATA3, and Ikaros. VISION resources are freely available from our website http://usevision.org.
Subject(s)
Chromatin/metabolism , Epigenome , GATA Transcription Factors/metabolism , Gene Expression Regulation , Hematopoiesis , Hematopoietic Stem Cells/cytology , Hematopoietic Stem Cells/metabolism , Animals , Cell Differentiation , Chromatin/genetics , GATA Transcription Factors/genetics , Humans
ABSTRACT
Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our ValIdated Systematic IntegratiON (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state Regulatory Potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbored distinctive transcription factor binding motifs that were similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we showed that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.
ABSTRACT
Recent advances in single-cell RNA sequencing have shown heterogeneous cell types and gene expression states in the non-cancerous cells in tumors. The integration of multiple scRNA-seq datasets across tumors can indicate common cell types and states in the tumor microenvironment (TME). We develop a data driven framework, MetaTiME, to overcome the limitations in resolution and consistency that result from manual labelling using known gene markers. Using millions of TME single cells, MetaTiME learns meta-components that encode independent components of gene expression observed across cancer types. The meta-components are biologically interpretable as cell types, cell states, and signaling activities. By projecting onto the MetaTiME space, we provide a tool to annotate cell states and signature continuums for TME scRNA-seq data. Leveraging epigenetics data, MetaTiME reveals critical transcriptional regulators for the cell states. Overall, MetaTiME learns data-driven meta-components that depict cellular states and gene regulators for tumor immunity and cancer immunotherapy.
Subject(s)
Epigenesis, Genetic , Tumor Microenvironment , Tumor Microenvironment/genetics , Epigenomics , Immunotherapy , Gene Expression , Single-Cell Analysis
ABSTRACT
Joint analyses of genomic datasets obtained in multiple different conditions are essential for understanding the biological mechanism that drives tissue-specificity and cell differentiation, but they still remain computationally challenging. To address this we introduce CLIMB (Composite LIkelihood eMpirical Bayes), a statistical methodology that learns patterns of condition-specificity present in genomic data. CLIMB provides a generic framework facilitating a host of analyses, such as clustering genomic features sharing similar condition-specific patterns and identifying which of these features are involved in cell fate commitment. We apply CLIMB to three sets of hematopoietic data, which examine CTCF ChIP-seq measured in 17 different cell populations, RNA-seq measured across constituent cell populations in three committed lineages, and DNase-seq in 38 cell populations. Our results show that CLIMB improves upon existing alternatives in statistical precision, while capturing interpretable and biologically relevant clusters in the data.
Subject(s)
Genome , Genomics , Bayes Theorem , Cluster Analysis , Sequence Analysis, DNA/methods
ABSTRACT
CCCTC-binding factor (CTCF) is a conserved zinc finger transcription factor implicated in a wide range of functions, including genome organization, transcription activation, and elongation. To explore the basis for CTCF functional diversity, we coupled an auxin-induced degron system with precision nuclear run-on. Unexpectedly, oriented CTCF motifs in gene bodies are associated with transcriptional stalling in a manner independent of bound CTCF. Moreover, CTCF at different binding sites (CBSs) displays highly variable resistance to degradation. Motif sequence does not significantly predict degradation behavior, but location at chromatin boundaries and chromatin loop anchors, as well as co-occupancy with cohesin, are associated with delayed degradation. Single-molecule tracking experiments link chromatin residence time to CTCF degradation kinetics, which has ramifications regarding architectural CTCF functions. Our study highlights the heterogeneity of CBSs, uncovers properties specific to architecturally important CBSs, and provides insights into the basic processes of genome organization and transcription regulation.
Subject(s)
CCCTC-Binding Factor/metabolism , Chromatin Immunoprecipitation Sequencing , Chromatin/metabolism , Erythroblasts/metabolism , Single Molecule Imaging , Animals , Binding Sites , CCCTC-Binding Factor/genetics , CRISPR-Cas Systems , Cell Line , Chromatin/genetics , Chromatin Assembly and Disassembly , Gene Editing , Kinetics , Mice , Molecular Dynamics Simulation , Protein Binding , Proteolysis , RNA Polymerase II/metabolism , Transcription, Genetic
ABSTRACT
The spatial organization of chromatin in the nucleus has been implicated in regulating gene expression. Maps of high-frequency interactions between different segments of chromatin have revealed topologically associating domains (TADs), within which most of the regulatory interactions are thought to occur. TADs are not homogeneous structural units but appear to be organized into a hierarchy. We present OnTAD, an optimized nested TAD caller from Hi-C data, to identify hierarchical TADs. OnTAD reveals new biological insights into the role of different TAD levels, boundary usage in gene regulation, the loop extrusion model, and compartmental domains. OnTAD is available at https://github.com/anlin00007/OnTAD.
Subject(s)
Chromatin Assembly and Disassembly , Chromatin/metabolism , Algorithms , Epigenesis, Genetic , Genomics , Software
ABSTRACT
The gastrointestinal (GI) epithelium is a highly regenerative tissue with the potential to provide a renewable source of insulin(+) cells after undergoing cellular reprogramming. Here, we show that cells of the antral stomach have a previously unappreciated propensity for conversion into functional insulin-secreting cells. Native antral endocrine cells share a surprising degree of transcriptional similarity with pancreatic β cells, and expression of β cell reprogramming factors in vivo converts antral cells efficiently into insulin(+) cells with close molecular and functional similarity to β cells. Induced GI insulin(+) cells can suppress hyperglycemia in a diabetic mouse model for at least 6 months and regenerate rapidly after ablation. Reprogramming of antral stomach cells assembled into bioengineered mini-organs in vitro yielded transplantable units that also suppressed hyperglycemia in diabetic mice, highlighting the potential for development of engineered stomach tissues as a renewable source of functional β cells for glycemic control.