ABSTRACT
Despite the unique ability of pioneer factors (PFs) to target nucleosomal sites in closed chromatin, they only bind a small fraction of their genomic motifs. The underlying mechanism of this selectivity is not well understood. Here, we design a high-throughput assay called chromatin immunoprecipitation with integrated synthetic oligonucleotides (ChIP-ISO) to systematically dissect sequence features affecting the binding specificity of a classic PF, FOXA1, in human A549 cells. Combining ChIP-ISO with in vitro and neural network analyses, we find that (1) FOXA1 binding is strongly affected by co-binding transcription factors (TFs) AP-1 and CEBPB; (2) FOXA1 and AP-1 show binding cooperativity in vitro; (3) FOXA1's binding is determined more by local sequences than chromatin context, including eu-/heterochromatin; and (4) AP-1 is partially responsible for differential binding of FOXA1 in different cell types. Our study presents a framework for elucidating genetic rules underlying PF binding specificity and reveals a mechanism for context-specific regulation of its binding.
Subject(s)
Hepatocyte Nuclear Factor 3-alpha , Protein Binding , Transcription Factor AP-1 , Hepatocyte Nuclear Factor 3-alpha/metabolism , Hepatocyte Nuclear Factor 3-alpha/genetics , Humans , Transcription Factor AP-1/metabolism , Transcription Factor AP-1/genetics , Binding Sites , A549 Cells , Chromatin/metabolism , Chromatin/genetics , Chromatin Immunoprecipitation , Oligonucleotides/metabolism , Oligonucleotides/genetics
ABSTRACT
Genome-wide nucleosome profiles are predominantly characterized using MNase-seq, which involves extensive MNase digestion and size selection to enrich for mononucleosome-sized fragments. Most available MNase-seq analysis packages assume that nucleosomes uniformly protect 147 bp DNA fragments. However, some nucleosomes with atypical histone or chemical compositions protect shorter lengths of DNA. The rigid assumptions imposed by current nucleosome analysis packages potentially prevent investigators from understanding the regulatory roles played by atypical nucleosomes. To enable the characterization of different nucleosome types from MNase-seq data, we introduce the size-based expectation maximization (SEM) nucleosome-calling package. SEM employs a hierarchical Gaussian mixture model to estimate nucleosome positions and subtypes. Nucleosome subtypes are automatically identified based on the distribution of protected DNA fragments. Benchmark analysis indicates that SEM is on par with existing packages in terms of standard nucleosome-calling accuracy metrics, while uniquely providing the ability to characterize nucleosome subtype identities. Applying SEM to a low-dose MNase-H2B-ChIP-seq data set from mouse embryonic stem cells, we identified three nucleosome types: short-fragment nucleosomes, canonical nucleosomes, and di-nucleosomes. Short-fragment nucleosomes can be divided further into two subtypes based on their chromatin accessibility. Short-fragment nucleosomes in accessible regions exhibit high MNase sensitivity and are enriched at transcription start sites (TSSs) and CTCF peaks, similar to previously reported "fragile nucleosomes." These SEM-defined accessible short-fragment nucleosomes are found not just in promoters but also in distal regulatory regions. Additional analyses reveal their colocalization with the chromatin remodelers CHD6, CHD8, and EP400. In summary, SEM provides an effective platform for exploration of nonstandard nucleosome subtypes.
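The idea of separating nucleosome types by the distribution of protected fragment lengths can be illustrated with a minimal sketch. The code below is not the SEM package: it fits a plain (non-hierarchical) one-dimensional Gaussian mixture to fragment lengths with expectation maximization, and the function name `fit_gmm_1d`, the starting spread of 20 bp, and the iteration count are all assumptions made for illustration.

```python
import math


def fit_gmm_1d(lengths, init_means, n_iter=50):
    """Fit a 1D Gaussian mixture to fragment lengths (bp) with plain EM.

    Hypothetical helper for illustration; SEM itself uses a hierarchical model.
    """
    k = len(init_means)
    means = list(init_means)
    sds = [20.0] * k              # assumed starting spread, in bp
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each fragment length
        resp = []
        for x in lengths:
            dens = [
                w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                for w, m, s in zip(weights, means, sds)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate the weight, mean and spread of each component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue          # leave an empty component unchanged
            weights[j] = nj / len(lengths)
            means[j] = sum(r[j] * x for r, x in zip(resp, lengths)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, lengths)) / nj
            sds[j] = max(math.sqrt(var), 1.0)   # floor avoids a collapsing component
    return weights, means, sds
```

Under these assumptions, calling `fit_gmm_1d(lengths, [100, 150])` on a list of fragment lengths returns one weight, mean, and standard deviation per component; a component with a mean well below 147 bp would correspond loosely to the short-fragment class described above.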
Subject(s)
Nucleosomes , Nucleosomes/genetics , Nucleosomes/metabolism , Mice , Animals , Histones/metabolism , Histones/genetics , Chromatin Immunoprecipitation Sequencing/methods , Mouse Embryonic Stem Cells/metabolism
ABSTRACT
Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. However, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq. Most regulatory genomics analysis pipelines discard "multimapped" reads that align equally well to multiple genomic locations. Because multimapped reads arise predominantly from repeats, current analysis pipelines fail to detect a substantial portion of regulatory events that occur in repetitive regions. To address this shortcoming, we developed Allo, a new approach to allocate multimapped reads in an efficient, accurate, and user-friendly manner. Allo combines probabilistic mapping of multimapped reads with a convolutional neural network that recognizes the read distribution features of potential peaks, offering enhanced accuracy in multimapping read assignment. Allo also provides read-level output in the form of a corrected alignment file, making it compatible with existing regulatory genomics analysis pipelines and downstream peak-finders. In a demonstration application on CTCF ChIP-seq data, we show that Allo results in the discovery of thousands of new CTCF peaks. Many of these peaks contain the expected cognate motif and/or serve as TAD boundaries. We additionally apply Allo to a diverse collection of ENCODE ChIP-seq data sets, resulting in multiple previously unidentified interactions between transcription factors and repetitive element families. Finally, we show that Allo may be particularly beneficial in identifying ChIP-seq peaks at centromeres, near segmentally duplicated genes, and in younger TEs, enabling new regulatory analyses in these regions.
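The probabilistic allocation step can be illustrated with a small sketch. This is not Allo's implementation and it omits the convolutional neural network entirely: it only shows the general idea of assigning a multimapped read to one candidate locus with probability proportional to the uniquely mapped read count nearby. The function name `allocate_read`, the pseudocount of 1, and the locus labels are assumptions.

```python
import random


def allocate_read(unique_counts, rng):
    """Choose one candidate locus for a multimapped read.

    unique_counts maps each candidate locus to the number of uniquely
    mapped reads near it; a pseudocount of 1 (an assumption) keeps loci
    with no unique reads selectable.
    """
    loci = list(unique_counts)
    weights = [unique_counts[locus] + 1 for locus in loci]
    return rng.choices(loci, weights=weights, k=1)[0]
```

For example, `allocate_read({"chr1:100": 99, "chr5:200": 0}, random.Random(0))` would usually return the first locus, since it carries far more uniquely mapped signal.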
Subject(s)
Chromatin Immunoprecipitation Sequencing , Humans , Chromatin Immunoprecipitation Sequencing/methods , Regulatory Sequences, Nucleic Acid , Repetitive Sequences, Nucleic Acid , Genomics/methods , Binding Sites , CCCTC-Binding Factor/metabolism , CCCTC-Binding Factor/genetics , Regulatory Elements, Transcriptional , DNA Transposable Elements , Sequence Analysis, DNA/methods , Neural Networks, Computer
ABSTRACT
Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state regulatory potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbor distinctive transcription factor binding motifs that are similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we show that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.
Subject(s)
Epigenesis, Genetic , Epigenome , Species Specificity , Animals , Mice , Humans , Blood Cells/metabolism , Regulatory Sequences, Nucleic Acid , Gene Expression Regulation , Epigenomics/methods
ABSTRACT
The genome-wide architecture of chromatin-associated proteins that maintains chromosome integrity and gene regulation is not well defined. Here we use chromatin immunoprecipitation, exonuclease digestion and DNA sequencing (ChIP-exo/seq) [1,2] to define this architecture in Saccharomyces cerevisiae. We identify 21 meta-assemblages consisting of roughly 400 different proteins that are related to DNA replication, centromeres, subtelomeres, transposons and transcription by RNA polymerase (Pol) I, II and III. Replication proteins engulf a nucleosome, centromeres lack a nucleosome, and repressive proteins encompass three nucleosomes at subtelomeric X-elements. We find that most promoters associated with Pol II evolved to lack a regulatory region, having only a core promoter. These constitutive promoters comprise a short nucleosome-free region (NFR) adjacent to a +1 nucleosome, which together bind the transcription-initiation factor TFIID to form a preinitiation complex. Positioned insulators protect core promoters from upstream events. A small fraction of promoters evolved an architecture for inducibility, whereby sequence-specific transcription factors (ssTFs) create a nucleosome-depleted region (NDR) that is distinct from an NFR. We describe structural interactions among ssTFs, their cognate cofactors and the genome. These interactions include the nucleosomal and transcriptional regulators RPD3-L, SAGA, NuA4, Tup1, Mediator and SWI-SNF. Surprisingly, we do not detect interactions between ssTFs and TFIID, suggesting that such interactions do not stably occur. Our model for gene induction involves ssTFs, cofactors and general factors such as TBP and TFIIB, but not TFIID. By contrast, constitutive transcription involves TFIID but not ssTFs engaged with their cofactors. From this, we define a highly integrated network of gene regulation by ssTFs.
Subject(s)
Fungal Proteins/genetics , Fungal Proteins/metabolism , Genome, Fungal/genetics , Multiprotein Complexes/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Transcription Factors/genetics , Coenzymes/metabolism , Multiprotein Complexes/metabolism , Promoter Regions, Genetic , RNA Polymerase I/metabolism , RNA Polymerase II/metabolism , RNA Polymerase III/metabolism , TATA-Box Binding Protein/genetics , TATA-Box Binding Protein/metabolism , Transcription Factor TFIIB/genetics , Transcription Factor TFIIB/metabolism , Transcription Factor TFIID , Transcription Factors/metabolism
ABSTRACT
Development of the malaria parasite, Plasmodium falciparum, is regulated by a limited number of sequence-specific transcription factors (TFs). However, the mechanisms by which these TFs recognize genome-wide binding sites are largely unknown. To address TF specificity, we investigated the binding of two TF subsets that bind either CACACA or GTGCAC DNA sequence motifs and further characterized two additional ApiAP2 TFs, PfAP2-G and PfAP2-EXP, which bind unique DNA motifs (GTAC and TGCATGCA). We also interrogated the impact of DNA sequence and chromatin context on P. falciparum TF binding by integrating high-throughput in vitro and in vivo binding assays, DNA shape predictions, epigenetic post-translational modifications, and chromatin accessibility. We found that DNA sequence context minimally impacts binding site selection for paralogous CACACA-binding TFs, while chromatin accessibility, epigenetic patterns, co-factor recruitment, and dimerization correlate with differential binding. In contrast, GTGCAC-binding TFs prefer different DNA sequence contexts in addition to chromatin dynamics. Finally, we determined that TFs that preferentially bind divergent DNA motifs may bind overlapping genomic regions due to low-affinity binding to other sequence motifs. Our results demonstrate that TF binding site selection relies on a combination of DNA sequence and chromatin features, thereby contributing to the complexity of P. falciparum gene regulatory mechanisms.
Subject(s)
Chromatin , Nucleotide Motifs , Plasmodium falciparum , Protein Binding , Protozoan Proteins , Transcription Factors , Plasmodium falciparum/genetics , Plasmodium falciparum/metabolism , Chromatin/metabolism , Chromatin/genetics , Transcription Factors/metabolism , Transcription Factors/genetics , Binding Sites , Humans , Protozoan Proteins/metabolism , Protozoan Proteins/genetics , Protozoan Proteins/chemistry , Malaria, Falciparum/parasitology , Base Sequence , DNA/metabolism , DNA/chemistry , Epigenesis, Genetic , DNA, Protozoan/metabolism , DNA, Protozoan/genetics
ABSTRACT
The intrinsic DNA sequence preferences and cell type-specific cooperative partners of transcription factors (TFs) are typically highly conserved. Hence, despite the rapid evolutionary turnover of individual TF binding sites, predictive sequence models of cell type-specific genomic occupancy of a TF in one species should generalize to closely matched cell types in a related species. To assess the viability of cross-species TF binding prediction, we train neural networks to discriminate ChIP-seq peak locations from genomic background and evaluate their performance within and across species. Cross-species predictive performance is consistently worse than within-species performance, which we show is caused in part by species-specific repeats. To account for this domain shift, we use an augmented network architecture to automatically discourage learning of training species-specific sequence features. This domain adaptation approach corrects for prediction errors on species-specific repeats and improves overall cross-species model performance. Our results show that cross-species TF binding prediction is feasible when models account for domain shifts driven by species-specific repeats.
Subject(s)
Neural Networks, Computer , Transcription Factors , Binding Sites , Chromatin Immunoprecipitation Sequencing , Computational Biology/methods , Protein Binding , Transcription Factors/metabolism
ABSTRACT
Nuclear DNA wraps around core histones to form nucleosomes, which restricts the binding of transcription factors to gene regulatory sequences. Pioneer transcription factors can bind DNA sites on nucleosomes and initiate gene regulatory events, often leading to the local opening of chromatin. However, the nucleosomal configuration of open chromatin and the basis for its regulation are unclear. We combined low and high levels of micrococcal nuclease (MNase) digestion along with core histone mapping to assess the nucleosomal configuration at enhancers and promoters in mouse liver. We find that MNase-accessible nucleosomes, bound by transcription factors, are retained more at liver-specific enhancers than at promoters and ubiquitous enhancers. The pioneer factor FoxA displaces linker histone H1, thereby keeping enhancer nucleosomes accessible in chromatin and allowing other liver-specific transcription factors to bind and stimulate transcription. Thus, nucleosomes are not exclusively repressive to gene regulation when they are retained with, and exposed by, pioneer factors.
Subject(s)
Enhancer Elements, Genetic , Hepatocyte Nuclear Factor 3-alpha/metabolism , Hepatocyte Nuclear Factor 3-beta/metabolism , Hepatocyte Nuclear Factor 3-gamma/metabolism , Nucleosomes/metabolism , Animals , Histones/metabolism , Liver/metabolism , Mice , Nucleosomes/genetics , Organ Specificity , Promoter Regions, Genetic , Transcription, Genetic
ABSTRACT
Thousands of epigenomic data sets have been generated in the past decade, but it is difficult for researchers to effectively use all the data relevant to their projects. Systematic integrative analysis can help meet this need, and the VISION project was established for validated systematic integration of epigenomic data in hematopoiesis. Here, we systematically integrated extensive data recording epigenetic features and transcriptomes from many sources, including individual laboratories and consortia, to produce a comprehensive view of the regulatory landscape of differentiating hematopoietic cell types in mouse. By using IDEAS as our integrative and discriminative epigenome annotation system, we identified and assigned epigenetic states simultaneously along chromosomes and across cell types, precisely and comprehensively. Combining nuclease accessibility and epigenetic states produced a set of more than 200,000 candidate cis-regulatory elements (cCREs) that efficiently capture enhancers and promoters. The transitions in epigenetic states of these cCREs across cell types provided insights into mechanisms of regulation, including decreases in numbers of active cCREs during differentiation of most lineages, transitions from poised to active or inactive states, and shifts in nuclease accessibility of CTCF-bound elements. Regression modeling of epigenetic states at cCREs and gene expression produced a versatile resource to improve selection of cCREs potentially regulating target genes. These resources are available from our VISION website to aid research in genomics and hematopoiesis.
Subject(s)
Epigenesis, Genetic , Hematopoiesis/genetics , Hematopoietic Stem Cells/metabolism , Animals , Mice , Regulatory Elements, Transcriptional , Transcriptome
ABSTRACT
Although Hox genes encode conserved transcription factors (TFs), they are further divided into anterior, central and posterior groups based on their DNA-binding domain similarity. The posterior Hox group expanded in the deuterostome clade and patterns caudal and distal structures. We aimed to address how similar Hox TFs diverge to induce different positional identities. We studied Hox TF DNA-binding and regulatory activity during an in vitro motor neuron differentiation system that recapitulates embryonic development. We found diversity in the genomic binding profiles of different Hox TFs, even among the posterior group paralogs that share similar DNA-binding domains. These differences in genomic binding were explained by differing abilities to bind to previously inaccessible sites. For example, the posterior group HOXC9 had a greater ability to bind occluded sites than the posterior HOXC10, producing different binding patterns and driving differential gene expression programs. From these results, we propose that the differential abilities of posterior Hox TFs to bind to previously inaccessible chromatin drive patterning diversification. This article has an associated 'The people behind the papers' interview.
Subject(s)
Cell Differentiation , Chromatin/metabolism , Embryonic Development , Gene Expression Regulation, Developmental , Homeodomain Proteins/metabolism , Motor Neurons/metabolism , Transcription Factors/metabolism , Animals , Cell Line , Chromatin/genetics , Homeodomain Proteins/genetics , Mice , Motor Neurons/cytology , Transcription Factors/genetics
ABSTRACT
Differentiation from asexual blood stages to mature sexual gametocytes is required for the transmission of malaria parasites. Here, we report that the ApiAP2 transcription factor, PfAP2-G2 (PF3D7_1408200) plays a critical role in the maturation of Plasmodium falciparum gametocytes. PfAP2-G2 binds to the promoters of a wide array of genes that are expressed at many stages of the parasite life cycle. Interestingly, we also find binding of PfAP2-G2 within the gene body of almost 3,000 genes, which strongly correlates with the location of H3K36me3 and several other histone modifications as well as Heterochromatin Protein 1 (HP1), suggesting that occupancy of PfAP2-G2 in gene bodies may serve as an alternative regulatory mechanism. Disruption of pfap2-g2 does not impact asexual development, but the majority of sexual parasites are unable to mature beyond stage III gametocytes. The absence of pfap2-g2 leads to overexpression of 28% of the genes bound by PfAP2-G2 and none of the PfAP2-G2 bound genes are downregulated, suggesting that it is a repressor. We also find that PfAP2-G2 interacts with chromatin remodeling proteins, a microrchidia (MORC) protein, and another ApiAP2 protein (PF3D7_1139300). Overall our data demonstrate that PfAP2-G2 establishes an essential gametocyte maturation program in association with other chromatin-related proteins.
Subject(s)
Germ Cells/growth & development , Malaria, Falciparum/parasitology , Plasmodium falciparum/growth & development , Plasmodium falciparum/metabolism , Protozoan Proteins/metabolism , Transcription Factors/metabolism , Gametogenesis , Gene Expression Regulation, Developmental , Germ Cells/metabolism , Humans , Life Cycle Stages , Plasmodium falciparum/genetics , Protozoan Proteins/genetics , Transcription Factors/genetics
ABSTRACT
SUMMARY: Epigenetic modifications reflect key aspects of transcriptional regulation, and many epigenomic datasets have been generated under different biological contexts to provide insights into regulatory processes. However, the technical noise in epigenomic datasets and the many dimensions (features) examined make it challenging to effectively extract biologically meaningful inferences from these datasets. We developed a package that reduces noise while normalizing the epigenomic data by a novel normalization method, followed by integrative dimensional reduction by learning and assigning epigenetic states. This package, called S3V2-IDEAS, can be used to identify epigenetic states for multiple features, or identify discretized signal intensity levels and a master peak list across different cell types for a single feature. We illustrate the outputs and performance of S3V2-IDEAS using 137 epigenomics datasets from the VISION project that provides ValIdated Systematic IntegratiON of epigenomic data in hematopoiesis. AVAILABILITY AND IMPLEMENTATION: S3V2-IDEAS pipeline is freely available as open source software released under an MIT license at: https://github.com/guanjue/S3V2_IDEAS_ESMP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Epigenomics , Software , Epigenomics/methods , Epigenesis, Genetic , Gene Expression Regulation , Hematopoiesis
ABSTRACT
Few existing methods enable the visualization of relationships between regulatory genomic activities and genome organization as captured by Hi-C experimental data. Genome-wide Hi-C datasets are often displayed using "heatmap" matrices, but it is difficult to intuit from these heatmaps which biochemical activities are compartmentalized together. High-dimensional Hi-C data vectors can alternatively be projected onto three-dimensional space using dimensionality reduction techniques. The resulting three-dimensional structures can serve as scaffolds for projecting other forms of genomic information, thereby enabling the exploration of relationships between genome organization and various genome annotations. However, while three-dimensional models are contextually appropriate for chromatin interaction data, some analyses and visualizations may be more intuitively and conveniently performed in two-dimensional space. We present a novel approach to the visualization and analysis of chromatin organization based on the Self-Organizing Map (SOM). The SOM algorithm provides a two-dimensional manifold which adapts to represent the high dimensional chromatin interaction space. The resulting data structure can then be used to assess relationships between regulatory genomic activities and chromatin interactions. For example, given a set of genomic coordinates corresponding to a given biochemical activity, the degree to which this activity is segregated or compartmentalized in chromatin interaction space can be intuitively visualized on the 2D SOM grid and quantified using Lorenz curve analysis. We demonstrate our approach for exploratory analysis of genome compartmentalization in a high-resolution Hi-C dataset from the human GM12878 cell line. Our SOM-based approach provides an intuitive visualization of the large-scale structure of Hi-C data and serves as a platform for integrative analyses of the relationships between various genomic activities and genome organization.
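The Lorenz curve quantification mentioned above can be sketched briefly. The abstract does not say which summary statistic is derived from the curve; the Gini coefficient used here is an assumption, as are the function names and the idea of passing in one activity count per SOM node.

```python
def lorenz_curve(values):
    """Return (population share, cumulative value share) points.

    values: non-negative amounts of an activity per SOM node
    (a hypothetical input layout for illustration).
    """
    ordered = sorted(values)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / len(ordered), running / total))
    return points


def gini(values):
    """Gini coefficient: 1 minus twice the trapezoidal area under the Lorenz curve."""
    pts = lorenz_curve(values)
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * area
```

With this sketch, an activity spread evenly over all nodes gives a coefficient near 0, while an activity concentrated in a few nodes gives a value approaching 1, which is one way to express how compartmentalized the activity is on the grid.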
Subject(s)
Algorithms , Chromatin/metabolism , Epigenomics/methods , Gene Regulatory Networks , Cell Line , Chromatin Immunoprecipitation Sequencing , Chromosome Mapping , Humans , Software
ABSTRACT
The ChIP-exo assay precisely delineates protein-DNA crosslinking patterns by combining chromatin immunoprecipitation with 5' to 3' exonuclease digestion. Within a regulatory complex, the physical distance of a regulatory protein to DNA affects crosslinking efficiencies. Therefore, the spatial organization of a protein-DNA complex could potentially be inferred by analyzing how crosslinking signatures vary between its subunits. Here, we present a computational framework that aligns ChIP-exo crosslinking patterns from multiple proteins across a set of coordinately bound regulatory regions, and which detects and quantifies protein-DNA crosslinking events within the aligned profiles. By producing consistent measurements of protein-DNA crosslinking strengths across multiple proteins, our approach enables characterization of relative spatial organization within a regulatory complex. Applying our approach to collections of ChIP-exo data, we demonstrate that it can recover aspects of regulatory complex spatial organization at yeast ribosomal protein genes and yeast tRNA genes. We also demonstrate the ability to quantify changes in protein-DNA complex organization across conditions by applying our approach to analyze Drosophila Pol II transcriptional components. Our results suggest that principled analyses of ChIP-exo crosslinking patterns enable inference of spatial organization within protein-DNA complexes.
Subject(s)
Chromatin Immunoprecipitation/methods , DNA-Binding Proteins/metabolism , Exonucleases/chemistry , RNA, Transfer/genetics , Ribosomal Proteins/genetics , Sequence Alignment/methods , Transcription Factors/metabolism , Algorithms , Animals , Binding Sites , Computer Simulation , DNA-Binding Proteins/chemistry , Databases, Genetic , Drosophila/chemistry , Drosophila/genetics , Drosophila/metabolism , Promoter Regions, Genetic , Protein Binding , RNA Polymerase II/chemistry , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , RNA Polymerase III/chemistry , RNA Polymerase III/genetics , RNA Polymerase III/metabolism , RNA, Transfer/chemistry , RNA, Transfer/metabolism , Ribosomal Proteins/chemistry , Ribosomal Proteins/metabolism , Saccharomyces cerevisiae/chemistry , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Sequence Analysis, DNA/methods , Transcription Factor TFIIIB/chemistry , Transcription Factor TFIIIB/genetics , Transcription Factor TFIIIB/metabolism , Transcription Factors/chemistry , Transcription Factors/genetics , Transcription Factors, TFIII/chemistry , Transcription Factors, TFIII/genetics , Transcription Factors, TFIII/metabolism , Transcription Initiation Site
ABSTRACT
Gene expression is controlled by a variety of proteins that interact with the genome. Their precise organization and mechanism of action at every promoter remain to be worked out. To better understand the physical interplay among genome-interacting proteins, we examined the temporal binding of a functionally diverse subset of these proteins: nucleosomes (H3), H2AZ (Htz1), SWR (Swr1), RSC (Rsc1, Rsc3, Rsc58, Rsc6, Rsc9, Sth1), SAGA (Spt3, Spt7, Ubp8, Sgf11), Hsf1, TFIID (Spt15/TBP and Taf1), TFIIB (Sua7), TFIIH (Ssl2), FACT (Spt16), Pol II (Rpb3), and Pol II carboxyl-terminal domain (CTD) phosphorylation at serines 2, 5, and 7. They were examined under normal and acute heat shock conditions, using the ultrahigh resolution genome-wide ChIP-exo assay in Saccharomyces cerevisiae. Our findings reveal a precise positional organization of proteins bound at most genes, some of which rapidly reorganize within minutes of heat shock. This includes more precise positional transitions of Pol II CTD phosphorylation along the 5' ends of genes than previously seen. Reorganization upon heat shock includes colocalization of SAGA with promoter-bound Hsf1, a change in RSC subunit enrichment from gene bodies to promoters, and Pol II accumulation within promoter/+1 nucleosome regions. Most of these events are widespread and not necessarily coupled to changes in gene expression. Together, these findings reveal protein-genome interactions that are robustly reprogrammed in precise and uniform ways far beyond what is elicited by changes in gene expression.
ABSTRACT
MOTIVATION: Regulatory proteins associate with the genome either by directly binding cognate DNA motifs or via protein-protein interactions with other regulators. Each recruitment mechanism may be associated with distinct motifs and may also result in distinct characteristic patterns in high-resolution protein-DNA binding assays. For example, the ChIP-exo protocol precisely characterizes protein-DNA crosslinking patterns by combining chromatin immunoprecipitation (ChIP) with 5' → 3' exonuclease digestion. Since different regulatory complexes will result in different protein-DNA crosslinking signatures, analysis of ChIP-exo tag enrichment patterns should enable detection of multiple protein-DNA binding modes for a given regulatory protein. However, current ChIP-exo analysis methods either treat all binding events as being of a uniform type or rely on motifs to cluster binding events into subtypes. RESULTS: To systematically detect multiple protein-DNA interaction modes in a single ChIP-exo experiment, we introduce the ChIP-exo mixture model (ChExMix). ChExMix probabilistically models the genomic locations and subtype memberships of binding events using both ChIP-exo tag distribution patterns and DNA motifs. We demonstrate that ChExMix achieves accurate detection and classification of binding event subtypes using in silico mixed ChIP-exo data. We further demonstrate the unique analysis abilities of ChExMix using a collection of ChIP-exo experiments that profile the binding of key transcription factors in MCF-7 cells. In these data, ChExMix identifies possible recruitment mechanisms of FoxA1 and ERα, thus demonstrating that ChExMix can effectively stratify ChIP-exo binding events into biologically meaningful subtypes. AVAILABILITY AND IMPLEMENTATION: ChExMix is available from https://github.com/seqcode/chexmix. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Chromatin Immunoprecipitation Sequencing , Binding Sites , Chromatin Immunoprecipitation , DNA , Nucleotide Motifs , Protein Binding , Sequence Analysis, DNA
ABSTRACT
Members of the GATA family of transcription factors play key roles in the differentiation of specific cell lineages by regulating the expression of target genes. Three GATA factors play distinct roles in hematopoietic differentiation. In order to better understand how these GATA factors function to regulate genes throughout the genome, we are studying the epigenomic and transcriptional landscapes of hematopoietic cells in a model-driven, integrative fashion. We have formed the collaborative multi-lab VISION project to conduct ValIdated Systematic IntegratiON of epigenomic data in mouse and human hematopoiesis. The epigenomic data included nuclease accessibility in chromatin, CTCF occupancy, and histone H3 modifications for 20 cell types covering hematopoietic stem cells, multilineage progenitor cells, and mature cells across the blood cell lineages of mouse. The analysis used the Integrative and Discriminative Epigenome Annotation System (IDEAS), which learns all common combinations of features (epigenetic states) simultaneously in two dimensions: along chromosomes and across cell types. The result is a segmentation that effectively paints the regulatory landscape in readily interpretable views, revealing constitutively active or silent loci as well as the loci specifically induced or repressed in each stage and lineage. Nuclease accessible DNA segments in active chromatin states were designated candidate cis-regulatory elements in each cell type, providing one of the most comprehensive registries of candidate hematopoietic regulatory elements to date. Applications of VISION resources are illustrated for the regulation of genes encoding GATA1, GATA2, GATA3, and Ikaros. VISION resources are freely available from our website http://usevision.org.
Subject(s)
Chromatin/metabolism , Epigenome , GATA Transcription Factors/metabolism , Gene Expression Regulation , Hematopoiesis , Hematopoietic Stem Cells/cytology , Hematopoietic Stem Cells/metabolism , Animals , Cell Differentiation , Chromatin/genetics , GATA Transcription Factors/genetics , Humans
ABSTRACT
Genome segmentation approaches allow us to characterize regulatory states in a given cell type using combinatorial patterns of histone modifications and other regulatory signals. In order to analyze regulatory state differences across cell types, current genome segmentation approaches typically require that the same regulatory genomics assays have been performed in all analyzed cell types. This necessarily limits both the numbers of cell types that can be analyzed and the complexity of the resulting regulatory states, as only a small number of histone modifications have been profiled across many cell types. Data imputation approaches that aim to estimate missing regulatory signals have been applied before genome segmentation. However, this approach is computationally costly and propagates any errors in imputation to produce incorrect genome segmentation results downstream. We present an extension to the IDEAS genome segmentation platform which can perform genome segmentation on incomplete regulatory genomics dataset collections without using imputation. Instead of relying on imputed data, we use an expectation-maximization approach to estimate marginal density functions within each regulatory state. We demonstrate that our genome segmentation results compare favorably with approaches based on imputation or other strategies for handling missing data. We further show that our approach can accurately impute missing data after genome segmentation, reversing the typical order of imputation/genome segmentation pipelines. Finally, we present a new 2D genome segmentation analysis of 127 human cell types studied by the Roadmap Epigenomics Consortium. By using an expanded set of chromatin marks that have been profiled in subsets of these cell types, our new segmentation results capture a more complex picture of combinatorial regulatory patterns that appear on the human genome.
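A minimal sketch of the general idea of fitting states by expectation-maximization without imputing missing signals is given below. It is not the IDEAS model: it assumes a simple mixture of diagonal Gaussians, treats missing values as NaN, and evaluates each observation only on the marginal density of its observed features. The function name `em_diag_gmm_missing` and the quantile-based initialization are hypothetical choices made for this illustration.

```python
import numpy as np

def em_diag_gmm_missing(data, n_states=2, n_iter=50):
    """EM for a diagonal-Gaussian mixture where NaN entries are marginalized out."""
    observed = ~np.isnan(data)
    filled = np.where(observed, data, 0.0)  # placeholder values, always masked below
    n, d = data.shape
    # Assumed initialization: evenly spaced per-feature quantiles of the observed data.
    quantiles = (np.arange(n_states) + 0.5) / n_states
    means = np.nanquantile(data, quantiles, axis=0)
    variances = np.tile(np.nanvar(data, axis=0) + 1e-6, (n_states, 1))
    weights = np.full(n_states, 1.0 / n_states)
    for _ in range(n_iter):
        # E-step: log-density summed over observed features only (marginal density).
        log_resp = np.empty((n, n_states))
        for k in range(n_states):
            log_pdf = -0.5 * (np.log(2 * np.pi * variances[k])
                              + (filled - means[k]) ** 2 / variances[k])
            log_resp[:, k] = (np.log(max(weights[k], 1e-12))
                              + np.where(observed, log_pdf, 0.0).sum(axis=1))
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: each feature is updated from the rows where it was observed.
        for k in range(n_states):
            w = resp[:, [k]] * observed
            denom = w.sum(axis=0) + 1e-12
            means[k] = (w * filled).sum(axis=0) / denom
            variances[k] = (w * (filled - means[k]) ** 2).sum(axis=0) / denom + 1e-6
        weights = resp.mean(axis=0)
    return weights, means, variances, resp

# Toy data: two well-separated states, with about 30% of entries missing.
rng = np.random.default_rng(1)
signal = np.vstack([rng.normal(-5, 1, (200, 3)), rng.normal(5, 1, (200, 3))])
signal[rng.random(signal.shape) < 0.3] = np.nan
weights, means, variances, resp = em_diag_gmm_missing(signal)
print(means.round(2))
```

Once the states are fitted, a missing entry could be estimated afterwards as the responsibility-weighted average of the state means (`resp @ means`), which mirrors the order of operations described in the abstract: segmentation first, imputation second.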
Subject(s)
Forecasting/methods , Transcriptional Regulatory Elements/genetics , DNA Sequence Analysis/methods , Algorithms , Epigenomics , Gene Regulatory Networks/genetics , Human Genome , Genomics/methods , Humans , Single Nucleotide Polymorphism , Nucleic Acid Regulatory Sequences , Whole Genome Sequencing/methods
ABSTRACT
MOTIVATION: Recent experiments have provided Hi-C data at resolution as high as 1 kbp. However, 3D structural inference from high-resolution Hi-C datasets is often computationally infeasible using existing methods. RESULTS: We have developed miniMDS, an approximation of multidimensional scaling (MDS) that partitions a Hi-C dataset, performs high-resolution MDS separately on each partition, and then reassembles the partitions using low-resolution MDS. miniMDS is faster, more accurate, and uses less memory than existing methods for inferring the 3D structure of the human genome at high resolution (10 kbp). AVAILABILITY AND IMPLEMENTATION: A Python implementation of miniMDS is available on GitHub: https://github.com/seqcode/miniMDS. CONTACT: mahony@psu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
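The partition-then-reassemble strategy can be sketched as follows. This is an illustration under simplifying assumptions, not the miniMDS code or its API: it starts from an exact Euclidean distance matrix rather than Hi-C contact frequencies, uses classical (eigendecomposition-based) MDS, takes every `step`-th locus as the low-resolution scaffold, and places each partition onto that scaffold with an orthogonal Procrustes fit. The names `classical_mds`, `fit_rigid`, and `partitioned_mds` are hypothetical.

```python
import numpy as np

def classical_mds(dist, dim=3):
    """Classical MDS: coordinates from a distance matrix via double centering."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

def fit_rigid(source, target):
    """Orthogonal Procrustes fit (rotation or reflection, plus translation)."""
    src_mean, tgt_mean = source.mean(axis=0), target.mean(axis=0)
    u, _, vt = np.linalg.svd((source - src_mean).T @ (target - tgt_mean))
    return u @ vt, src_mean, tgt_mean

def partitioned_mds(dist, n_parts=4, step=5, dim=3):
    n = dist.shape[0]
    # Low-resolution pass: MDS on a subsample of loci gives the global scaffold.
    anchors = np.arange(0, n, step)
    scaffold = classical_mds(dist[np.ix_(anchors, anchors)], dim)
    coords = np.zeros((n, dim))
    for block in np.array_split(np.arange(n), n_parts):
        # High-resolution pass: MDS on this partition alone.
        local = classical_mds(dist[np.ix_(block, block)], dim)
        # Reassembly: align the partition's anchor loci to their scaffold positions.
        in_block = np.isin(block, anchors)
        rot, src_mean, tgt_mean = fit_rigid(local[in_block],
                                            scaffold[np.isin(anchors, block)])
        coords[block] = (local - src_mean) @ rot + tgt_mean
    return coords

# Toy check on random 3D points with exact pairwise distances.
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))
dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
coords = partitioned_mds(dist)
recon = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
print("max distance error:", np.abs(recon - dist).max())
```

The benefit of this design is that each eigendecomposition operates on a small matrix (one partition, or the subsampled scaffold) rather than on the full high-resolution matrix, which is where the time and memory savings of a partitioned approach come from. Each partition needs at least four non-coplanar anchor loci for the alignment to be well determined.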
Subject(s)
Human Chromosomes/metabolism , Human Genome , Genomics/methods , Molecular Models , Software , Algorithms , Humans , Molecular Conformation , DNA Sequence Analysis/methods
ABSTRACT
Genomic loci with regulatory potential can be annotated with various properties. For example, genomic sites bound by a given transcription factor (TF) can be divided according to whether they are proximal or distal to known promoters. Sites can be further labeled according to the cell types and conditions in which they are active. Given such a collection of labeled sites, it is natural to ask what sequence features are associated with each annotation label. However, discovering such label-specific sequence features is often confounded by overlaps between the labels; e.g. if regulatory sites specific to a given cell type are also more likely to be promoter-proximal, it is difficult to assess whether motifs identified in that set of sites are associated with the cell type or associated with promoters. In order to meet this challenge, we developed SeqUnwinder, a principled approach to deconvolving interpretable discriminative sequence features associated with overlapping annotation labels. We demonstrate the novel analysis abilities of SeqUnwinder using three examples. Firstly, SeqUnwinder is able to unravel sequence features associated with the dynamic binding behavior of TFs during motor neuron programming from features associated with chromatin state in the initial embryonic stem cells. Secondly, we characterize distinct sequence properties of multi-condition and cell-specific TF binding sites after controlling for uneven associations with promoter proximity. Finally, we demonstrate the scalability of SeqUnwinder to discover cell-specific sequence features from over one hundred thousand genomic loci that display DNase I hypersensitivity in one or more ENCODE cell lines.