Results 1 - 20 of 67
1.
bioRxiv ; 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39314347

ABSTRACT

CCR4-NOT regulates multiple steps in gene regulation, including transcription, mRNA decay, protein ubiquitylation, and translation. It has been well studied in budding yeast; however, relatively little is known about its regulation and functions in mammals. To characterize the functions of the human CCR4-NOT complex, we developed a rapid auxin-induced degron system to deplete CNOT1 (the scaffold of the complex) and CNOT4 (E3 ubiquitin ligase) in cell culture. Transcriptome-wide measurements of gene expression revealed that depleting CNOT1 changed the levels of several thousand transcripts, most of which increased, and resulted in a global decrease in mRNA decay rates. In contrast to what was observed in CNOT1-depleted cells, CNOT4 depletion only modestly changed RNA steady-state levels and, surprisingly, led to a global acceleration in mRNA decay. To further investigate the role of CCR4-NOT in transcription, we used transient transcriptome sequencing (TT-seq) to measure ongoing RNA synthesis. Depletion of either subunit resulted in increased RNA synthesis of several thousand genes. In contrast to most of the genome, a rapid reduction in the synthesis of KRAB zinc-finger protein (KZNF) genes, especially those clustered on chromosome 19, was observed. KZNFs are transcriptional repressors of retro-transposable elements (rTEs), and consistent with the decreased KZNF expression, we observed a significant and rapid activation of rTEs, mainly long interspersed nuclear elements (LINEs). Our data reveal that CCR4-NOT regulates gene expression and silences retrotransposons across the genome by maintaining KZNF expression. These data establish CCR4-NOT as a global regulator of gene expression, and we have identified a novel mammalian-specific function of the complex, the suppression of rTEs.

2.
Nucleic Acids Res ; 52(17): 10161-10179, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-38966997

ABSTRACT

Development of the malaria parasite, Plasmodium falciparum, is regulated by a limited number of sequence-specific transcription factors (TFs). However, the mechanisms by which these TFs recognize genome-wide binding sites are largely unknown. To address TF specificity, we investigated the binding of two TF subsets that either bind CACACA or GTGCAC DNA sequence motifs and further characterized two additional ApiAP2 TFs, PfAP2-G and PfAP2-EXP, which bind unique DNA motifs (GTAC and TGCATGCA). We also interrogated the impact of DNA sequence and chromatin context on P. falciparum TF binding by integrating high-throughput in vitro and in vivo binding assays, DNA shape predictions, epigenetic post-translational modifications, and chromatin accessibility. We found that DNA sequence context minimally impacts binding site selection for paralogous CACACA-binding TFs, while chromatin accessibility, epigenetic patterns, co-factor recruitment, and dimerization correlate with differential binding. In contrast, GTGCAC-binding TFs prefer different DNA sequence contexts in addition to chromatin dynamics. Finally, we determined that TFs that preferentially bind divergent DNA motifs may bind overlapping genomic regions due to low-affinity binding to other sequence motifs. Our results demonstrate that TF binding site selection relies on a combination of DNA sequence and chromatin features, thereby contributing to the complexity of P. falciparum gene regulatory mechanisms.


Subjects
Chromatin, Nucleotide Motifs, Plasmodium falciparum, Protein Binding, Protozoan Proteins, Transcription Factors, Plasmodium falciparum/genetics, Plasmodium falciparum/metabolism, Chromatin/metabolism, Chromatin/genetics, Transcription Factors/metabolism, Transcription Factors/genetics, Binding Sites, Humans, Protozoan Proteins/metabolism, Protozoan Proteins/genetics, Protozoan Proteins/chemistry, Falciparum Malaria/parasitology, Base Sequence, DNA/metabolism, DNA/chemistry, Genetic Epigenesis, Protozoan DNA/metabolism, Protozoan DNA/genetics
3.
Mol Cell ; 84(15): 2838-2855.e10, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39019045

ABSTRACT

Despite the unique ability of pioneer factors (PFs) to target nucleosomal sites in closed chromatin, they only bind a small fraction of their genomic motifs. The underlying mechanism of this selectivity is not well understood. Here, we design a high-throughput assay called chromatin immunoprecipitation with integrated synthetic oligonucleotides (ChIP-ISO) to systematically dissect sequence features affecting the binding specificity of a classic PF, FOXA1, in human A549 cells. Combining ChIP-ISO with in vitro and neural network analyses, we find that (1) FOXA1 binding is strongly affected by co-binding transcription factors (TFs) AP-1 and CEBPB; (2) FOXA1 and AP-1 show binding cooperativity in vitro; (3) FOXA1's binding is determined more by local sequences than chromatin context, including eu-/heterochromatin; and (4) AP-1 is partially responsible for differential binding of FOXA1 in different cell types. Our study presents a framework for elucidating genetic rules underlying PF binding specificity and reveals a mechanism for context-specific regulation of its binding.


Subjects
Hepatocyte Nuclear Factor 3-alpha, Protein Binding, Transcription Factor AP-1, Hepatocyte Nuclear Factor 3-alpha/metabolism, Hepatocyte Nuclear Factor 3-alpha/genetics, Humans, Transcription Factor AP-1/metabolism, Transcription Factor AP-1/genetics, Binding Sites, A549 Cells, Chromatin/metabolism, Chromatin/genetics, Chromatin Immunoprecipitation, Oligonucleotides/metabolism, Oligonucleotides/genetics
4.
Genome Res ; 34(7): 1089-1105, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-38951027

ABSTRACT

Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state regulatory potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbor distinctive transcription factor binding motifs that are similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we show that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.


Subjects
Genetic Epigenesis, Epigenome, Species Specificity, Animals, Mice, Humans, Blood Cells/metabolism, Nucleic Acid Regulatory Sequences, Gene Expression Regulation, Epigenomics/methods
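
As a rough illustration of the regression step described in the abstract above, the Python sketch below fits one coefficient per epigenetic state by least squares and then scores a cCRE as the coverage-weighted sum of those coefficients. It is a minimal sketch under stated assumptions, not the VISION project's implementation: the no-intercept model, the small ridge term, the toy data, and all function names (`solve_linear_system`, `fit_state_coefficients`, `regulatory_potential`) are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without mutating the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_state_coefficients(coverage, expression, ridge=1e-6):
    """Least-squares fit of expression on per-state coverage fractions.

    coverage:   list of rows; row[s] is the fraction of a cCRE in state s
    expression: expression value of the gene paired with each row
    ridge:      small diagonal term (an assumption) to keep X^T X invertible
    """
    k = len(coverage[0])
    xtx = [[sum(row[i] * row[j] for row in coverage) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(coverage, expression))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


def regulatory_potential(state_coverage, coefficients):
    """Score one cCRE in one cell type as sum_s coverage[s] * coefficient[s]."""
    return sum(f * b for f, b in zip(state_coverage, coefficients))


# Toy data (hypothetical): three states, e.g. active, repressed, quiescent.
toy_coverage = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
]
true_coefficients = [2.0, -1.0, 0.0]
toy_expression = [regulatory_potential(row, true_coefficients) for row in toy_coverage]

fitted = fit_state_coefficients(toy_coverage, toy_expression)
score = regulatory_potential([0.5, 0.25, 0.25], fitted)
```

In this toy setting the fitted coefficients recover the values used to generate the data, so a cCRE that is half "active" and a quarter "repressed" receives a positive score. Real data would need many cCRE-gene pairs per coefficient and a decision about how cCREs are paired with genes, which the abstract does not specify.
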
5.
Genome Res ; 34(6): 937-951, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-38986578

ABSTRACT

Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. However, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq. Most regulatory genomics analysis pipelines discard "multimapped" reads that align equally well to multiple genomic locations. Because multimapped reads arise predominantly from repeats, current analysis pipelines fail to detect a substantial portion of regulatory events that occur in repetitive regions. To address this shortcoming, we developed Allo, a new approach to allocate multimapped reads in an efficient, accurate, and user-friendly manner. Allo combines probabilistic mapping of multimapped reads with a convolutional neural network that recognizes the read distribution features of potential peaks, offering enhanced accuracy in multimapping read assignment. Allo also provides read-level output in the form of a corrected alignment file, making it compatible with existing regulatory genomics analysis pipelines and downstream peak-finders. In a demonstration application on CTCF ChIP-seq data, we show that Allo results in the discovery of thousands of new CTCF peaks. Many of these peaks contain the expected cognate motif and/or serve as TAD boundaries. We additionally apply Allo to a diverse collection of ENCODE ChIP-seq data sets, resulting in multiple previously unidentified interactions between transcription factors and repetitive element families. Finally, we show that Allo may be particularly beneficial in identifying ChIP-seq peaks at centromeres, near segmentally duplicated genes, and in younger TEs, enabling new regulatory analyses in these regions.


Subjects
Chromatin Immunoprecipitation Sequencing, Humans, Chromatin Immunoprecipitation Sequencing/methods, Nucleic Acid Regulatory Sequences, Nucleic Acid Repetitive Sequences, Genomics/methods, Binding Sites, CCCTC-Binding Factor/metabolism, CCCTC-Binding Factor/genetics, Transcriptional Regulatory Elements, DNA Transposable Elements, DNA Sequence Analysis/methods, Computer Neural Networks
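
To make the idea of probabilistic allocation of multimapped reads concrete, the Python sketch below assigns a read to one of its candidate alignments with probability proportional to the number of uniquely mapped reads near each candidate. This is only a simplified stand-in for the approach in the abstract above: it omits the convolutional neural network entirely, and the window size, pseudocount, data layout, and function names (`count_in_window`, `allocation_probabilities`, `allocate_read`) are assumptions made for illustration, not Allo's actual interface.

```python
import bisect
import random


def count_in_window(sorted_positions, center, half_width):
    """Count positions within [center - half_width, center + half_width]."""
    lo = bisect.bisect_left(sorted_positions, center - half_width)
    hi = bisect.bisect_right(sorted_positions, center + half_width)
    return hi - lo


def allocation_probabilities(candidates, unique_positions,
                             half_width=250, pseudocount=1.0):
    """Probability of each candidate location for one multimapped read.

    candidates:       list of (chrom, position) alignments of the read
    unique_positions: dict mapping chrom -> sorted list of uniquely
                      mapped read positions
    pseudocount:      keeps candidates with no nearby unique reads possible
    """
    scores = [
        count_in_window(unique_positions.get(chrom, []), pos, half_width)
        + pseudocount
        for chrom, pos in candidates
    ]
    total = sum(scores)
    return [s / total for s in scores]


def allocate_read(candidates, unique_positions, rng, **kwargs):
    """Sample a single location for the read from its allocation probabilities."""
    probs = allocation_probabilities(candidates, unique_positions, **kwargs)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Toy example (hypothetical coordinates).
toy_unique = {"chr1": [1000, 1010, 1020, 1100], "chr2": [5000]}
toy_candidates = [("chr1", 1050), ("chr2", 9000)]
toy_probs = allocation_probabilities(toy_candidates, toy_unique)
toy_choice = allocate_read(toy_candidates, toy_unique, random.Random(0))
```

Here the chr1 candidate has four uniquely mapped reads within 250 bp and the chr2 candidate has none, so with a pseudocount of 1 the probabilities are 5/6 and 1/6. Writing the sampled location back into a corrected alignment file, as the abstract describes, would be a separate step.
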
6.
Genome Res ; 2024 Sep 26.
Article in English | MEDLINE | ID: mdl-38886069

ABSTRACT

Genome-wide nucleosome profiles are predominantly characterized using MNase-seq, which involves extensive MNase digestion and size selection to enrich for mononucleosome-sized fragments. Most available MNase-seq analysis packages assume that nucleosomes uniformly protect 147 bp DNA fragments. However, some nucleosomes with atypical histone or chemical compositions protect shorter lengths of DNA. The rigid assumptions imposed by current nucleosome analysis packages potentially prevent investigators from understanding the regulatory roles played by atypical nucleosomes. To enable the characterization of different nucleosome types from MNase-seq data, we introduce the size-based expectation maximization (SEM) nucleosome-calling package. SEM employs a hierarchical Gaussian mixture model to estimate nucleosome positions and subtypes. Nucleosome subtypes are automatically identified based on the distribution of protected DNA fragments. Benchmark analysis indicates that SEM is on par with existing packages in terms of standard nucleosome-calling accuracy metrics, while uniquely providing the ability to characterize nucleosome subtype identities. Applying SEM to a low-dose MNase-H2B-ChIP-seq data set from mouse embryonic stem cells, we identified three nucleosome types: short-fragment nucleosomes, canonical nucleosomes, and di-nucleosomes. Short-fragment nucleosomes can be divided further into two subtypes based on their chromatin accessibility. Short-fragment nucleosomes in accessible regions exhibit high MNase sensitivity and are enriched at transcription start sites (TSSs) and CTCF peaks, similar to previously reported "fragile nucleosomes." These SEM-defined accessible short-fragment nucleosomes are found not just in promoters but also in distal regulatory regions. Additional analyses reveal their colocalization with the chromatin remodelers CHD6, CHD8, and EP400. In summary, SEM provides an effective platform for exploration of nonstandard nucleosome subtypes.
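
The abstract above identifies nucleosome subtypes from the distribution of protected fragment lengths. As a hedged illustration of that one ingredient, the Python sketch below fits a flat one-dimensional Gaussian mixture to fragment lengths with expectation maximization. It is not the SEM package: SEM's hierarchical model also estimates nucleosome positions, which this sketch ignores, and the simulated lengths, starting values, and function names (`simulate_lengths`, `fit_fragment_length_mixture`, `assign_subtype`) are assumptions.

```python
import math
import random


def simulate_lengths(seed=0, n_per_type=300):
    """Simulate fragment lengths (bp) for three hypothetical subtypes."""
    rng = random.Random(seed)
    params = [(110.0, 8.0), (147.0, 6.0), (300.0, 15.0)]  # (mean, sd), assumed
    return [rng.gauss(mu, sd) for mu, sd in params for _ in range(n_per_type)]


def gaussian_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_fragment_length_mixture(lengths, means, sds, n_iter=100):
    """Fit mixture weights, means, and sds to fragment lengths by EM."""
    k = len(means)
    means, sds = list(means), list(sds)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each fragment.
        resp = []
        for x in lengths:
            dens = [w * gaussian_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted fragments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(lengths)
            means[j] = sum(r[j] * x for r, x in zip(resp, lengths)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, lengths)) / nj
            sds[j] = max(math.sqrt(var), 1.0)  # floor avoids degenerate spikes
    return weights, means, sds


def assign_subtype(length, weights, means, sds):
    """Index of the component with the highest posterior for one fragment."""
    dens = [w * gaussian_pdf(length, m, s)
            for w, m, s in zip(weights, means, sds)]
    return max(range(len(dens)), key=lambda j: dens[j])


toy_lengths = simulate_lengths(seed=0)
toy_weights, toy_means, toy_sds = fit_fragment_length_mixture(
    toy_lengths, means=[100.0, 150.0, 290.0], sds=[20.0, 20.0, 20.0])
```

With these well-separated simulated subtypes, the fitted means land near the simulated short-fragment, canonical, and di-nucleosome lengths, and `assign_subtype(147, ...)` picks the middle component. On real MNase-seq data the number of components and their starting values would have to be chosen or learned, which is part of what a dedicated package handles.
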

7.
bioRxiv ; 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-37066352

ABSTRACT

Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state Regulatory Potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbored distinctive transcription factor binding motifs that were similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we showed that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.

8.
bioRxiv ; 2023 Nov 11.
Article in English | MEDLINE | ID: mdl-37986839

ABSTRACT

Despite the unique ability of pioneer transcription factors (PFs) to target nucleosomal sites in closed chromatin, they only bind a small fraction of their genomic motifs. The underlying mechanism of this selectivity is not well understood. Here, we design a high-throughput assay called ChIP-ISO to systematically dissect sequence features affecting the binding specificity of a classic PF, FOXA1. Combining ChIP-ISO with in vitro and neural network analyses, we find that 1) FOXA1 binding is strongly affected by co-binding TFs AP-1 and CEBPB, 2) FOXA1 and AP-1 show binding cooperativity in vitro, 3) FOXA1's binding is determined more by local sequences than chromatin context, including eu-/heterochromatin, and 4) AP-1 is partially responsible for differential binding of FOXA1 in different cell types. Our study presents a framework for elucidating genetic rules underlying PF binding specificity and reveals a mechanism for context-specific regulation of its binding.

9.
bioRxiv ; 2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37904910

ABSTRACT

Genome-wide nucleosome profiles are predominantly characterized using MNase-seq, which involves extensive MNase digestion and size selection to enrich for mono-nucleosome-sized fragments. Most available MNase-seq analysis packages assume that nucleosomes uniformly protect 147 bp DNA fragments. However, some nucleosomes with atypical histone or chemical compositions protect shorter lengths of DNA. The rigid assumptions imposed by current nucleosome analysis packages ignore variation in nucleosome lengths, potentially blinding investigators to regulatory roles played by atypical nucleosomes. To enable the characterization of different nucleosome types from MNase-seq data, we introduce the Size-based Expectation Maximization (SEM) nucleosome calling package. SEM employs a hierarchical Gaussian mixture model to estimate the positions and subtype identity of nucleosomes from MNase-seq fragments. Nucleosome subtypes are automatically identified based on the distribution of protected DNA fragment lengths at nucleosome positions. Benchmark analysis indicates that SEM is on par with existing packages in terms of standard nucleosome-calling accuracy metrics, while uniquely providing the ability to characterize nucleosome subtype identities. Using SEM on a low-dose MNase H2B MNase-ChIP-seq dataset from mouse embryonic stem cells, we identified three nucleosome types: short-fragment nucleosomes, canonical nucleosomes, and di-nucleosomes. The short-fragment nucleosomes can be divided further into two subtypes based on their chromatin accessibility. Interestingly, the subset of short-fragment nucleosomes in accessible regions exhibits high MNase sensitivity and displays distribution patterns around transcription start sites (TSSs) and CTCF peaks, similar to the previously reported "fragile nucleosomes". These SEM-defined accessible short-fragment nucleosomes are found not just in promoters, but also in enhancers and other regulatory regions. Additional investigations reveal their co-localization with the chromatin remodelers Chd6, Chd8, and Ep400. In summary, SEM provides an effective platform for distinguishing various nucleosome subtypes, paving the way for future exploration of non-standard nucleosomes.

10.
bioRxiv ; 2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37873361

ABSTRACT

The DNA-binding activities of transcription factors (TFs) are influenced by both intrinsic sequence preferences and extrinsic interactions with cell-specific chromatin landscapes and other regulatory proteins. Disentangling the roles of these binding determinants remains challenging. For example, the FoxA subfamily of Forkhead domain (Fox) TFs are known pioneer factors that can bind to relatively inaccessible sites during development. Yet FoxA TF binding also varies across cell types, pointing to a combination of intrinsic and extrinsic forces guiding their binding. While other Forkhead domain TFs are often assumed to have pioneering abilities, how sequence and chromatin features influence the binding of related Fox TFs has not been systematically characterized. Here, we present a principled approach to compare the relative contributions of intrinsic DNA sequence preference and cell-specific chromatin environments to a TF's DNA-binding activities. We apply our approach to investigate how a selection of Fox TFs (FoxA1, FoxC1, FoxG1, FoxL2, and FoxP3) vary in their binding specificity. We over-express the selected Fox TFs in mouse embryonic stem cells, which offer a platform to contrast each TF's binding activity within the same preexisting chromatin background. By applying a convolutional neural network to interpret the Fox TF binding patterns, we evaluate how sequence and preexisting chromatin features jointly contribute to induced TF binding. We demonstrate that Fox TFs bind different DNA targets, and drive differential gene expression patterns, even when induced in identical chromatin settings. Despite the association between Forkhead domains and pioneering activities, the selected Fox TFs display a wide range of affinities for preexisting chromatin states. Using sequence and chromatin feature attribution techniques to interpret the neural network predictions, we show that differential sequence preferences combined with differential abilities to engage relatively inaccessible chromatin together explain Fox TF binding patterns at individual sites and genome-wide.

11.
bioRxiv ; 2023 Sep 15.
Article in English | MEDLINE | ID: mdl-37745557

ABSTRACT

Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. Unfortunately, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq. Most regulatory genomics analysis pipelines discard "multi-mapped" reads that align equally well to multiple genomic locations. Since multi-mapped reads arise predominantly from repeats, current analysis pipelines fail to detect a substantial portion of regulatory events that occur in repetitive regions. To address this shortcoming, we developed Allo, a new approach to allocate multi-mapped reads in an efficient, accurate, and user-friendly manner. Allo combines probabilistic mapping of multi-mapped reads with a convolutional neural network that recognizes the read distribution features of potential peaks, offering enhanced accuracy in multi-mapping read assignment. Allo also provides read-level output in the form of a corrected alignment file, making it compatible with existing regulatory genomics analysis pipelines and downstream peak-finders. In a demonstration application on CTCF ChIP-seq data, we show that Allo results in the discovery of thousands of new CTCF peaks. Many of these peaks contain the expected cognate motif and/or serve as TAD boundaries. We additionally apply Allo to a diverse collection of ENCODE ChIP-seq datasets, resulting in multiple previously unidentified interactions between transcription factors and repetitive element families. Finally, we show that Allo may be particularly effective in identifying ChIP-seq peaks in younger TEs, which hold evolutionary significance due to their emergence during human evolution from primates.

12.
Genome Biol ; 24(1): 79, 2023 04 18.
Article in English | MEDLINE | ID: mdl-37072822

ABSTRACT

A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data, and redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.


Subjects
Algorithms, Epigenomics, Genomics/methods
13.
Front Neurosci ; 16: 903881, 2022.
Article in English | MEDLINE | ID: mdl-35801179

ABSTRACT

Neuronal programming by forced expression of transcription factors (TFs) holds promise for clinical applications of regenerative medicine. However, the mechanisms by which TFs coordinate their activities on the genome and control distinct neuronal fates remain obscure. Using direct neuronal programming of embryonic stem cells, we dissected the contribution of a series of TFs to specific neuronal regulatory programs. We deconstructed the Ascl1-Lmx1b-Foxa2-Pet1 TF combination that has been shown to generate serotonergic neurons and found that stepwise addition of TFs to Ascl1 canalizes the neuronal fate into a diffuse monoaminergic fate. The addition of pioneer factor Foxa2 represses Phox2b to induce serotonergic fate, similar to in vivo regulatory networks. Foxa2 and Pet1 appear to act synergistically to upregulate serotonergic fate. Foxa2 and Pet1 co-bind to a small fraction of genomic regions but mostly bind to different regulatory sites. In contrast to the combinatorial binding activities of other programming TFs, Pet1 does not strictly follow the Foxa2 pioneer. These findings highlight the challenges in formulating generalizable rules for describing the behavior of TF combinations that program distinct neuronal subtypes.

14.
Science ; 377(6601): eabk2820, 2022 07.
Article in English | MEDLINE | ID: mdl-35771912

ABSTRACT

Precise Hox gene expression is crucial for embryonic patterning. Intra-Hox transcription factor binding and distal enhancer elements have emerged as the major regulatory modules controlling Hox gene expression. However, quantifying their relative contributions has remained elusive. Here, we introduce "synthetic regulatory reconstitution," a conceptual framework for studying gene regulation, and apply it to the HoxA cluster. We synthesized and delivered variant rat HoxA clusters (130 to 170 kilobases) to an ectopic location in the mouse genome. We found that a minimal HoxA cluster recapitulated correct patterns of chromatin remodeling and transcription in response to patterning signals, whereas the addition of distal enhancers was needed for full transcriptional output. Synthetic regulatory reconstitution could provide a generalizable strategy for deciphering the regulatory logic of gene expression in complex genomes.


Subjects
Body Patterning, Developmental Gene Expression Regulation, Homeobox Genes, Homeodomain Proteins, Animals, Body Patterning/genetics, Genetic Enhancer Elements, Genome, Homeodomain Proteins/genetics, Mice, Rats, Genetic Transcription
15.
Genome Biol ; 23(1): 99, 2022 04 19.
Article in English | MEDLINE | ID: mdl-35440038

ABSTRACT

Reproducibility is a significant challenge in (epi)genomic research due to the complexity of experiments composed of traditional biochemistry and informatics. Recent advances have exacerbated this as high-throughput sequencing data is generated at an unprecedented pace. Here, we report the development of a Platform for Epi-Genomic Research (PEGR), a web-based project management platform that tracks and quality controls experiments from conception to publication-ready figures, compatible with multiple assays and bioinformatic pipelines. It supports rigor and reproducibility for biochemists working at the bench, while fully supporting reproducibility and reliability for bioinformaticians through integration with the Galaxy platform.


Subjects
Epigenomics, Genomics, Computational Biology, Genome, Reproducibility of Results, Software
16.
Cell Rep ; 38(11): 110524, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35294876

ABSTRACT

In pluripotent cells, a delicate activation-repression balance maintains pro-differentiation genes ready for rapid activation. The identity of transcription factors (TFs) that specifically repress pro-differentiation genes remains obscure. By targeting ∼1,700 TFs with a CRISPR loss-of-function screen, we found that ZBTB11 and ZFP131 are required for embryonic stem cell (ESC) pluripotency. ESCs without ZBTB11 or ZFP131 lose colony morphology, reduce proliferation rate, and upregulate transcription of genes associated with three germ layers. ZBTB11 and ZFP131 bind proximally to pro-differentiation genes. ZBTB11 or ZFP131 loss leads to an increase in H3K4me3, negative elongation factor (NELF) complex release, and concomitant transcription at associated genes. Together, our results suggest that ZBTB11 and ZFP131 maintain pluripotency by preventing premature expression of pro-differentiation genes and present a generalizable framework to maintain cellular potency.


Subjects
Embryonic Stem Cells, Pluripotent Stem Cells, Animals, Humans, Mice, Cell Differentiation/genetics, CRISPR-Cas Systems, Embryonic Stem Cells/metabolism, Germ Layers/metabolism, Pluripotent Stem Cells/metabolism, Transcription Factors/genetics, Transcription Factors/metabolism
17.
Genome Res ; 32(3): 512-523, 2022 03.
Article in English | MEDLINE | ID: mdl-35042722

ABSTRACT

The intrinsic DNA sequence preferences and cell type-specific cooperative partners of transcription factors (TFs) are typically highly conserved. Hence, despite the rapid evolutionary turnover of individual TF binding sites, predictive sequence models of cell type-specific genomic occupancy of a TF in one species should generalize to closely matched cell types in a related species. To assess the viability of cross-species TF binding prediction, we train neural networks to discriminate ChIP-seq peak locations from genomic background and evaluate their performance within and across species. Cross-species predictive performance is consistently worse than within-species performance, which we show is caused in part by species-specific repeats. To account for this domain shift, we use an augmented network architecture to automatically discourage learning of training species-specific sequence features. This domain adaptation approach corrects for prediction errors on species-specific repeats and improves overall cross-species model performance. Our results show that cross-species TF binding prediction is feasible when models account for domain shifts driven by species-specific repeats.


Subjects
Computer Neural Networks, Transcription Factors, Binding Sites, Chromatin Immunoprecipitation Sequencing, Computational Biology/methods, Protein Binding, Transcription Factors/metabolism
18.
Nature ; 592(7853): 309-314, 2021 04.
Article in English | MEDLINE | ID: mdl-33692541

ABSTRACT

The genome-wide architecture of chromatin-associated proteins that maintains chromosome integrity and gene regulation is not well defined. Here we use chromatin immunoprecipitation, exonuclease digestion and DNA sequencing (ChIP-exo/seq) to define this architecture in Saccharomyces cerevisiae. We identify 21 meta-assemblages consisting of roughly 400 different proteins that are related to DNA replication, centromeres, subtelomeres, transposons and transcription by RNA polymerase (Pol) I, II and III. Replication proteins engulf a nucleosome, centromeres lack a nucleosome, and repressive proteins encompass three nucleosomes at subtelomeric X-elements. We find that most promoters associated with Pol II evolved to lack a regulatory region, having only a core promoter. These constitutive promoters comprise a short nucleosome-free region (NFR) adjacent to a +1 nucleosome, which together bind the transcription-initiation factor TFIID to form a preinitiation complex. Positioned insulators protect core promoters from upstream events. A small fraction of promoters evolved an architecture for inducibility, whereby sequence-specific transcription factors (ssTFs) create a nucleosome-depleted region (NDR) that is distinct from an NFR. We describe structural interactions among ssTFs, their cognate cofactors and the genome. These interactions include the nucleosomal and transcriptional regulators RPD3-L, SAGA, NuA4, Tup1, Mediator and SWI-SNF. Surprisingly, we do not detect interactions between ssTFs and TFIID, suggesting that such interactions do not stably occur. Our model for gene induction involves ssTFs, cofactors and general factors such as TBP and TFIIB, but not TFIID. By contrast, constitutive transcription involves TFIID but not ssTFs engaged with their cofactors. From this, we define a highly integrated network of gene regulation by ssTFs.


Subjects
Fungal Proteins/genetics, Fungal Proteins/metabolism, Fungal Genome/genetics, Multiprotein Complexes/genetics, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Transcription Factors/genetics, Coenzymes/metabolism, Multiprotein Complexes/metabolism, Genetic Promoter Regions, RNA Polymerase I/metabolism, RNA Polymerase II/metabolism, RNA Polymerase III/metabolism, TATA-Box Binding Protein/genetics, TATA-Box Binding Protein/metabolism, Transcription Factor TFIIB/genetics, Transcription Factor TFIIB/metabolism, Transcription Factor TFIID, Transcription Factors/metabolism
19.
Bioinformatics ; 37(18): 3011-3013, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33681991

ABSTRACT

SUMMARY: Epigenetic modifications reflect key aspects of transcriptional regulation, and many epigenomic datasets have been generated under different biological contexts to provide insights into regulatory processes. However, the technical noise in epigenomic datasets and the many dimensions (features) examined make it challenging to effectively extract biologically meaningful inferences from these datasets. We developed a package that reduces noise while normalizing the epigenomic data by a novel normalization method, followed by integrative dimensional reduction by learning and assigning epigenetic states. This package, called S3V2-IDEAS, can be used to identify epigenetic states for multiple features, or identify discretized signal intensity levels and a master peak list across different cell types for a single feature. We illustrate the outputs and performance of S3V2-IDEAS using 137 epigenomics datasets from the VISION project that provides ValIdated Systematic IntegratiON of epigenomic data in hematopoiesis. AVAILABILITY AND IMPLEMENTATION: S3V2-IDEAS pipeline is freely available as open source software released under an MIT license at: https://github.com/guanjue/S3V2_IDEAS_ESMP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Epigenomics, Software, Epigenomics/methods, Genetic Epigenesis, Gene Expression Regulation, Hematopoiesis
20.
Genome Biol ; 22(1): 20, 2021 01 07.
Article in English | MEDLINE | ID: mdl-33413545

ABSTRACT

BACKGROUND: Transcription factor (TF) binding specificity is determined via a complex interplay between the transcription factor's DNA binding preference and cell type-specific chromatin environments. The chromatin features that correlate with transcription factor binding in a given cell type have been well characterized. For instance, the binding sites for a majority of transcription factors display concurrent chromatin accessibility. However, concurrent chromatin features reflect the binding activities of the transcription factor itself and thus provide limited insight into how genome-wide TF-DNA binding patterns became established in the first place. To understand the determinants of transcription factor binding specificity, we therefore need to examine how newly activated transcription factors interact with sequence and preexisting chromatin landscapes. RESULTS: Here, we investigate the sequence and preexisting chromatin predictors of TF-DNA binding by examining the genome-wide occupancy of transcription factors that have been induced in well-characterized chromatin environments. We develop Bichrom, a bimodal neural network that jointly models sequence and preexisting chromatin data to interpret the genome-wide binding patterns of induced transcription factors. We find that the preexisting chromatin landscape is a differential global predictor of TF-DNA binding; incorporating preexisting chromatin features improves our ability to explain the binding specificity of some transcription factors substantially, but not others. Furthermore, by analyzing site-level predictors, we show that transcription factor binding in previously inaccessible chromatin tends to correspond to the presence of more favorable cognate DNA sequences. CONCLUSIONS: Bichrom thus provides a framework for modeling, interpreting, and visualizing the joint sequence and chromatin landscapes that determine TF-DNA binding dynamics.


Subjects
Chromatin, Computer Neural Networks, Protein Binding/genetics, Transcription Factors/metabolism, Basic Helix-Loop-Helix Transcription Factors/metabolism, Binding Sites/genetics, DNA-Binding Proteins/metabolism, Gene Expression Regulation, Genome, Histones/metabolism, Humans