Results 1 - 20 of 30
1.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36702751

ABSTRACT

Recognizing binding sites of DNA-binding proteins is a key factor for elucidating transcriptional regulation in organisms. ChIP-exo enables researchers to delineate genome-wide binding landscapes of DNA-binding proteins with near single base-pair resolution. However, the peak calling step hinders ChIP-exo application since the published algorithms tend to generate false-positive and false-negative predictions. Here, we report the development of DEOCSU (DEep-learning Optimized ChIP-exo peak calling SUite), a novel machine learning-based ChIP-exo peak calling suite. DEOCSU entails the deep convolutional neural network model which was trained with curated ChIP-exo peak data to distinguish the visualized data of bona fide peaks from false ones. Performance validation of the trained deep-learning model indicated its high accuracy, high precision and high recall of over 95%. Applying the new suite to both in-house and publicly available ChIP-exo datasets obtained from bacteria, eukaryotes and archaea revealed an accurate prediction of peaks containing canonical motifs, highlighting the versatility and efficiency of DEOCSU. Furthermore, DEOCSU can be executed on a cloud computing platform or the local environment. With visualization software included in the suite, adjustable options such as the threshold of peak probability, and iterable updating of the pre-trained model, DEOCSU can be optimized for users' specific needs.


Subjects
Chromatin Immunoprecipitation Sequencing , Deep Learning , Chromatin Immunoprecipitation , DNA-Binding Proteins/metabolism , Software , Algorithms , Binding Sites , Sequence Analysis, DNA
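The abstract above mentions an adjustable threshold on peak probability. A minimal sketch of what such a filter could look like is given here; the `Peak` record, the `filter_peaks` function and the example scores are hypothetical illustrations and are not taken from DEOCSU itself.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    """Hypothetical candidate peak with a classifier score in [0, 1]."""
    chrom: str
    start: int
    end: int
    probability: float


def filter_peaks(peaks: List[Peak], threshold: float = 0.5) -> List[Peak]:
    """Keep only candidates whose score is at or above the threshold."""
    return [p for p in peaks if p.probability >= threshold]


# Made-up candidates, for illustration only.
candidates = [
    Peak("chr", 100, 160, 0.98),
    Peak("chr", 900, 950, 0.40),
    Peak("chr", 2000, 2075, 0.75),
]

strict = filter_peaks(candidates, threshold=0.9)   # fewer, higher-confidence calls
lenient = filter_peaks(candidates, threshold=0.5)  # more calls, more false positives
```

Raising the threshold trades recall for precision, which is the kind of tuning the abstract describes as user-adjustable.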
2.
Proc Natl Acad Sci U S A ; 118(2)2021 01 12.
Article in English | MEDLINE | ID: mdl-33372147

ABSTRACT

A transcription factor (TF) is a sequence-specific DNA-binding protein that modulates the transcription of a set of particular genes, and thus regulates gene expression in the cell. TFs have commonly been predicted by analyzing sequence homology with the DNA-binding domains of TFs already characterized. Thus, TFs that do not show homologies with the reported ones are difficult to predict. Here we report the development of a deep learning-based tool, DeepTFactor, that predicts whether a protein in question is a TF. DeepTFactor uses a convolutional neural network to extract features of a protein. It showed high performance in predicting TFs of both eukaryotic and prokaryotic origins, resulting in F1 scores of 0.8154 and 0.8000, respectively. Analysis of the gradients of prediction score with respect to input suggested that DeepTFactor detects DNA-binding domains and other latent features for TF prediction. DeepTFactor predicted 332 candidate TFs in Escherichia coli K-12 MG1655. Among them, 84 candidate TFs belong to the y-ome, which is a collection of genes that lack experimental evidence of function. We experimentally validated the results of DeepTFactor prediction by further characterizing genome-wide binding sites of three predicted TFs, YqhC, YiaU, and YahB. Furthermore, we made available the list of 4,674,808 TFs predicted from 73,873,012 protein sequences in 48,346 genomes. DeepTFactor will serve as a useful tool for predicting TFs, which is necessary for understanding the regulatory systems of organisms of interest. We provide DeepTFactor as a stand-alone program, available at https://bitbucket.org/kaistsystemsbiology/deeptfactor.


Subjects
Computational Biology/methods , Forecasting/methods , Transcription Factors/genetics , Algorithms , Binding Sites/genetics , Chromatin Immunoprecipitation Sequencing/methods , DNA/genetics , DNA-Binding Proteins/genetics , Deep Learning/trends , Genome/genetics , Protein Binding/genetics , Software
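The abstract above reports performance as F1 scores. For readers unfamiliar with the metric, the following sketch computes F1 as the harmonic mean of precision and recall from confusion-matrix counts. The counts in the example are invented for illustration and are not the DeepTFactor evaluation data.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts: 8 true positives, 2 false positives, 2 false negatives.
example = f1_score(tp=8, fp=2, fn=2)
```

Because F1 ignores true negatives, it is a common choice when the positive class (here, transcription factors among all proteins) is rare.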
3.
BMC Bioinformatics ; 21(1): 65, 2020 Feb 21.
Article in English | MEDLINE | ID: mdl-32085702

ABSTRACT

BACKGROUND: ChIP (Chromatin immunoprecipitation)-exo has emerged as an important and versatile improvement over conventional ChIP-seq as it reduces the level of noise, maps the transcription factor (TF) binding location in a very precise manner, up to single base-pair resolution, and enables binding mode prediction. Availability of numerous peak-callers for analyzing ChIP-exo reads has motivated the need to assess their performance and report which tool executes reasonably well for the task. RESULTS: This study has focussed on comparing peak-callers that report direct binding events with those that report indirect binding events. The effect of strandedness of reads and duplication of data on the performance of peak-callers has been investigated. The number of peaks reported by each peak-caller is compared followed by a comparison of the annotated motifs present in the reported peaks. The significance of peaks is assessed based on the presence of a motif in top peaks. Indirect binding tools have been compared on the basis of their ability to identify annotated motifs and predict mode of protein-DNA interaction. CONCLUSION: By studying the output of the peak-callers investigated in this study, it is concluded that the tools that use self-learning algorithms, i.e. the tools that estimate all the essential parameters from the aligned reads, perform better than the algorithms which require formation of peak-pairs. The latest tools that account for indirect binding of TFs appear to be an upgrade over the available tools, as they are able to reveal valuable information about the mode of binding in addition to direct binding. Furthermore, the quality of ChIP-exo reads has important consequences on the output of data analysis.


Subjects
Chromatin Immunoprecipitation Sequencing/methods , Transcription Factors/metabolism , Algorithms , Binding Sites , Data Accuracy , Humans
4.
EMBO J ; 34(4): 502-16, 2015 Feb 12.
Article in English | MEDLINE | ID: mdl-25535248

ABSTRACT

Human transcription factors recognize specific DNA sequence motifs to regulate transcription. It is unknown whether a single transcription factor is able to bind to distinctly different motifs on chromatin, and if so, what determines the usage of specific motifs. By using a motif-resolution chromatin immunoprecipitation-exonuclease (ChIP-exo) approach, we find that agonist-liganded human androgen receptor (AR) and antagonist-liganded AR bind to two distinctly different motifs, leading to distinct transcriptional outcomes in prostate cancer cells. Further analysis on clinical prostate tissues reveals that the binding of AR to these two distinct motifs is involved in prostate carcinogenesis. Together, these results suggest that unique ligands may switch DNA motifs recognized by ligand-dependent transcription factors in vivo. Our findings also provide a broad mechanistic foundation for understanding ligand-specific induction of gene expression profiles.


Subjects
Androgen Receptor Antagonists/chemistry , Androgens/chemistry , DNA/metabolism , Prostatic Neoplasms/metabolism , Receptors, Androgen/metabolism , Androgen Receptor Antagonists/metabolism , Androgens/metabolism , Cell Proliferation/physiology , Chromatin Immunoprecipitation , Electrophoretic Mobility Shift Assay , Humans , Male , Reverse Transcriptase Polymerase Chain Reaction
5.
Crit Rev Biochem Mol Biol ; 50(4): 269-83, 2015.
Article in English | MEDLINE | ID: mdl-26038153

ABSTRACT

Recent advances in experimental and computational methodologies are enabling ultra-high resolution genome-wide profiles of protein-DNA binding events. For example, the ChIP-exo protocol precisely characterizes protein-DNA cross-linking patterns by combining chromatin immunoprecipitation (ChIP) with 5' → 3' exonuclease digestion. Similarly, deeply sequenced chromatin accessibility assays (e.g. DNase-seq and ATAC-seq) enable the detection of protected footprints at protein-DNA binding sites. With these techniques and others, we have the potential to characterize the individual nucleotides that interact with transcription factors, nucleosomes, RNA polymerases and other regulatory proteins in a particular cellular context. In this review, we explain the experimental assays and computational analysis methods that enable high-resolution profiling of protein-DNA binding events. We discuss the challenges and opportunities associated with such approaches.


Subjects
Chromatin/metabolism , DNA-Binding Proteins/metabolism , DNA/metabolism , Models, Molecular , Animals , Chromatin/chemistry , Chromatin Immunoprecipitation/trends , Computational Biology/trends , Computer Simulation/trends , DNA/chemistry , DNA Footprinting/trends , DNA-Binding Proteins/chemistry , Datasets as Topic , Exodeoxyribonucleases/metabolism , Expert Systems , Genomics/methods , Genomics/trends , Humans , Hydrolysis , Nucleic Acid Conformation , Nucleosomes/chemistry , Nucleosomes/metabolism , Protein Conformation , Protein Footprinting/trends
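The abstract above notes that 5' → 3' exonuclease digestion lets ChIP-exo characterize cross-linking patterns precisely. One common way to look at such data is to tally read 5' ends separately for each strand, since digestion is expected to stop near the protein-DNA cross-link. The sketch below assumes reads given as hypothetical `(start, end, strand)` tuples in 0-based, half-open coordinates; it is an illustration of the idea and not any published tool's algorithm.

```python
from collections import Counter
from typing import Iterable, Tuple

Read = Tuple[int, int, str]  # (start, end, strand), 0-based half-open


def five_prime_ends(reads: Iterable[Read]) -> Tuple[Counter, Counter]:
    """Count read 5' ends per position, separately for each strand."""
    plus, minus = Counter(), Counter()
    for start, end, strand in reads:
        if strand == "+":
            plus[start] += 1      # 5' end of a forward read is its leftmost base
        else:
            minus[end - 1] += 1   # 5' end of a reverse read is its rightmost base
    return plus, minus


# Toy reads around a hypothetical binding site.
toy_reads = [
    (100, 136, "+"), (100, 140, "+"), (101, 137, "+"),
    (95, 131, "-"), (90, 131, "-"),
]
plus_counts, minus_counts = five_prime_ends(toy_reads)

# The most frequent 5' end on each strand approximates a border of the protected region.
left_border = plus_counts.most_common(1)[0][0]
right_border = minus_counts.most_common(1)[0][0]
```

In this toy example the forward-strand pile-up sits upstream of the reverse-strand pile-up, which is the strand-asymmetric signature that peak callers for this kind of data typically look for.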
6.
BMC Genomics ; 17(1): 873, 2016 11 04.
Article in English | MEDLINE | ID: mdl-27814676

ABSTRACT

BACKGROUND: ChIP-nexus, an extension of the ChIP-exo protocol, can be used to map the borders of protein-bound DNA sequences at nucleotide resolution, requires less input DNA and enables selective PCR duplicate removal using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a very limited number of software packages are available for the analysis of ChIP-exo data, which have not yet been systematically tested and compared on ChIP-nexus data. RESULTS: Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal of PCR duplicates and for quality control. Furthermore, we developed bespoke methods to estimate the width of the protected region resulting from protein-DNA binding and to infer binding positions from ChIP-nexus data. Finally, we applied our peak calling method as well as the two other methods MACE and MACS2 to the available ChIP-nexus data. CONCLUSIONS: The Q-nexus software is efficient and easy to use. Novel statistics about duplication rates in consideration of random barcodes are calculated. Our method for the estimation of the width of the protected region yields unbiased signatures that are highly reproducible for biological replicates and at the same time very specific for the respective factors analyzed. As judged by the irreproducible discovery rate (IDR), our peak calling algorithm shows a substantially better reproducibility. An implementation of Q-nexus is available at http://charite.github.io/Q/ .


Subjects
Chromatin Immunoprecipitation , Computational Biology/methods , High-Throughput Nucleotide Sequencing , Software , Algorithms , Binding Sites , DNA-Binding Proteins/metabolism , Nucleotide Motifs , Protein Binding , Reproducibility of Results , Transcription Factors/metabolism
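The abstract above describes using random barcodes for selective removal of PCR duplicates. The general idea can be sketched as follows: two reads are treated as duplicates only when they share both the mapping position and the barcode, so independent fragments that happen to map to the same position are retained. The tuple layout and function name are hypothetical, and this is not the Q-nexus implementation.

```python
from typing import Iterable, List, Tuple

# (chrom, 5' position, strand, random barcode) -- hypothetical record layout
Read = Tuple[str, int, str, str]


def remove_pcr_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read seen for each (position, strand, barcode) combination."""
    seen = set()
    kept = []
    for read in reads:
        if read not in seen:  # the whole tuple, barcode included, is the key
            seen.add(read)
            kept.append(read)
    return kept


toy_reads = [
    ("chr1", 500, "+", "ACGTA"),
    ("chr1", 500, "+", "ACGTA"),  # same position and barcode: PCR duplicate
    ("chr1", 500, "+", "TTGCA"),  # same position, different barcode: kept
    ("chr1", 742, "-", "ACGTA"),
]
deduplicated = remove_pcr_duplicates(toy_reads)
```

Without barcodes, position-only deduplication would collapse the first three reads into one, which is costly for exonuclease-based data where true reads pile up at the same base.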
7.
Patterns (N Y) ; 5(3): 100927, 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38487805

ABSTRACT

In this study, we introduce TESA (weighted two-stage alignment), an innovative motif prediction tool that refines the identification of DNA-binding protein motifs, essential for deciphering transcriptional regulatory mechanisms. Unlike traditional algorithms that rely solely on sequence data, TESA integrates the high-resolution chromatin immunoprecipitation (ChIP) signal, specifically from ChIP-exonuclease (ChIP-exo), by assigning weights to sequence positions, thereby enhancing motif discovery. TESA employs a nuanced approach combining a binomial distribution model with a graph model, further supported by a "bookend" model, to improve the accuracy of predicting motifs of varying lengths. Our evaluation, utilizing an extensive compilation of 90 prokaryotic ChIP-exo datasets from proChIPdb and 167 H. sapiens datasets, compared TESA's performance against seven established tools. The results indicate TESA's improved precision in motif identification, suggesting its valuable contribution to the field of genomic research.

8.
Methods Mol Biol ; 2846: 91-107, 2024.
Article in English | MEDLINE | ID: mdl-39141231

ABSTRACT

ChIP-exo is a powerful tool for achieving enhanced sensitivity and single-base-pair resolution of transcription factor (TF) binding, which utilizes a combination of chromatin immunoprecipitation (ChIP) and lambda exonuclease digestion (exo) followed by high-throughput sequencing. ChIP-nexus (chromatin immunoprecipitation experiments with nucleotide resolution through exonuclease, unique barcode, and single ligation) is an updated and simplified version of the original ChIP-exo method, which has reported an efficient adapter ligation through the DNA circularization step. Building upon an established method, we present a protocol for generating NGS (next-generation sequencing) ready and high-quality ChIP-nexus library for glucocorticoid receptor (GR). This method is specifically optimized for bone marrow-derived macrophage (BMDM) cells. The protocol is initiated by the formation of DNA-protein cross-links in intact cells. This is followed by chromatin shearing, chromatin immunoprecipitation, ligation of sequencing adapters, digestion of adapter-ligated DNA using lambda exonuclease, and purification of single-stranded DNA for circularization and library amplification.


Subjects
Chromatin Immunoprecipitation , DNA , High-Throughput Nucleotide Sequencing , Macrophages , Receptors, Glucocorticoid , Animals , Receptors, Glucocorticoid/metabolism , Receptors, Glucocorticoid/genetics , Mice , Macrophages/metabolism , DNA/metabolism , DNA/genetics , High-Throughput Nucleotide Sequencing/methods , Chromatin Immunoprecipitation/methods , Protein Binding , Binding Sites
9.
Genome Biol ; 25(1): 284, 2024 Oct 31.
Article in English | MEDLINE | ID: mdl-39482734

ABSTRACT

BACKGROUND: Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific binding (ASB) both in terms of read coverage and representation of individual variants in the cell lines used. This makes prediction of allelic differences in TF binding from sequence alone desirable, provided that the reliability of such predictions can be quantitatively assessed. RESULTS: We here propose methods for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We use a likelihood function based on an over-dispersed binomial distribution to aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. To facilitate the de novo inference of high-quality models from paired-end in vivo binding data such as ChIP-seq, ChIP-exo, and CUT&Tag without read mapping or peak calling, we introduce an extensible reimplementation of our biophysically interpretable machine learning framework named PyProBound. Explicitly accounting for assay-specific bias in DNA fragmentation rate when training on ChIP-seq yields improved TF binding models. Moreover, we show how PyProBound can leverage our threshold-free ASB likelihood function to perform de novo motif discovery using allele-specific ChIP-seq counts. CONCLUSION: Our work provides new strategies for predicting the functional impact of non-coding variants.


Subjects
Alleles , Benchmarking , Chromatin Immunoprecipitation Sequencing , DNA , Transcription Factors , Transcription Factors/metabolism , Transcription Factors/genetics , Humans , DNA/metabolism , DNA/genetics , Protein Binding , Binding Sites
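The abstract above refers to a likelihood based on an over-dispersed binomial distribution that aggregates allelic evidence across sites without per-variant significance testing. A standard over-dispersed binomial is the beta-binomial, and the sketch below sums its log-probability over per-site read counts. How the published method ties the distribution's parameters to predicted binding affinities is not stated in the abstract, so `alpha` and `beta` are treated here as free, hypothetical parameters.

```python
import math
from typing import Iterable, Tuple


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k: int, n: int, alpha: float, beta: float) -> float:
    """log P(k successes in n trials) under a beta-binomial(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def allelic_log_likelihood(counts: Iterable[Tuple[int, int]],
                           alpha: float, beta: float) -> float:
    """Sum log-probabilities over (ref_reads, alt_reads) pairs, one per site."""
    return sum(beta_binomial_logpmf(ref, ref + alt, alpha, beta)
               for ref, alt in counts)


# Made-up read counts at three heterozygous sites.
toy_counts = [(12, 8), (3, 9), (20, 21)]
total = allelic_log_likelihood(toy_counts, alpha=5.0, beta=5.0)
```

Summing log-likelihoods lets low-coverage sites contribute weak evidence instead of being discarded by a significance cutoff, which matches the threshold-free aggregation the abstract describes.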
10.
Methods Mol Biol ; 2599: 33-48, 2023.
Article in English | MEDLINE | ID: mdl-36427141

ABSTRACT

Chromatin immunoprecipitation (ChIP) is a technique to determine whether a protein interacts with a specific DNA sequence. ChIP-sequencing (ChIP-seq) is one of the most widely used methods to identify genome-wide DNA-binding sites of nuclear proteins. Here, we describe the ChIP-exo method, which is a refined version of ChIP-seq combined with lambda exonuclease digestion. ChIP-exo can identify genomic locations of DNA-binding proteins at a near single base-pair (bp) resolution. It removes most of the background DNA signals. ChIP-exo has emerged as a powerful technique to study the genome-wide organization of DNA-binding proteins.


Subjects
Chromatin Immunoprecipitation Sequencing , DNA-Binding Proteins , DNA-Binding Proteins/genetics , Chromatin Immunoprecipitation , Genomics , Nuclear Proteins
11.
Front Microbiol ; 14: 1271121, 2023.
Article in English | MEDLINE | ID: mdl-38239730

ABSTRACT

Salmonella enterica serovar Typhimurium (S. Typhimurium) is a common foodborne pathogen which is frequently used as the reference strain for Salmonella. Investigating the sigma factor network and promoters is crucial to understand the genomic and transcriptomic properties of the bacterium. Its promoters were identified using various methods such as dRNA-seq, ChIP-chip, or ChIP-Seq. However, validation using ChIP-exo, which exhibits higher-resolution performance compared to conventional ChIP, has not been conducted to date. In this study, using the representative strain S. Typhimurium LT2 (LT2), the ChIP-exo experiment was conducted to accurately determine the binding sites of catalytic RNA polymerase subunit RpoB and major sigma factors (RpoD, RpoN, RpoS, and RpoE) during exponential phase. Integrated with the results of RNA-Seq, promoters and sigmulons for the sigma factors and their association with RpoB have been discovered. Notably, overlapping regions among the binding sites of each alternative sigma factor were found. Furthermore, comparative analysis with Escherichia coli str. K-12 substr. MG1655 (MG1655) revealed conserved binding sites of RpoD and RpoN across different species. In the case of small RNAs (sRNAs), expression of 50 sRNAs was observed during the exponential growth of LT2. Collectively, the integration of ChIP-exo and RNA-Seq enables genome-scale promoter mapping with high resolution and facilitates the characterization of binding events of alternative sigma factors, enabling a comprehensive understanding of the bacterial sigma factor network and condition-specific active promoters.

12.
Comput Struct Biotechnol J ; 21: 99-104, 2023.
Article in English | MEDLINE | ID: mdl-36544470

ABSTRACT

Genome-scale studies of the bacterial regulatory network have been facilitated by declining sequencing cost and advances in ChIP (chromatin immunoprecipitation) methods. Of these, ChIP-exo has proven competent with its near-single base-pair resolution. While several algorithms and programs have been developed for different analytical steps in ChIP-exo data processing, there is a lack of effort in incorporating them into a convenient bioinformatics pipeline that is intuitive and publicly available. In this paper, we developed ChIP-exo Analysis Pipeline (ChEAP) that executes the one-step process, starting from trimming and aligning raw sequencing reads to visualization of ChIP-exo results. The pipeline was implemented on the interactive web-based Python development environment - Jupyter Notebook, which is compatible with the Google Colab cloud platform to facilitate the sharing of codes and collaboration among researchers. Additionally, users could exploit the free GPU and CPU resources allocated by Colab to carry out computing tasks regardless of the performance of their local machines. The utility of ChEAP was demonstrated with the ChIP-exo datasets of RpoN sigma factor in E. coli K-12 MG1655. To analyze two raw data files, ChEAP runtime was 2 min and 25 s. Subsequent analyses identified 113 RpoN binding sites showing a conserved RpoN binding pattern in the motif search. ChEAP application in ChIP-exo data analysis is extensive and flexible for the parallel processing of data from various organisms.

13.
Biochim Biophys Acta Gene Regul Mech ; 1865(3): 194811, 2022 04.
Article in English | MEDLINE | ID: mdl-35318951

ABSTRACT

Transcription factor binding to DNA is a central mechanism regulating gene expression. Thus, thorough characterization of this process is essential for understanding cellular biology in both health and disease. We combined data from three sequencing-based methods to unravel the DNA binding function of the novel ZNF414 protein in cells representing two tumor types. ChIP-exo served to map protein binding sites, ATAC-seq allowed identification of open chromatin, and RNA-seq examined the transcriptome. We show that ZNF414 is a DNA-binding protein that both induces and represses gene expression. This transcriptional response has an impact on cellular processes related to proliferation and other malignancy-associated functions, such as cell migration and DNA repair. Approximately 20% of the differentially expressed genes harbored ZNF414 binding sites in their promoters in accessible chromatin, likely representing direct targets of ZNF414. De novo motif discovery revealed several putative ZNF414 binding sequences, one of which was validated using EMSA. In conclusion, this study illustrates a highly efficient integrative approach for the characterization of the DNA binding and transcriptional activity of transcription factors.


Subjects
Chromatin Immunoprecipitation Sequencing , Chromatin , Chromatin/genetics , Chromatin Immunoprecipitation , DNA , RNA-Seq
14.
Methods Mol Biol ; 2522: 209-222, 2022.
Article in English | MEDLINE | ID: mdl-36125752

ABSTRACT

Genome-wide occupancy studies for RNA polymerases and their basal transcription factors deliver information about transcription dynamics and the recruitment of transcription elongation and termination factors in eukaryotes and prokaryotes. The primary method to determine genome-wide occupancies is chromatin immunoprecipitation combined with deep sequencing (ChIP-seq). Archaea possess a transcription machinery that is evolutionarily more closely related to its eukaryotic counterpart but operates in a prokaryotic cellular context. Studies on archaeal transcription brought insight into the evolution of transcription machineries and the universality of transcription mechanisms. Because of the limited resolution of ChIP-seq, the close spacing of promoters and transcription units found in archaeal genomes poses a challenge for ChIP-seq and the ensuing data analysis. The extreme growth temperature of many established archaeal model organisms necessitates further adaptations. This chapter describes a version of ChIP-seq adapted for the basal transcription machinery of thermophilic archaea and some modifications to the data analysis.


Subjects
Archaea , Chromatin Immunoprecipitation Sequencing , Archaea/genetics , DNA-Directed RNA Polymerases/genetics , Genome, Archaeal , Transcription Factors/genetics
15.
Microb Genom ; 8(5)2022 05.
Article in English | MEDLINE | ID: mdl-35584008

ABSTRACT

Genomics has set the basis for a variety of methodologies that produce high-throughput datasets identifying the different players that define gene regulation, particularly regulation of transcription initiation and operon organization. These datasets are available in public repositories, such as the Gene Expression Omnibus, or ArrayExpress. However, accessing and navigating such a wealth of data is not straightforward. No resource currently exists that offers all available high and low-throughput data on transcriptional regulation in Escherichia coli K-12 to easily use both as whole datasets, or as individual interactions and regulatory elements. RegulonDB (https://regulondb.ccg.unam.mx) began gathering high-throughput dataset collections in 2009, starting with transcription start sites, then adding ChIP-seq and gSELEX in 2012, with up to 99 different experimental high-throughput datasets available in 2019. In this paper we present a radical upgrade to more than 2000 high-throughput datasets, processed to facilitate their comparison, introducing up-to-date collections of transcription termination sites, transcription units, as well as transcription factor binding interactions derived from ChIP-seq, ChIP-exo, gSELEX and DAP-seq experiments, besides expression profiles derived from RNA-seq experiments. For ChIP-seq experiments we offer both the data as presented by the authors, as well as data uniformly processed in-house, enhancing their comparability, as well as the traceability of the methods and reproducibility of the results. Furthermore, we have expanded the tools available for browsing and visualization across and within datasets. We include comparisons against previously existing knowledge in RegulonDB from classic experiments, a nucleotide-resolution genome viewer, and an interface that enables users to browse datasets by querying their metadata. 
A particular effort was made to automatically extract detailed experimental growth conditions by implementing an assisted curation strategy applying natural language processing and machine learning. We provide summaries with the total number of interactions found in each experiment, as well as tools to identify common results among different experiments. This is a long-awaited resource to make use of such wealth of knowledge and advance our understanding of the biology of the model bacterium E. coli K-12.


Subjects
Escherichia coli K12 , Escherichia coli , Escherichia coli/genetics , Escherichia coli K12/genetics , Escherichia coli K12/metabolism , Gene Expression Regulation, Bacterial , Operon/genetics , Reproducibility of Results
16.
Gigascience ; 10(1)2021 01 07.
Article in English | MEDLINE | ID: mdl-33410471

ABSTRACT

BACKGROUND: FAIR (Findability, Accessibility, Interoperability, and Reusability) next-generation sequencing (NGS) data analysis relies on complex computational biology workflows and pipelines to guarantee reproducibility, portability, and scalability. Moreover, workflow languages, managers, and container technologies have helped address the problem of data analysis pipeline execution across multiple platforms in scalable ways. FINDINGS: Here, we present a project management framework for NGS data analysis called PM4NGS. This framework is composed of an automatic creation of a standard organizational structure of directories and files, bioinformatics tool management using Docker or Bioconda, and data analysis pipelines in CWL format. Pre-configured Jupyter notebooks with minimum Python code are included in PM4NGS to produce a project report and publication-ready figures. We present 3 pipelines for demonstration purposes including the analysis of RNA-Seq, ChIP-Seq, and ChIP-exo datasets. CONCLUSIONS: PM4NGS is an open source framework that creates a standard organizational structure for NGS data analysis projects. PM4NGS is easy to install, configure, and use by non-bioinformaticians on personal computers and laptops. It permits execution of the NGS data analysis on Windows 10 with the Windows Subsystem for Linux feature activated. The framework aims to reduce the gap between researchers in experimental laboratories producing NGS data and workflows for data analysis. PM4NGS documentation can be accessed at https://pm4ngs.readthedocs.io/.


Subjects
Data Analysis , Software , Computational Biology , High-Throughput Nucleotide Sequencing , Reproducibility of Results
17.
J Comput Biol ; 27(3): 429-435, 2020 03.
Article in English | MEDLINE | ID: mdl-32023130

ABSTRACT

Regulatory proteins can employ multiple direct and indirect modes of interaction with the genome. The ChIP-exo mixture model (ChExMix) provides a principled approach to detecting multiple protein-DNA interaction modes in a single ChIP-exo experiment. ChExMix discovers and characterizes binding event subtypes in ChIP-exo data by leveraging both protein-DNA cross-linking signatures and DNA motifs. In this study, we present a summary of the major features and applications of ChExMix. We demonstrate that ChExMix does not require high-resolution protein-DNA binding assay data to detect binding event subtypes. Specifically, we apply ChExMix to analyze 393 ChIP-seq data profiles in K562 cells. Similar binding event subtypes are discovered across multiple proteins, suggesting the existence of colocalized regulatory protein modules that are recruited to DNA through a particular sequence-specific transcription factor. Our results thus suggest that ChExMix can characterize protein-DNA binding interaction modes using data from multiple types of protein-DNA interaction assays.


Subjects
Computational Biology/methods , DNA-Binding Proteins/metabolism , DNA/metabolism , Algorithms , Chromatin Immunoprecipitation , DNA/chemistry , DNA-Binding Proteins/chemistry , Databases, Genetic , Humans , K562 Cells , Nucleotide Motifs , Protein Binding
18.
mSystems ; 5(6)2020 Nov 10.
Article in English | MEDLINE | ID: mdl-33172971

ABSTRACT

Escherichia coli uses two-component systems (TCSs) to respond to environmental signals. TCSs affect gene expression and are parts of E. coli's global transcriptional regulatory network (TRN). Here, we identified the regulons of five TCSs in E. coli MG1655: BaeSR and CpxAR, which were stimulated by ethanol stress; KdpDE and PhoRB, induced by limiting potassium and phosphate, respectively; and ZraSR, stimulated by zinc. We analyzed RNA-seq data using independent component analysis (ICA). ChIP-exo data were used to validate condition-specific target gene binding sites. Based on these data, we do the following: (i) identify the target genes for each TCS; (ii) show how the target genes are transcribed in response to stimulus; and (iii) reveal novel relationships between TCSs, which indicate noncognate inducers for various response regulators, such as BaeR to iron starvation, CpxR to phosphate limitation, and PhoB and ZraR to cell envelope stress. Our understanding of the TRN in E. coli is thus notably expanded. IMPORTANCE: E. coli is a common commensal microbe found in the human gut microenvironment; however, some strains cause diseases like diarrhea, urinary tract infections, and meningitis. E. coli's two-component systems (TCSs) modulate target gene expression, especially related to virulence, pathogenesis, and antimicrobial peptides, in response to environmental stimuli. Thus, it is of utmost importance to understand the transcriptional regulation of TCSs to infer bacterial environmental adaptation and disease pathogenicity. Utilizing a combinatorial approach integrating RNA sequencing (RNA-seq), independent component analysis, chromatin immunoprecipitation coupled with exonuclease treatment (ChIP-exo), and data mining, we suggest five different modes of TCS transcriptional regulation. Our data further highlight noncognate inducers of TCSs, which emphasizes the cross-regulatory nature of TCSs in E. coli and suggests that TCSs may have a role beyond their cognate functionalities. In summary, these results can lead to an understanding of the metabolic capabilities of bacteria and correctly predict complex phenotype under diverse conditions, especially when further incorporated with genome-scale metabolic models.

19.
Biol Methods Protoc ; 4(1): bpz011, 2019.
Article in English | MEDLINE | ID: mdl-32395628

ABSTRACT

The decrease in sequencing cost in recent years has made genome-wide studies of transcription factor (TF) binding through chromatin immunoprecipitation methods like ChIP-seq and chromatin immunoprecipitation with lambda exonuclease (ChIP-exo) more accessible to a broader group of users. Especially with ChIP-exo, it is now possible to map TF binding sites in more detail and with less noise than previously possible. These improvements came at the cost of making the analysis of the data more challenging, which is further complicated by the fact that to date no complete pipeline is publicly available. Here we present a workflow developed specifically for ChIP-exo data and demonstrate its capabilities for data analysis. The pipeline, which is completely publicly available on GitHub, includes all necessary analytical steps to obtain a high confidence list of TF targets starting from raw sequencing reads. During the pipeline development, we emphasized the inclusion of different quality control measurements and we show how to use these so users can have confidence in their obtained results.

20.
Curr Med Chem ; 26(42): 7641-7654, 2019.
Article in English | MEDLINE | ID: mdl-29848263

ABSTRACT

BACKGROUND: Transcription factors are DNA-binding proteins that play key roles in many fundamental biological processes. Unraveling their interactions with DNA is essential to identify their target genes and understand the regulatory network. Genome-wide identification of their binding sites became feasible thanks to recent progress in experimental and computational approaches. ChIP-chip, ChIP-seq, and ChIP-exo are three widely used techniques to demarcate genome-wide transcription factor binding sites. OBJECTIVE: This review aims to provide an overview of these three techniques including their experiment procedures, computational approaches, and popular analytic tools. CONCLUSION: ChIP-chip, ChIP-seq, and ChIP-exo have been the major techniques to study genome-wide in vivo protein-DNA interaction. Due to the rapid development of next-generation sequencing technology, array-based ChIP-chip is deprecated and ChIP-seq has become the most widely used technique to identify transcription factor binding sites genome-wide. The newly developed ChIP-exo further improves the spatial resolution to single nucleotide. Numerous tools have been developed to analyze ChIP-chip, ChIP-seq and ChIP-exo data. However, different programs may employ different mechanisms or underlying algorithms, so each will inherently include its own set of statistical assumptions and biases. Choosing the most appropriate analytic program for a given experiment therefore requires careful consideration. Moreover, most programs only have a command line interface, so their installation and usage will require basic computational expertise in Unix/Linux.


Subjects
DNA-Binding Proteins/metabolism , DNA/metabolism , Genome , Transcription Factors/metabolism , Animals , Binding Sites , Chromatin Immunoprecipitation Sequencing/methods , Chromatin Immunoprecipitation Sequencing/statistics & numerical data , Computational Biology , Humans , Protein Binding