Results 1 - 12 of 12
1.
bioRxiv ; 2024 May 24.
Article in English | MEDLINE | ID: mdl-38826350

ABSTRACT

The DNA binding of most Escherichia coli Transcription Factors (TFs) has not been comprehensively mapped, and few have models that can quantitatively predict binding affinity. We report the global mapping of in vivo DNA binding for 139 E. coli TFs using ChIP-Seq. We used these data to train BoltzNet, a novel neural network that predicts TF binding energy from DNA sequence. BoltzNet mirrors a quantitative biophysical model and provides directly interpretable predictions genome-wide at nucleotide resolution. We used BoltzNet to quantitatively design novel binding sites, which we validated with biophysical experiments on purified protein. We have generated models for 125 TFs that provide insight into global features of TF binding, including clustering of sites, the role of accessory bases, the relevance of weak sites, and the background affinity of the genome. Our paper provides new paradigms for studying TF-DNA binding and for the development of biophysically motivated neural networks.
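The link between a sequence-derived binding energy and site occupancy can be sketched with a generic two-state Boltzmann model. This is a minimal illustration, not BoltzNet itself: the energy matrix `ENERGY`, its values, the chemical-potential parameter `mu`, and the function names are all hypothetical, and energies are expressed in units of kT.

```python
import math

# Hypothetical position-specific energy matrix (units of kT).
# The lowest-energy (consensus) site for these made-up values is "TGAC".
ENERGY = [
    {"A": 2.0, "C": 2.0, "G": 2.0, "T": 0.0},
    {"A": 2.0, "C": 2.0, "G": 0.0, "T": 2.0},
    {"A": 0.0, "C": 1.0, "G": 1.0, "T": 1.0},
    {"A": 2.0, "C": 0.0, "G": 2.0, "T": 2.0},
]


def binding_energy(site: str) -> float:
    """Sum the per-position energy contributions (additive assumption)."""
    return sum(ENERGY[i][base] for i, base in enumerate(site))


def occupancy(energy: float, mu: float = 0.0) -> float:
    """Two-state Boltzmann occupancy: 1 / (1 + exp(E - mu))."""
    return 1.0 / (1.0 + math.exp(energy - mu))


def scan(sequence: str) -> list[tuple[int, float]]:
    """Return (offset, energy) for every window of the matrix width."""
    width = len(ENERGY)
    return [
        (i, binding_energy(sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    ]


if __name__ == "__main__":
    for offset, energy in scan("ATGACG"):
        print(offset, energy, round(occupancy(energy), 4))
```

Under these assumptions, lower energy means tighter binding, and a site whose energy equals `mu` is occupied half of the time.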

2.
Front Genet ; 15: 1353553, 2024.
Article in English | MEDLINE | ID: mdl-38505828

ABSTRACT

Post-genomic implementations have expanded the experimental strategies to identify elements involved in the regulation of transcription initiation. Here, we present for the first time a detailed analysis of the sources of knowledge supporting the collection of transcriptional regulatory interactions (RIs) of Escherichia coli K-12. An RI groups the transcription factor, its effect (positive or negative) and the regulated target, a promoter, a gene or transcription unit. We improved the evidence codes so that specific methods are incorporated and classified into independent groups. On this basis we updated the computation of confidence levels, weak, strong, or confirmed, for the collection of RIs. These updates enabled us to map the RI set to the current collection of HT TF-binding datasets from ChIP-seq, ChIP-exo, gSELEX and DAP-seq in RegulonDB, enriching in this way the evidence of close to one-quarter (1329) of RIs from the current total 5446 RIs. Based on the new computational capabilities of our improved annotation of evidence sources, we can now analyze the internal architecture of evidence, their categories (experimental, classical, HT, computational), and confidence levels. This is how we know that the joint contribution of HT and computational methods increases the overall fraction of reliable RIs (the sum of confirmed and strong evidence) from 49% to 71%. Thus, the current collection has 3912 reliable RIs, with 2718 or 70% of them with classical evidence which can be used to benchmark novel HT methods. Users can selectively exclude the method they want to benchmark, or keep for instance only the confirmed interactions. The recovery of regulatory sites in RegulonDB by the different HT methods ranges from 33% by ChIP-exo to 76% by ChIP-seq, although, as discussed, many potential confounding factors limit their interpretation.
The collection of improvements reported here provides a solid foundation to incorporate new methods and data, and to further integrate the diverse sources of knowledge of the different components of the transcriptional regulatory network. There is no other genomic database that offers this comprehensive high-quality architecture of knowledge supporting a corpus of transcriptional regulatory interactions.
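The proportions quoted in this abstract can be re-derived from its own counts. The short sketch below only uses numbers stated in the abstract; the variable names and the helper `pct` are illustrative.

```python
# Counts quoted in the abstract.
TOTAL_RIS = 5446           # current total of regulatory interactions
HT_ENRICHED = 1329         # RIs whose evidence was enriched by HT datasets
RELIABLE = 3912            # RIs with confirmed or strong evidence
RELIABLE_CLASSICAL = 2718  # reliable RIs that have classical evidence


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


if __name__ == "__main__":
    print(f"HT-enriched share of all RIs: {pct(HT_ENRICHED, TOTAL_RIS):.1f}%")
    print(f"Reliable share of all RIs:    {pct(RELIABLE, TOTAL_RIS):.1f}%")
    print(f"Classical share of reliable:  {pct(RELIABLE_CLASSICAL, RELIABLE):.1f}%")
```

The first ratio is a little under one quarter and the third rounds to 70%, matching the text. The second ratio comes out slightly above the quoted 71%, which may reflect rounding or a slightly different denominator in the original computation.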

3.
bioRxiv ; 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-37163020

ABSTRACT

Post-genomic implementations have expanded the experimental strategies to identify elements involved in the regulation of transcription initiation. As new methodologies emerge, a natural step is to compare their results with those from established methodologies, such as the classic methods of molecular biology used to characterize transcription factor binding sites, promoters, or transcription units. In the case of Escherichia coli K-12, the best-studied microorganism, for the last 30 years we have continuously gathered such knowledge from original scientific publications, and have organized it in two databases, RegulonDB and EcoCyc. Furthermore, since RegulonDB version 11.0 (1), we offer comprehensive datasets of binding sites from chromatin immunoprecipitation combined with sequencing (ChIP-seq), ChIP combined with exonuclease digestion and next-generation sequencing (ChIP-exo), genomic SELEX screening (gSELEX), and DNA affinity purification sequencing (DAP-seq) HT technologies, as well as additional datasets for transcription start sites, transcription units and RNA sequencing (RNA-seq) expression profiles. Here, we present for the first time an analysis of the sources of knowledge supporting the collection of transcriptional regulatory interactions (RIs) of E. coli K-12. An RI is formed by the transcription factor, its positive or negative effect on a promoter, a gene or transcription unit. We improved the evidence codes so that the specific methods are described, and we classified them into seven independent groups. This is the basis for our updated computation of confidence levels, weak, strong, or confirmed, for the collection of RIs. We compare the confidence levels of the RI collection before and after adding HT evidence illustrating how knowledge will change as more HT data and methods appear in the future. 
Users can generate subsets filtering out the method they want to benchmark and avoid circularity, or keep for instance only the confirmed interactions. The comparison of different HT methods with the available datasets indicates that ChIP-seq recovers the highest fraction (>70%) of binding sites present in RegulonDB, followed by gSELEX, DAP-seq and ChIP-exo. There is no other genomic database that offers this comprehensive high-quality anatomy of evidence supporting a corpus of transcriptional regulatory interactions.

4.
Microb Genom ; 8(5), 2022 05.
Article in English | MEDLINE | ID: mdl-35584008

ABSTRACT

Genomics has set the basis for a variety of methodologies that produce high-throughput datasets identifying the different players that define gene regulation, particularly regulation of transcription initiation and operon organization. These datasets are available in public repositories, such as the Gene Expression Omnibus, or ArrayExpress. However, accessing and navigating such a wealth of data is not straightforward. No resource currently exists that offers all available high and low-throughput data on transcriptional regulation in Escherichia coli K-12 to easily use both as whole datasets, or as individual interactions and regulatory elements. RegulonDB (https://regulondb.ccg.unam.mx) began gathering high-throughput dataset collections in 2009, starting with transcription start sites, then adding ChIP-seq and gSELEX in 2012, with up to 99 different experimental high-throughput datasets available in 2019. In this paper we present a radical upgrade to more than 2000 high-throughput datasets, processed to facilitate their comparison, introducing up-to-date collections of transcription termination sites, transcription units, as well as transcription factor binding interactions derived from ChIP-seq, ChIP-exo, gSELEX and DAP-seq experiments, besides expression profiles derived from RNA-seq experiments. For ChIP-seq experiments we offer both the data as presented by the authors, as well as data uniformly processed in-house, enhancing their comparability, as well as the traceability of the methods and reproducibility of the results. Furthermore, we have expanded the tools available for browsing and visualization across and within datasets. We include comparisons against previously existing knowledge in RegulonDB from classic experiments, a nucleotide-resolution genome viewer, and an interface that enables users to browse datasets by querying their metadata. 
A particular effort was made to automatically extract detailed experimental growth conditions by implementing an assisted curation strategy applying natural language processing and machine learning. We provide summaries with the total number of interactions found in each experiment, as well as tools to identify common results among different experiments. This is a long-awaited resource to make use of such a wealth of knowledge and advance our understanding of the biology of the model bacterium E. coli K-12.


Subject(s)
Escherichia coli K12 , Escherichia coli , Escherichia coli/genetics , Escherichia coli K12/genetics , Escherichia coli K12/metabolism , Gene Expression Regulation, Bacterial , Operon/genetics , Reproducibility of Results
5.
EMBO Mol Med ; 11(10): e9930, 2019 10.
Article in English | MEDLINE | ID: mdl-31476112

ABSTRACT

Therapeutic resistance is a major clinical challenge in oncology. Evidence identifies cancer stem cells (CSCs) as a driver of tumor evolution. Accordingly, the key stemness property unique to CSCs may represent a reservoir of therapeutic targets to improve cancer treatment. Here, we carried out a genome-wide RNA interference screen to identify genes that regulate breast CSC (bCSC) fate. Using an interactome/regulome analysis, we integrated screen results in a functional mapping of the CSC-related processes. This network analysis uncovered potential therapeutic targets controlling bCSC fate. We tested a panel of 15 compounds targeting these regulators. We showed that mifepristone, salinomycin, and JQ1 have the best anti-bCSC activity. A combination assay revealed a synergistic interaction of the salinomycin/JQ1 combination in depleting the bCSC population. Treatment of primary breast cancer xenografts with this combination reduced the tumor-initiating cell population and limited metastatic development. The clinical relevance of our findings was reinforced by an association between the expression of the bCSC-related networks and patient prognosis. Targeting bCSCs with the salinomycin/JQ1 combination provides the basis for a new therapeutic approach in the treatment of breast cancer.


Subject(s)
Breast Neoplasms/genetics , Breast Neoplasms/physiopathology , Drug Discovery/methods , Genetic Testing/methods , Genome-Wide Association Study/methods , Neoplastic Stem Cells/physiology , RNA Interference , Antineoplastic Agents/pharmacology , Female , Gene Regulatory Networks , Humans , Protein Interaction Maps , Tumor Cells, Cultured
6.
Curr Protoc Bioinformatics ; 66(1): e72, 2019 06.
Article in English | MEDLINE | ID: mdl-30786165

ABSTRACT

Next-generation sequencing (NGS) is becoming a routine approach in most domains of the life sciences. To ensure reproducibility of results, there is a crucial need to improve the automation of NGS data processing and enable forthcoming studies relying on big datasets. Although user-friendly interfaces now exist, there remains a strong need for accessible solutions that allow experimental biologists to analyze and explore their results in an autonomous and flexible way. The protocols here describe a modular system that enables a user to compose and fine-tune workflows based on SnakeChunks, a library of rules for the Snakemake workflow engine. They are illustrated using a study combining ChIP-seq and RNA-seq to identify target genes of the global transcription factor FNR in Escherichia coli, which has the advantage that results can be compared with the most up-to-date collection of existing knowledge about transcriptional regulation in this model organism, extracted from the RegulonDB database. © 2019 by John Wiley & Sons, Inc.


Subject(s)
Bacteria/genetics , Chromatin Immunoprecipitation Sequencing/methods , RNA-Seq , Software , Base Sequence , Genome, Bacterial , Nucleotide Motifs/genetics , User-Computer Interface
7.
Bioinformatics ; 34(11): 1934-1936, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29361152

ABSTRACT

Summary: We designed a PyQt graphical user interface, Sequanix, aimed at democratizing the use of Snakemake pipelines in the NGS space and beyond. By default, Sequanix includes Sequana NGS pipelines (Snakemake format) (http://sequana.readthedocs.io), and is also capable of loading any external Snakemake pipeline. New users can easily, visually, edit configuration files of expert-validated pipelines and can interactively execute these production-ready workflows. Sequanix will be useful to both Snakemake developers in exposing their pipelines and to a wide audience of users. Availability and implementation: Source on http://github.com/sequana/sequana, bio-containers on http://bioconda.github.io and Singularity hub (http://singularity-hub.org). Contact: dimitri.desvillechabrol@pasteur.fr or thomas.cokelaer@pasteur.fr. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Data Visualization , Software
8.
PLoS One ; 12(9): e0185400, 2017.
Article in English | MEDLINE | ID: mdl-28949986

ABSTRACT

High-throughput RNAi screenings (HTS) allow quantifying the impact of the deletion of each gene in any particular function, from virus-host interactions to cell differentiation. However, there has been less development of functional analysis tools dedicated to RNAi analyses. HTS-Net, a network-based analysis program, was developed to identify gene regulatory modules impacted in high-throughput screenings, by integrating transcription factor-target gene interaction data (regulome) and protein-protein interaction networks (interactome) on top of screening z-scores. HTS-Net produces exhaustive HTML reports for results navigation and exploration. HTS-Net is a new pipeline for RNA interference screening analyses that achieves better performance than simple gene rankings by z-scores, by re-prioritizing genes and replacing them in their biological context, as shown by the three studies that we reanalyzed. Formatted input data for the three studied datasets, source code and web site for testing the system are available from the companion web site at http://htsnet.marseille.inserm.fr/. We also compared our program with existing algorithms (CARD and hotnet2).
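The "simple gene ranking by z-scores" that this abstract uses as a baseline can be sketched as follows. This is not HTS-Net's network-based method, only the plain ranking it is compared against; the gene names, readout values, and function names are hypothetical.

```python
from statistics import mean, stdev


def z_scores(readouts: dict[str, float]) -> dict[str, float]:
    """Standardize each gene's screen readout against the whole plate."""
    values = list(readouts.values())
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    return {gene: (value - mu) / sigma for gene, value in readouts.items()}


def rank_by_abs_z(scores: dict[str, float]) -> list[str]:
    """Order genes from the strongest to the weakest effect, ignoring sign."""
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


if __name__ == "__main__":
    # Hypothetical readouts for four knocked-down genes.
    screen = {"geneA": 1.0, "geneB": 1.1, "geneC": 0.9, "geneD": 3.0}
    scores = z_scores(screen)
    for gene in rank_by_abs_z(scores):
        print(gene, round(scores[gene], 2))
```

A ranking like this treats every gene independently, which is the limitation a network-aware approach aims to address by placing hits in their regulatory and interaction context.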


Subject(s)
Gene Regulatory Networks , High-Throughput Nucleotide Sequencing/methods , Models, Genetic , Algorithms , Cell Differentiation , Databases, Genetic , Embryonic Stem Cells/cytology , Hepacivirus/physiology , Humans , Programming Languages , RNA Interference , Virus Replication
9.
Front Immunol ; 8: 876, 2017.
Article in English | MEDLINE | ID: mdl-28804485

ABSTRACT

Our previous transcriptomic analysis of Glossina palpalis gambiensis experimentally infected or not with Trypanosoma brucei gambiense aimed to detect differentially expressed genes (DEGs) associated with infection. Specifically, we selected candidate genes governing tsetse fly vector competence that could be used in the context of an anti-vector strategy, to control human and/or animal trypanosomiasis. The present study aimed to verify whether gene expression in field tsetse flies (G. p. palpalis) is modified in response to natural infection by trypanosomes (T. congolense), as reported when insectary-raised flies (G. p. gambiensis) are experimentally infected with T. b. gambiense. This was achieved using the RNA-seq approach, which identified 524 DEGs in infected vs. non-infected tsetse flies, including 285 downregulated genes and 239 upregulated genes (identified using DESeq2). Several of these genes were highly differentially expressed, with log2 fold change values in the vicinity of either +40 or -40. Downregulated genes were primarily involved in transcription/translation processes, whereas encoded upregulated genes governed amino acid and nucleotide biosynthesis pathways. The BioCyc metabolic pathways associated with infection also revealed that downregulated genes were mainly involved in fly immunity processes. Importantly, our study demonstrates that data on the molecular cross-talk between the host and the parasite (as well as the always present fly microbiome) recorded from an experimental biological model has a counterpart in field flies, which in turn validates the use of experimental host/parasite couples.
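Splitting differentially expressed genes into upregulated and downregulated sets, as reported above, can be sketched from a table of log2 fold changes and adjusted p-values. This is an illustrative sketch only: the record type, the gene names, the example values, and the significance threshold `alpha=0.05` are assumptions, since the abstract does not state the cutoff that was used.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str
    log2_fold_change: float  # infected vs. non-infected
    padj: float              # multiple-testing adjusted p-value


def split_degs(
    results: list[GeneResult], alpha: float = 0.05
) -> tuple[list[str], list[str]]:
    """Return (upregulated, downregulated) gene names with padj below alpha."""
    significant = [r for r in results if r.padj < alpha]
    up = [r.gene for r in significant if r.log2_fold_change > 0]
    down = [r.gene for r in significant if r.log2_fold_change < 0]
    return up, down


if __name__ == "__main__":
    # Hypothetical results table.
    table = [
        GeneResult("g1", 2.3, 0.001),
        GeneResult("g2", -1.7, 0.010),
        GeneResult("g3", 0.4, 0.600),   # not significant, excluded
        GeneResult("g4", -3.1, 0.0004),
    ]
    up, down = split_degs(table)
    print("up:", up, "down:", down)
    # The counts in the abstract are internally consistent:
    print(285 + 239 == 524)
```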

10.
Cell Rep ; 18(9): 2256-2268, 2017 02 28.
Article in English | MEDLINE | ID: mdl-28249169

ABSTRACT

Breast cancer stem cells (bCSCs) have been implicated in tumor progression and therapeutic resistance; however, the molecular mechanisms that define this state are unclear. We have performed two microRNA (miRNA) gain- and loss-of-function screens to identify miRNAs that regulate the choice between bCSC self-renewal and differentiation. We find that micro-RNA (miR)-600 silencing results in bCSC expansion, while its overexpression reduces bCSC self-renewal, leading to decreased in vivo tumorigenicity. miR-600 targets stearoyl desaturase 1 (SCD1), an enzyme required to produce active, lipid-modified WNT proteins. In the absence of miR-600, WNT signaling is active and promotes self-renewal, whereas overexpression of miR-600 inhibits the production of active WNT and promotes bCSC differentiation. In a series of 120 breast tumors, we found that a low level of miR-600 is correlated with active WNT signaling and a poor prognosis. These findings highlight a miR-600-centered signaling network that governs bCSC-fate decisions and influences tumor progression.


Subject(s)
Breast Neoplasms/genetics , Breast Neoplasms/pathology , MicroRNAs/genetics , Neoplastic Stem Cells/pathology , Signal Transduction/physiology , Wnt Proteins/genetics , Wnt Signaling Pathway/physiology , Carcinogenesis/metabolism , Carcinogenesis/pathology , Cell Differentiation/genetics , Cell Line, Tumor , Female , Gene Expression Regulation, Neoplastic/genetics , Humans , Stearoyl-CoA Desaturase/genetics
11.
Methods Mol Biol ; 1482: 279-95, 2016.
Article in English | MEDLINE | ID: mdl-27557774

ABSTRACT

The plant-dedicated mirror of the Regulatory Sequence Analysis Tools (RSAT, http://plants.rsat.eu) offers specialized options for researchers dealing with plant transcriptional regulation. The website contains whole-sequenced genomes from species regularly updated from Ensembl Plants and other sources (currently 40), and supports an array of tasks frequently required for the analysis of regulatory sequences, such as retrieving upstream sequences, motif discovery, motif comparison, and pattern matching. RSAT::Plants also integrates the footprintDB collection of DNA motifs. This protocol explains step-by-step how to discover DNA motifs in regulatory regions of clusters of co-expressed genes in plants. It also explains how to empirically control the significance of the result, and how to associate the discovered motifs with putative binding factors.


Subject(s)
Computational Biology/methods , Genomics/methods , Regulatory Elements, Transcriptional/genetics , Software , Gene Expression Regulation, Plant , Genome, Plant/genetics , Nucleotide Motifs/genetics , Transcription Factors/genetics
12.
Methods Mol Biol ; 1482: 297-322, 2016.
Article in English | MEDLINE | ID: mdl-27557775

ABSTRACT

In this protocol, we explain how to run ab initio motif discovery in order to gather putative transcription factor binding motifs (TFBMs) from sets of genomic regions returned by ChIP-seq experiments. The protocol starts from a set of peak coordinates (genomic regions) which can be either downloaded from ChIP-seq databases, or produced by a peak-calling software tool. We provide a concise description of the successive steps to discover motifs, cluster the motifs returned by different motif discovery algorithms, and compare them with reference motif databases. The protocol is documented with detailed notes explaining the rationale underlying the choice of options. The interpretation of the results is illustrated with an example from the model plant Arabidopsis thaliana.


Subject(s)
Chromatin Immunoprecipitation/methods , Computational Biology/methods , Genomics/methods , Software , Algorithms , Arabidopsis/genetics , Binding Sites/genetics , Genome, Plant/genetics , High-Throughput Nucleotide Sequencing , Nucleotide Motifs/genetics , Regulatory Elements, Transcriptional