RESUMO
Post-genomic implementations have expanded the experimental strategies to identify elements involved in the regulation of transcription initiation. Here, we present for the first time a detailed analysis of the sources of knowledge supporting the collection of transcriptional regulatory interactions (RIs) of Escherichia coli K-12. An RI groups the transcription factor, its effect (positive or negative) and the regulated target, a promoter, a gene or transcription unit. We improved the evidence codes so that specific methods are incorporated and classified into independent groups. On this basis we updated the computation of confidence levels, weak, strong, or confirmed, for the collection of RIs. These updates enabled us to map the RI set to the current collection of HT TF-binding datasets from ChIP-seq, ChIP-exo, gSELEX and DAP-seq in RegulonDB, enriching in this way the evidence of close to one-quarter (1329) of RIs from the current total 5446 RIs. Based on the new computational capabilities of our improved annotation of evidence sources, we can now analyze the internal architecture of evidence, their categories (experimental, classical, HT, computational), and confidence levels. This is how we know that the joint contribution of HT and computational methods increase the overall fraction of reliable RIs (the sum of confirmed and strong evidence) from 49% to 71%. Thus, the current collection has 3912 reliable RIs, with 2718 or 70% of them with classical evidence which can be used to benchmark novel HT methods. Users can selectively exclude the method they want to benchmark, or keep for instance only the confirmed interactions. The recovery of regulatory sites in RegulonDB by the different HT methods ranges between 33% by ChIP-exo to 76% by ChIP-seq although as discussed, many potential confounding factors limit their interpretation. The collection of improvements reported here provides a solid foundation to incorporate new methods and data, and to further integrate the diverse sources of knowledge of the different components of the transcriptional regulatory network. There is no other genomic database that offers this comprehensive high-quality architecture of knowledge supporting a corpus of transcriptional regulatory interactions.
RESUMO
Genomics has set the basis for a variety of methodologies that produce high-throughput datasets identifying the different players that define gene regulation, particularly regulation of transcription initiation and operon organization. These datasets are available in public repositories, such as the Gene Expression Omnibus, or ArrayExpress. However, accessing and navigating such a wealth of data is not straightforward. No resource currently exists that offers all available high and low-throughput data on transcriptional regulation in Escherichia coli K-12 to easily use both as whole datasets, or as individual interactions and regulatory elements. RegulonDB (https://regulondb.ccg.unam.mx) began gathering high-throughput dataset collections in 2009, starting with transcription start sites, then adding ChIP-seq and gSELEX in 2012, with up to 99 different experimental high-throughput datasets available in 2019. In this paper we present a radical upgrade to more than 2000 high-throughput datasets, processed to facilitate their comparison, introducing up-to-date collections of transcription termination sites, transcription units, as well as transcription factor binding interactions derived from ChIP-seq, ChIP-exo, gSELEX and DAP-seq experiments, besides expression profiles derived from RNA-seq experiments. For ChIP-seq experiments we offer both the data as presented by the authors, as well as data uniformly processed in-house, enhancing their comparability, as well as the traceability of the methods and reproducibility of the results. Furthermore, we have expanded the tools available for browsing and visualization across and within datasets. We include comparisons against previously existing knowledge in RegulonDB from classic experiments, a nucleotide-resolution genome viewer, and an interface that enables users to browse datasets by querying their metadata. A particular effort was made to automatically extract detailed experimental growth conditions by implementing an assisted curation strategy applying Natural language processing and machine learning. We provide summaries with the total number of interactions found in each experiment, as well as tools to identify common results among different experiments. This is a long-awaited resource to make use of such wealth of knowledge and advance our understanding of the biology of the model bacterium E. coli K-12.
Assuntos
Escherichia coli K12 , Escherichia coli , Escherichia coli/genética , Escherichia coli K12/genética , Escherichia coli K12/metabolismo , Regulação Bacteriana da Expressão Gênica , Óperon/genética , Reprodutibilidade dos TestesRESUMO
The number of published papers in biomedical research makes it rather impossible for a researcher to keep up to date. This is where manually curated databases contribute facilitating the access to knowledge. However, the structure required by databases strongly limits the type of valuable information that can be incorporated. Here, we present Lisen&Curate, a curation system that facilitates linking sentences or part of sentences (both considered sources) in articles with their corresponding curated objects, so that rich additional information of these objects is easily available to users. These sources are going to be offered both within RegulonDB and a new database, L-Regulon. To show the relevance of our work, two senior curators performed a curation of 31 articles on the regulation of transcription initiation of E. coli using Lisen&Curate. As a result, 194 objects were curated and 781 sources were recorded. We also found that these sources are useful to develop automatic approaches to detect objects in articles by observing word frequency patterns and by carrying out an open information extraction task. Sources may help to elaborate a controlled vocabulary of experimental methods. Finally, we discuss our ecosystem of interconnected applications, RegulonDB, L-Regulon, and Lisen&Curate, to facilitate the access to knowledge on regulation of transcription initiation in bacteria. We see our proposal as the starting point to change the way experimentalists connect a piece of knowledge with its evidence using RegulonDB.
Assuntos
Curadoria de Dados/métodos , Bases de Dados Genéticas , Regulação Bacteriana da Expressão Gênica , Iniciação da Transcrição Genética , Escherichia coli/genéticaRESUMO
The published online version of this paper contains mistake. The authors first and last names have been interchanged. The correct version is given above.
RESUMO
Although the specific function of SCO2127 remains elusive, it has been assumed that this hypothetical protein plays an important role in carbon catabolite regulation and therefore in antibiotic biosynthesis in Streptomyces coelicolor. To shed light on the functional relationship of SCO2127 to the biosynthesis of actinorhodin, a detailed analysis of the proteins differentially produced between the strain M145 and the Δsco2127 mutant of S. coelicolor was performed. The delayed morphological differentiation and impaired production of actinorhodin showed by the deletion strain were accompanied by increased abundance of gluconeogenic enzymes, as well as downregulation of both glycolysis and acetyl-CoA carboxylase. Repression of mycothiol biosynthetic enzymes was further observed in the absence of SCO2127, in addition to upregulation of hydroxyectoine biosynthetic enzymes and SCO0204, which controls nitrite formation. The data generated in this study reveal that the response regulator SCO0204 greatly contributes to prevent the formation of actinorhodin in the ∆sco2127 mutant, likely through the activation of some proteins associated with oxidative stress that include the nitrite producer SCO0216.