1.
EcoSal Plus ; 6(1), 2014 May.
Article in English | MEDLINE | ID: mdl-26442933

ABSTRACT

EcoCyc is a bioinformatics database available at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene, metabolite, reaction, operon, and metabolic pathway. The database also includes information on E. coli gene essentiality and on nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. This review provides a detailed description of the data content of EcoCyc and of the procedures by which this content is generated.

2.
Database (Oxford) ; 2013: bas059, 2013.
Article in English | MEDLINE | ID: mdl-23327937

ABSTRACT

RegulonDB provides curated information on the transcriptional regulatory network of Escherichia coli and contains both experimental data and computationally predicted objects. To account for the heterogeneity of these data, we introduced in version 6.0 a two-tier rating system for the strength of evidence, classifying evidence as either 'weak' or 'strong' (Gama-Castro,S., Jimenez-Jacinto,V., Peralta-Gil,M. et al. RegulonDB (Version 6.0): gene regulation model of Escherichia Coli K-12 beyond transcription, active (experimental) annotated promoters and textpresso navigation. Nucleic Acids Res., 2008;36:D120-D124.). We now add to our classification scheme the classification of high-throughput evidence, including chromatin immunoprecipitation (ChIP) and RNA-seq technologies. To integrate these data into RegulonDB, we present two strategies for the evaluation of confidence: statistical validation and independent cross-validation. Statistical validation involves verification of ChIP data for transcription factor-binding sites, using tools for motif discovery and quality assessment of the discovered matrices. Independent cross-validation combines independent evidence with the intention to mutually exclude false positives. Both statistical validation and cross-validation allow subsets of data that are supported by weak evidence to be upgraded to a higher confidence level. Likewise, cross-validation of strong confidence data extends our two-tier rating system to a three-tier system by introducing a third confidence score, 'confirmed'. Database URL: http://regulondb.ccg.unam.mx/
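The tiered rating described in this abstract can be illustrated with a minimal Python sketch. The abstract does not state the actual upgrade rules, so the rule used here (two or more independent sources at the same top level raise the rating by one tier) is an assumption made for illustration only. The names `Confidence` and `cross_validate` are hypothetical and are not part of RegulonDB.

```python
from enum import IntEnum


class Confidence(IntEnum):
    """The three tiers named in the abstract, ordered from lowest to highest."""
    WEAK = 1
    STRONG = 2
    CONFIRMED = 3


def cross_validate(levels):
    """Combine the confidence levels of independent evidence sources.

    Assumed rule (not taken from the paper): the result is the best
    single level, raised by one tier when at least two independent
    sources reach that level.
    """
    levels = list(levels)
    if not levels:
        raise ValueError("at least one evidence source is required")
    best = max(levels)
    if levels.count(best) >= 2:
        # Cap the upgrade at the highest tier.
        best = Confidence(min(best + 1, Confidence.CONFIRMED))
    return Confidence(best)
```

Under this assumed rule, two independent weak sources yield 'strong' and two independent strong sources yield 'confirmed', while a single source keeps its original rating.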


Subject(s)
Computational Biology/methods , Databases, Genetic , Escherichia coli/genetics , Regulon/genetics , Statistics as Topic , Biosynthetic Pathways/genetics , Chromatin Immunoprecipitation , Gene Expression Regulation, Bacterial , Gene Regulatory Networks , Position-Specific Scoring Matrices , Reproducibility of Results , Transcription Initiation Site
3.
Nucleic Acids Res ; 41(Database issue): D203-13, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203884

ABSTRACT

This article summarizes our progress with RegulonDB (http://regulondb.ccg.unam.mx/) during the past 2 years. We have kept up-to-date the knowledge from the published literature regarding transcriptional regulation in Escherichia coli K-12. We have maintained and expanded our curation efforts to improve the breadth and quality of the encoded experimental knowledge, and we have implemented criteria for the quality of our computational predictions. Regulatory phrases now provide high-level descriptions of regulatory regions. We expanded the assignment of quality to various sources of evidence, particularly for knowledge generated through high-throughput (HT) technology. Based on our analysis of most relevant methods, we defined rules for determining the quality of evidence when multiple independent sources support an entry. With this latest release of RegulonDB, we present a new highly reliable larger collection of transcription start sites, a result of our experimental HT genome-wide efforts. These improvements, together with several novel enhancements (the tracks display, uploading format and curational guidelines), address the challenges of incorporating HT-generated knowledge into RegulonDB. Information on the evolutionary conservation of regulatory elements is also available now. Altogether, RegulonDB version 8.0 is a much better home for integrating knowledge on gene regulation from the sources of information currently available.


Subject(s)
Databases, Genetic , Escherichia coli K12/genetics , Gene Expression Regulation, Bacterial , Regulatory Elements, Transcriptional , Transcription, Genetic , Bacterial Proteins/metabolism , Databases, Genetic/standards , Evolution, Molecular , Genomics , Internet , Promoter Regions, Genetic , Regulon , Repressor Proteins/metabolism , Sequence Analysis, RNA , Transcription Factors/metabolism , Transcription Initiation Site
4.
Nucleic Acids Res ; 41(Database issue): D605-12, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23143106

ABSTRACT

EcoCyc (http://EcoCyc.org) is a model organism database built on the genome sequence of Escherichia coli K-12 MG1655. Expert manual curation of the functions of individual E. coli gene products in EcoCyc has been based on information found in the experimental literature for E. coli K-12-derived strains. Updates to EcoCyc content continue to improve the comprehensive picture of E. coli biology. The utility of EcoCyc is enhanced by new tools available on the EcoCyc web site, and the development of EcoCyc as a teaching tool is increasing the impact of the knowledge collected in EcoCyc.


Subject(s)
Databases, Genetic , Escherichia coli K12/genetics , Binding Sites , Escherichia coli K12/metabolism , Escherichia coli Proteins/classification , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial , Internet , Membrane Transport Proteins/classification , Membrane Transport Proteins/metabolism , Models, Genetic , Molecular Sequence Annotation , Phenotype , Position-Specific Scoring Matrices , Promoter Regions, Genetic , Systems Biology , Transcription Factors/metabolism , Transcription, Genetic
5.
Nucleic Acids Res ; 39(Database issue): D98-105, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21051347

ABSTRACT

RegulonDB (http://regulondb.ccg.unam.mx/) is the primary reference database of the best-known regulatory network of any free-living organism, that of Escherichia coli K-12. The major conceptual change since 3 years ago is an expanded biological context so that transcriptional regulation is now part of a unit that initiates with the signal and continues with the signal transduction to the core of regulation, modifying expression of the affected target genes responsible for the response. We call these genetic sensory response units, or Gensor Units. We have initiated their high-level curation, with graphic maps and superreactions with links to other databases. Additional connectivity uses expandable submaps. RegulonDB has summaries for every transcription factor (TF) and TF-binding sites with internal symmetry. Several DNA-binding motifs and their sizes have been redefined and relocated. In addition to data from the literature, we have incorporated our own information on transcription start sites (TSSs) and transcriptional units (TUs), obtained by using high-throughput whole-genome sequencing technologies. A new portable drawing tool for genomic features is also now available, as well as new ways to download the data, including web services, files for several relational database manager systems and text files including BioPAX format.


Subject(s)
Databases, Genetic , Escherichia coli K12/genetics , Gene Expression Regulation, Bacterial , Gene Regulatory Networks , Transcription Factors/metabolism , Binding Sites , Escherichia coli K12/metabolism , Signal Transduction , Systems Integration , Transcription Initiation Site , Transcription, Genetic
6.
PLoS One ; 4(10): e7526, 2009 Oct 19.
Article in English | MEDLINE | ID: mdl-19838305

ABSTRACT

Despite almost 40 years of molecular genetics research in Escherichia coli, a major fraction of its transcription start sites (TSSs) is still unknown, which limits our understanding of the regulatory circuits that control gene expression in this model organism. RegulonDB (http://regulondb.ccg.unam.mx/), which aims to integrate the genetic regulatory network of E. coli K12, has until now been an entirely bioinformatic project. In this work, we extended its aims by generating experimental data at a genome scale on TSSs, promoters and regulatory regions. We implemented a modified 5' RACE protocol and an unbiased High Throughput Pyrosequencing Strategy (HTPS) that allowed us to map more than 1700 TSSs with high precision. From this collection, about 230 corresponded to previously reported TSSs, which helped us to benchmark both our methodologies and the accuracy of the previous mapping experiments. The other ca 1500 TSSs mapped belong to about 1000 different genes, many of them with no assigned function. We identified promoter sequences and type of sigma factors that control the expression of about 80% of these genes. As expected, the housekeeping sigma(70) was the most common type of promoter, followed by sigma(38). The majority of the putative TSSs were located between 20 and 40 nucleotides from the translational start site. Putative regulatory binding sites for transcription factors were detected upstream of many TSSs. For a few transcripts, riboswitches and small RNAs were found. Several genes also had additional TSSs within the coding region. Unexpectedly, the HTPS experiments revealed extensive antisense transcription, probably for regulatory functions.
The new information in RegulonDB, now with more than 2400 experimentally determined TSSs, strengthens the accuracy of promoter prediction, operon structure, and regulatory networks and provides valuable new information that will facilitate a global understanding of the complex and intricate regulatory network that operates in E. coli.


Subject(s)
Escherichia coli/genetics , Genes, Bacterial/genetics , Genome, Bacterial , Transcription Factors/metabolism , Transcription Initiation Site , Transcription, Genetic , Base Sequence , Binding Sites , Chromosome Mapping , Computational Biology/methods , Gene Regulatory Networks , Models, Genetic , Molecular Sequence Data , Promoter Regions, Genetic , Sequence Homology, Nucleic Acid
7.
Nucleic Acids Res ; 36(Database issue): D120-4, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18158297

ABSTRACT

RegulonDB (http://regulondb.ccg.unam.mx/) is the primary reference database offering curated knowledge of the transcriptional regulatory network of Escherichia coli K12, currently the best-known electronically encoded database of the genetic regulatory network of any free-living organism. This paper summarizes the improvements, new biology and new features available in version 6.0. Curation of original literature is, from now on, up to date for every new release. All the objects are supported by their corresponding evidence, now classified as strong or weak. Transcription factors are classified by origin of their effectors and by gene ontology class. We now have computational predictions for sigma(54) and five different promoter types of the sigma(70) family, as well as their corresponding -10 and -35 boxes. In addition to those curated from the literature, we added about 300 experimentally mapped promoters coming from our own high-throughput mapping efforts. RegulonDB v.6.0 now expands beyond transcription initiation, including RNA regulatory elements, specifically riboswitches, attenuators and small RNAs, with their known associated targets. The data can be accessed through overviews of correlations about gene regulation. The original literature associated with RegulonDB, together with more than 4000 curation notes, can now be searched with the Textpresso text mining engine.


Subject(s)
Databases, Genetic , Escherichia coli K12/genetics , Gene Expression Regulation, Bacterial , Gene Regulatory Networks , Computational Biology , Internet , Models, Genetic , Promoter Regions, Genetic , Regulatory Sequences, Ribonucleic Acid , Regulon , Sigma Factor/metabolism , Software , Transcription Factors/metabolism , Transcription Initiation Site , Transcription, Genetic
8.
PLoS Genet ; 2(11): e185, 2006 Nov 10.
Article in English | MEDLINE | ID: mdl-17096598

ABSTRACT

The evolutionary processes operating in the DNA regions that participate in the regulation of gene expression are poorly understood. In Escherichia coli, we have established a sequence pattern that distinguishes regulatory from nonregulatory regions. The density of promoter-like sequences, that could be recognizable by RNA polymerase and may function as potential promoters, is high within regulatory regions, in contrast to coding regions and regions located between convergently transcribed genes. Moreover, functional promoter sites identified experimentally are often found in the subregions of highest density of promoter-like signals, even when individual sites with higher binding affinity for RNA polymerase exist elsewhere within the regulatory region. In order to see the generality of this pattern, we have analyzed 43 additional genomes belonging to most established bacterial phyla. Differential densities between regulatory and nonregulatory regions are detectable in most of the analyzed genomes, with the exception of those that have evolved toward extreme genome reduction. Thus, presence of this pattern follows that of genes and other genomic features that require weak selection to be effective in order to persist. On this basis, we suggest that the loss of differential densities in the reduced genomes of host-restricted pathogens and symbionts is an outcome of the process of genome degradation resulting from the decreased efficiency of purifying selection in highly structured small populations. This implies that the differential distribution of promoter-like signals between regulatory and nonregulatory regions detected in large bacterial genomes confers a significant, although small, fitness advantage. 
This study paves the way for further identification of the specific types of selective constraints that affect the organization of regulatory regions and the overall distribution of promoter-like signals through more detailed comparative analyses among closely related bacterial genomes.


Subject(s)
DNA-Directed RNA Polymerases/metabolism , Genome, Bacterial/genetics , Promoter Regions, Genetic/genetics , Regulatory Sequences, Nucleic Acid/genetics , Selection, Genetic , Sigma Factor/metabolism , Amino Acid Motifs , Base Sequence , Consensus Sequence , DNA, Bacterial/genetics , Escherichia coli/genetics , Molecular Sequence Data , Mycobacterium leprae/genetics , Mycobacterium tuberculosis/genetics , Sequence Alignment
9.
Nucleic Acids Res ; 34(14): 3980-7, 2006.
Article in English | MEDLINE | ID: mdl-16914446

ABSTRACT

Here we show that regions upstream of first transcribed genes have oligonucleotide signatures that distinguish them from regions upstream of genes in the middle of operons. Databases of experimentally confirmed transcription units do not exist for most genomes. Thus, to expand the analyses into genomes with no experimentally confirmed data, we used genes conserved adjacent in evolutionarily distant genomes as representatives of genes inside operons. Likewise, we used divergently transcribed genes as representative examples of first transcribed genes. In model organisms, the trinucleotide signatures of regions upstream of these representative genes allow for operon predictions with accuracies close to those obtained with known operon data (0.8). Signature-based operon predictions have more similar phylogenetic profiles and higher proportions of genes in the same pathways than predicted transcription unit boundaries (TUBs). These results confirm that we are separating genes with related functions, as expected for operons, from genes not necessarily related, as expected for genes in different transcription units. We also test the quality of the predictions using microarray data in six genomes and show that the signature-predicted operons tend to have high correlations of expression. Oligonucleotide signatures should expand the number of tools available to identify operons even in poorly characterized genomes.


Subject(s)
Genome, Bacterial , Genomics/methods , Operon , Promoter Regions, Genetic , Bacillus subtilis/genetics , Bacteria/genetics , Computational Biology/methods , DNA-Directed RNA Polymerases/metabolism , Escherichia coli/genetics , Gene Expression , Genes, Bacterial , Genome, Archaeal , Phylogeny , Sigma Factor/metabolism
10.
Mol Biol Evol ; 23(5): 997-1010, 2006 May.
Article in English | MEDLINE | ID: mdl-16547149

ABSTRACT

The selective mechanisms operating in regulatory regions of bacterial genomes are poorly understood. We have previously shown that, in most bacterial genomes, regulatory regions contain high densities of sigma70 promoter-like signals that are significantly above the densities detected in nonregulatory genomic regions. In order to investigate the molecular evolutionary forces that operate in bacterial regulatory regions and how they affect the observed redundancy of promoter-like signals, we have undertaken a comparative analysis across the completely sequenced genomes of enteric gamma-proteobacteria. This analysis detects significant positional conservation of promoter-like signal clusters across enterics, sometimes in spite of strong primary sequence divergence. This suggests that the conservation of the nature and exact position of specific nucleotides is not necessarily the priority of selection for maintaining the transcriptional function in these bacteria. We have further characterized the structural conservation of the regulatory regions of dnaQ and crp across all enterics. These two regions differ in essentiality and mode of regulation, the regulation of crp being more complex and involving interactions with several transcription factors. This results in substantially different modes of evolution, with the dnaQ region appearing to evolve under stronger purifying selection and the crp region showing the likely effects of stabilizing selection for a complex pattern of gene expression. The higher flexibility of the crp region is consistent with the lower conservation of global regulators observed in evolution. Patterns of regulatory evolution are also found to be markedly different in endosymbiotic bacteria, in a manner consistent with regulatory regions suffering some level of degradation, as has been observed for many other characters in these genomes.
Therefore, the mode of evolution of bacterial regulatory regions appears to be highly dependent on both the lifestyle of the bacterium and the specific regulatory requirements of different genes. In fact, in many bacteria, the mode of evolution of genes requiring significant physiological adaptability in expression levels may follow patterns similar to those operating in the more complex regulatory regions of eukaryotic genomes.


Subject(s)
Enterobacteriaceae/genetics , Genome , Promoter Regions, Genetic , Amino Acid Sequence , Biological Evolution , Cluster Analysis , Evolution, Molecular , Genes, Bacterial , Genome, Bacterial , Models, Genetic , Models, Statistical , Molecular Sequence Data , Sequence Homology, Amino Acid
11.
J Mol Biol ; 354(1): 184-99, 2005 Nov 18.
Article in English | MEDLINE | ID: mdl-16236313

ABSTRACT

Experimental data on Escherichia coli transcriptional regulation have enabled the construction of statistical models to predict new regulatory elements within its genome. Far less is known about the transcriptional regulatory elements in other gamma-proteobacteria with sequenced genomes, so it is of great interest to conduct comparative genomic studies oriented to extracting biologically relevant information about transcriptional regulation in these less studied organisms using the knowledge from E. coli. In this work, we use the information stored in the TRACTOR_DB database to conduct a comparative study on the mechanisms of transcriptional regulation in eight gamma-proteobacteria and 38 regulons. We assess the conservation of transcription factor binding specificity across all eight genomes and show a correlation between the conservation of a regulatory site and the structure of the transcription unit it regulates. We also find a marked conservation of site-promoter distances across the eight organisms and a correspondence of the statistical significance of co-occurrence of pairs of transcription factor binding sites in the regulatory regions, which is probably related to a conserved architecture of higher-order regulatory complexes in the organisms studied. The results obtained in this study using the information on transcriptional regulation in E. coli enable us to conclude that not only are transcription factor-binding sites conserved across related species, but so are several of the transcriptional regulatory mechanisms previously identified in E. coli.


Subject(s)
Computational Biology , Gammaproteobacteria/genetics , Gene Expression Regulation, Bacterial , Genome, Bacterial , Transcription, Genetic , Binding Sites/genetics , Promoter Regions, Genetic , Regulon , Synteny , Transcription Factors/genetics
12.
Genome Res ; 13(11): 2435-43, 2003 Nov.
Article in English | MEDLINE | ID: mdl-14597655

ABSTRACT

The transcriptional network of Escherichia coli may well be the most complete experimentally characterized network of a single cell. A rule-based approach was built to assess the degree of consistency between whole-genome microarray experiments in different experimental conditions and the accumulated knowledge in the literature compiled in RegulonDB, a database of transcriptional regulation and operon organization in E. coli. We observed a high and statistically significant level of consistency, ranging from 70% to 87%. When effector metabolites of regulatory proteins are not considered in the prediction of the active or inactive state of the regulators, consistency falls by up to 40%. Similarly, consistency decreases when rules for multiple regulatory interactions are altered or when "on" and "off" entries are assigned randomly. We modified the initial state of regulators and evaluated the propagation of errors in the network that do not correlate linearly with the connectivity of regulators. We interpret this deviation mainly as a result of the existence of redundant regulatory interactions. Consistency evaluation opens a new space of dialogue between theory and experiment, as the consequences of different assumptions can be evaluated and compared.
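A consistency evaluation of this kind can be sketched in a few lines of Python. The abstract does not give the rules used in the study, so the rule below is an assumption: a gene is predicted "off" if any of its repressors is active, "on" if it has no repressor active and either an active activator or no activators at all. The function names and the toy network are hypothetical illustrations, not data from the paper.

```python
def predict_state(interactions, active_regulators):
    """Predict "on" or "off" for one gene.

    interactions: list of (regulator, effect) pairs, effect "+" or "-".
    active_regulators: set of regulators assumed active in the condition.
    The combination rule is an illustrative assumption.
    """
    activators = [r for r, effect in interactions if effect == "+"]
    repressors = [r for r, effect in interactions if effect == "-"]
    if any(r in active_regulators for r in repressors):
        return "off"
    if activators:
        return "on" if any(r in active_regulators for r in activators) else "off"
    return "on"


def consistency(network, active_regulators, observed):
    """Fraction of genes whose predicted state matches the observed state."""
    genes = [g for g in network if g in observed]
    if not genes:
        raise ValueError("no gene has both a prediction and an observation")
    matches = sum(
        predict_state(network[g], active_regulators) == observed[g] for g in genes
    )
    return matches / len(genes)


# Toy inputs, invented for illustration only.
toy_network = {
    "lacZ": [("CRP", "+"), ("LacI", "-")],
    "araB": [("AraC", "+"), ("CRP", "+")],
    "trpE": [("TrpR", "-")],
}
toy_active = {"CRP"}
toy_observed = {"lacZ": "on", "araB": "on", "trpE": "off"}
toy_consistency = consistency(toy_network, toy_active, toy_observed)
```

With these toy inputs, two of the three predictions agree with the observations. Changing the rule in `predict_state` or the set of active regulators and recomputing the fraction mirrors the kind of comparison of assumptions the abstract describes.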


Subject(s)
Escherichia coli/genetics , Gene Expression Profiling , Gene Expression Regulation, Bacterial/genetics , Oligonucleotide Array Sequence Analysis , Research Design , Databases, Genetic , Gene Expression Profiling/statistics & numerical data , Models, Genetic , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Operon/genetics , Predictive Value of Tests , Regulon/genetics
13.
J Mol Biol ; 333(2): 261-78, 2003 Oct 17.
Article in English | MEDLINE | ID: mdl-14529615

ABSTRACT

We present here a computational analysis showing that sigma70 house-keeping promoters are located within zones with high densities of promoter-like signals in Escherichia coli, and we introduce strategies that allow for the correct computer prediction of sigma70 promoters. Based on 599 experimentally verified promoters of E. coli K-12, we generated and evaluated more than 200 weight matrices optimizing different criteria to obtain the best recognition matrices. The alignments generating the best statistical models did not fully correspond with the canonical sigma70 model. However, matrices that correspond to such a canonical model performed better as tools for prediction. We tested the predictive capacity of these matrices on 250 bp long regions upstream of gene starts, where 90% of the known promoters occur. The computational matrix models generated an average of 38 promoter-like signals within each 250 bp region. In more than 50% of the cases, the true promoter does not have the best score within the region. We observed, in fact, that real promoters occur mostly within regions with high densities of overlapping putative promoters. We evaluated several strategies to identify promoters. The best one uses an intrinsic score of the -10 and -35 hexamers that form the promoter as well as an extrinsic score that uses the distribution of promoters from the start of the gene. We were able to identify 86% of the true promoters correctly, generating an average of 4.7 putative promoters per region as output, of which 3.7, on average, exist in clusters, as a series of overlapping potentially competing RNA polymerase-binding sites. As far as we know, this is the highest predictive capability reported so far. This high signal density is found mainly within regions upstream of genes, contrasting with coding regions and regions located between convergently transcribed genes.
These results are consistent with experimental evidence that shows the existence of multiple overlapping promoter sites that become functional under particular conditions. This density is probably the consequence of a rich number of vestiges of promoters in evolution. We suggest that transcriptional regulators as well as other functional promoters play an important role in keeping these latent signals suppressed.
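The intrinsic score mentioned in this abstract, based on the -35 and -10 hexamers, can be illustrated with a small weight-matrix sketch in Python. The abstract does not specify how the matrices were built or scored, so this sketch substitutes a standard log-odds matrix with pseudocounts and a uniform background, and it assumes a spacer of 15 to 21 bp between the two hexamers. The extrinsic score based on distance from the gene start is omitted. All function names and the toy training sites are hypothetical.

```python
import math

BASES = "ACGT"


def build_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from aligned sites of equal length."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(length):
        column = [s[pos] for s in sites]
        matrix.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def score(matrix, word):
    """Sum the per-position log-odds scores for one word."""
    return sum(col[base] for col, base in zip(matrix, word))


def scan(sequence, m35, m10, spacers=range(15, 22)):
    """Score every (-35, spacer, -10) placement; best-scoring hits first.

    Each hit is (score, start of the -35 hexamer, spacer length).
    """
    hits = []
    for i in range(len(sequence) - 12 - min(spacers) + 1):
        for gap in spacers:
            j = i + 6 + gap  # start of the -10 hexamer
            if j + 6 > len(sequence):
                continue
            total = score(m35, sequence[i:i + 6]) + score(m10, sequence[j:j + 6])
            hits.append((total, i, gap))
    return sorted(hits, reverse=True)


# Toy training sites, invented for illustration only.
toy_m35 = build_matrix(["TTGACA", "TTGACT", "TTTACA", "TTGATA"])
toy_m10 = build_matrix(["TATAAT", "TATGAT", "TACAAT", "TAAAAT"])
toy_region = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GG"
toy_hits = scan(toy_region, toy_m35, toy_m10)
```

Because `scan` returns every placement rather than only the best one, the output can also be used to count how many promoter-like signals exceed a chosen threshold within a region, which is the kind of signal density the abstract discusses.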


Subject(s)
Bacterial Proteins/genetics , DNA-Directed RNA Polymerases/genetics , Escherichia coli/enzymology , Promoter Regions, Genetic , Sigma Factor/genetics , Transcription, Genetic , Bacterial Proteins/metabolism , Conserved Sequence , DNA-Directed RNA Polymerases/metabolism , Gene Expression Regulation, Bacterial , Genes, Bacterial , Genes, Overlapping , Genetic Variation , Sigma Factor/metabolism