Search | BVS Infectious and Parasitic Diseases

1.

Redefining fundamental concepts of transcription initiation in bacteria.

Mejía-Almonte, Citlalli; Busby, Stephen J W; Wade, Joseph T; van Helden, Jacques; Arkin, Adam P; Stormo, Gary D; Eilbeck, Karen; Palsson, Bernhard O; Galagan, James E; Collado-Vides, Julio.

Nat Rev Genet ; 21(11): 699-714, 2020 Nov.

Article in English | MEDLINE | ID: mdl-32665585

ABSTRACT

Despite enormous progress in understanding the fundamentals of bacterial gene regulation, our knowledge remains limited when compared with the number of bacterial genomes and regulatory systems to be discovered. Derived from a small number of initial studies, classic definitions for concepts of gene regulation have evolved as the number of characterized promoters has increased. Together with discoveries made using new technologies, this knowledge has led to revised generalizations and principles. In this Expert Recommendation, we suggest precise, updated definitions that support a logical, consistent conceptual framework of bacterial gene regulation, focusing on transcription initiation. The resulting concepts can be formalized by ontologies for computational modelling, laying the foundation for improved bioinformatics tools, knowledge-based resources and scientific communication. Thus, this work will help researchers construct better predictive models, with different formalisms, that will be useful in engineering, synthetic biology, microbiology and genetics.

Subjects

Bacteria/genetics , Gene Expression Regulation, Bacterial , Transcription Initiation, Genetic , Operon , Promoter Regions, Genetic , Regulon , Transcription Factors/physiology

2.

RegulonDB v12.0: a comprehensive resource of transcriptional regulation in E. coli K-12.

Salgado, Heladia; Gama-Castro, Socorro; Lara, Paloma; Mejia-Almonte, Citlalli; Alarcón-Carranza, Gabriel; López-Almazo, Andrés G; Betancourt-Figueroa, Felipe; Peña-Loredo, Pablo; Alquicira-Hernández, Shirley; Ledezma-Tejeida, Daniela; Arizmendi-Zagal, Lizeth; Mendez-Hernandez, Francisco; Diaz-Gomez, Ana K; Ochoa-Praxedis, Elizabeth; Muñiz-Rascado, Luis J; García-Sotelo, Jair S; Flores-Gallegos, Fanny A; Gómez, Laura; Bonavides-Martínez, César; Del Moral-Chávez, Víctor M; Hernández-Alvarez, Alfredo J; Santos-Zavaleta, Alberto; Capella-Gutierrez, Salvador; Gelpi, Josep Lluis; Collado-Vides, Julio.

Nucleic Acids Res ; 52(D1): D255-D264, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37971353

ABSTRACT

RegulonDB is a database that contains the most comprehensive corpus of knowledge of the regulation of transcription initiation of Escherichia coli K-12, including data from both classical molecular biology and high-throughput methodologies. Here, we describe biological advances since our last NAR paper of 2019. We explain the changes to satisfy FAIR requirements. We also present a full reconstruction of the RegulonDB computational infrastructure, which has significantly improved data storage, retrieval and accessibility and thus supports a more intuitive and user-friendly experience. The integration of graphical tools provides clear visual representations of genetic regulation data, facilitating data interpretation and knowledge integration. RegulonDB version 12.0 can be accessed at https://regulondb.ccg.unam.mx.

Subjects

Databases, Genetic , Escherichia coli K12 , Gene Expression Regulation, Bacterial , Computational Biology/methods , Escherichia coli K12/genetics , Internet , Transcription, Genetic

3.

Regulatory promoter architectures in the hands of thermodynamic modelling.

Collado-Vides, Julio.

Nat Rev Genet ; 24(6): 349, 2023 Jun.

Article in English | MEDLINE | ID: mdl-36747003

Subjects

Transcription Factors , Promoter Regions, Genetic , Transcription Factors/genetics , Thermodynamics

4.

Limits to a classic paradigm: most transcription factors in E. coli regulate genes involved in multiple biological processes.

Ledezma-Tejeida, Daniela; Altamirano-Pacheco, Luis; Fajardo, Vicente; Collado-Vides, Julio.

Nucleic Acids Res ; 47(13): 6656-6667, 2019 Jul 26.

Article in English | MEDLINE | ID: mdl-31194874

ABSTRACT

Transcription factors (TFs) are important drivers of cellular decision-making. When bacteria encounter a change in the environment, TFs alter the expression of a defined set of genes in order to adequately respond. It is commonly assumed that genes regulated by the same TF are involved in the same biological process. Examples of this are methods that rely on coregulation to infer function of not-yet-annotated genes. We have previously shown that only 21% of TFs involved in metabolism regulate functionally homogeneous genes, based on the proximity of the gene products' catalyzed reactions in the metabolic network. Here, we provide more evidence to support the claim that a 1-TF/1-process relationship is not a general property. We show that the observed functional heterogeneity of regulons is not a result of the quality of the annotation of regulatory interactions, nor the absence of protein-metabolite interactions, and that it is also present when function is defined by Gene Ontology terms. Furthermore, the observed functional heterogeneity is different from the one expected by chance, supporting the notion that it is a biological property. To further explore the relationship between transcriptional regulation and metabolism, we analyzed five other types of regulatory groups and identified complex regulons (i.e. genes regulated by the same combination of TFs) as the most functionally homogeneous, and this is supported by coexpression data. Whether higher levels of related functions exist beyond metabolism and current functional annotations remains an open question.

Subjects

Escherichia coli Proteins/physiology , Gene Expression Regulation, Bacterial , Gene Regulatory Networks/physiology , Regulon/physiology , Transcription Factors/physiology , Enzymes/genetics , Enzymes/physiology , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Ontology , Gene Regulatory Networks/genetics , Metabolic Networks and Pathways , Regulon/genetics

5.

RegulonDB v 10.5: tackling challenges to unify classic and high throughput knowledge of gene regulation in E. coli K-12.

Santos-Zavaleta, Alberto; Salgado, Heladia; Gama-Castro, Socorro; Sánchez-Pérez, Mishael; Gómez-Romero, Laura; Ledezma-Tejeida, Daniela; García-Sotelo, Jair Santiago; Alquicira-Hernández, Kevin; Muñiz-Rascado, Luis José; Peña-Loredo, Pablo; Ishida-Gutiérrez, Cecilia; Velázquez-Ramírez, David A; Del Moral-Chávez, Víctor; Bonavides-Martínez, César; Méndez-Cruz, Carlos-Francisco; Galagan, James; Collado-Vides, Julio.

Nucleic Acids Res ; 47(D1): D212-D220, 2019 Jan 08.

Article in English | MEDLINE | ID: mdl-30395280

ABSTRACT

RegulonDB, first published 20 years ago, is a comprehensive electronic resource about regulation of transcription initiation of Escherichia coli K-12 with decades of knowledge from classic molecular biology experiments, and recently also from high-throughput genomic methodologies. We curated the literature to keep RegulonDB up to date, and initiated curation of ChIP and gSELEX experiments. We estimate that current knowledge describes between 10% and 30% of the expected total number of transcription factor-gene regulatory interactions in E. coli. RegulonDB provides datasets for interactions for which there is no evidence that they affect expression, as well as expression datasets. We developed a proof of concept pipeline to merge binding and expression evidence to identify regulatory interactions. These datasets can be visualized in the RegulonDB JBrowse. We developed the Microbial Conditions Ontology with a controlled vocabulary for the minimal properties to reproduce an experiment, which contributes to integrating data from high throughput and classic literature. At a higher level of integration, we report Genetic Sensory-Response Units for 200 transcription factors, including their regulation at the metabolic level, and include summaries for 70 of them. Finally, we summarize our research with natural language processing strategies to enhance our biocuration work.

Subjects

Computational Biology/methods , Escherichia coli K12/genetics , Gene Expression Regulation, Bacterial , Genomics , Gene Ontology , Gene Regulatory Networks , Genomics/methods , High-Throughput Nucleotide Sequencing

6.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.

Keseler, Ingrid M; Mackie, Amanda; Santos-Zavaleta, Alberto; Billington, Richard; Bonavides-Martínez, César; Caspi, Ron; Fulcher, Carol; Gama-Castro, Socorro; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Muñiz-Rascado, Luis; Ong, Quang; Paley, Suzanne; Peralta-Gil, Martin; Subhraveti, Pallavi; Velázquez-Ramírez, David A; Weaver, Daniel; Collado-Vides, Julio; Paulsen, Ian; Karp, Peter D.

Nucleic Acids Res ; 45(D1): D543-D550, 2017 Jan 04.

Article in English | MEDLINE | ID: mdl-27899573

ABSTRACT

EcoCyc (EcoCyc.org) is a freely accessible, comprehensive database that collects and summarizes experimental data for Escherichia coli K-12, the best-studied bacterial model organism. New experimental discoveries about gene products, their function and regulation, new metabolic pathways, enzymes and cofactors are regularly added to EcoCyc. New SmartTable tools allow users to browse collections of related EcoCyc content. SmartTables can also serve as repositories for user- or curator-generated lists. EcoCyc now supports running and modifying E. coli metabolic models directly on the EcoCyc website.

Subjects

Computational Biology/methods , Databases, Genetic , Escherichia coli K12/genetics , Escherichia coli K12/metabolism , Energy Metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial , Metabolic Networks and Pathways , Signal Transduction , Software , Transcription Factors/metabolism , Web Browser

7.

A unified resource for transcriptional regulation in Escherichia coli K-12 incorporating high-throughput-generated binding data into RegulonDB version 10.0.

Santos-Zavaleta, Alberto; Sánchez-Pérez, Mishael; Salgado, Heladia; Velázquez-Ramírez, David A; Gama-Castro, Socorro; Tierrafría, Víctor H; Busby, Stephen J W; Aquino, Patricia; Fang, Xin; Palsson, Bernhard O; Galagan, James E; Collado-Vides, Julio.

BMC Biol ; 16(1): 91, 2018 Aug 16.

Article in English | MEDLINE | ID: mdl-30115066

ABSTRACT

BACKGROUND: Our understanding of the regulation of gene expression has benefited from the availability of high-throughput technologies that interrogate the whole genome for the binding of specific transcription factors and gene expression profiles. In the case of widely used model organisms, such as Escherichia coli K-12, the new knowledge gained from these approaches needs to be integrated with the legacy of accumulated knowledge from genetic and molecular biology experiments conducted in the pre-genomic era in order to attain the deepest level of understanding possible based on the available data. RESULTS: In this paper, we describe an expansion of RegulonDB, the database containing the rich legacy of decades of classic molecular biology experiments supporting what we know about gene regulation and operon organization in E. coli K-12, to include the genome-wide dataset collections from 32 ChIP and 19 gSELEX publications, in addition to around 60 genome-wide expression profiles relevant to the functional significance of these datasets and used in their curation. Three essential features for the integration of this information coming from different methodological approaches are: first, a controlled vocabulary within an ontology for precisely defining growth conditions; second, the criteria to separate elements with enough evidence to consider them involved in gene regulation from isolated transcription factor binding sites without such support; and third, an expanded computational model supporting this knowledge. Altogether, this constitutes the basis for adequately gathering and enabling the comparisons and integration needed to manage and access such wealth of knowledge. CONCLUSIONS: This version 10.0 of RegulonDB is a first step toward what should become the unifying access point for current and future knowledge on gene regulation in E. coli K-12. Furthermore, this model platform and associated methodologies and criteria can be emulated for gathering knowledge on other microbial organisms.

Subjects

Databases as Topic , Escherichia coli K12/genetics , Gene Expression Regulation, Bacterial , Transcription, Genetic

8.

COLOMBOS v3.0: leveraging gene expression compendia for cross-species analyses.

Moretto, Marco; Sonego, Paolo; Dierckxsens, Nicolas; Brilli, Matteo; Bianco, Luca; Ledezma-Tejeida, Daniela; Gama-Castro, Socorro; Galardini, Marco; Romualdi, Chiara; Laukens, Kris; Collado-Vides, Julio; Meysman, Pieter; Engelen, Kristof.

Nucleic Acids Res ; 44(D1): D620-3, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26586805

ABSTRACT

COLOMBOS is a database that integrates publicly available transcriptomics data for several prokaryotic model organisms. Compared to the previous version it has more than doubled in size, both in terms of species and data available. The manually curated condition annotation has been overhauled as well, giving more complete information about samples' experimental conditions and their differences. Functionality-wise, cross-species analyses now enable users to analyse expression data for all species simultaneously, and identify candidate genes with evolutionarily conserved expression behaviour. All the expression-based query tools have undergone a substantial improvement, overcoming the limit of enforced co-expression data retrieval and instead enabling the return of more complex patterns of expression behaviour. COLOMBOS is freely available through a web application at http://colombos.net/. The complete database is also accessible via REST API or downloadable as tab-delimited text files.

Subjects

Databases, Genetic , Gene Expression Profiling , Archaea/genetics , Archaea/metabolism , Bacteria/genetics , Bacteria/metabolism , Oligonucleotide Array Sequence Analysis , Sequence Analysis, RNA , Software

9.

RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond.

Gama-Castro, Socorro; Salgado, Heladia; Santos-Zavaleta, Alberto; Ledezma-Tejeida, Daniela; Muñiz-Rascado, Luis; García-Sotelo, Jair Santiago; Alquicira-Hernández, Kevin; Martínez-Flores, Irma; Pannier, Lucia; Castro-Mondragón, Jaime Abraham; Medina-Rivera, Alejandra; Solano-Lira, Hilda; Bonavides-Martínez, César; Pérez-Rueda, Ernesto; Alquicira-Hernández, Shirley; Porrón-Sotelo, Liliana; López-Fuentes, Alejandra; Hernández-Koutoucheva, Anastasia; Del Moral-Chávez, Víctor; Rinaldi, Fabio; Collado-Vides, Julio.

Nucleic Acids Res ; 44(D1): D133-43, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26527724

ABSTRACT

RegulonDB (http://regulondb.ccg.unam.mx) is one of the most useful and important resources on bacterial gene regulation, as it integrates the scattered scientific knowledge of the best-characterized organism, Escherichia coli K-12, in a database that organizes large amounts of data. Its electronic format enables researchers to compare their results with the legacy of previous knowledge and supports bioinformatics tools and model building. Here, we summarize our progress with RegulonDB since our last Nucleic Acids Research publication describing RegulonDB, in 2013. In addition to maintaining curation up-to-date, we report a collection of 232 interactions with small RNAs affecting 192 genes, and the complete repertoire of 189 Elementary Genetic Sensory-Response units (GENSOR units), integrating the signal, regulatory interactions, and metabolic pathways they govern. These additions represent major progress to a higher level of understanding of regulated processes. We have updated the computationally predicted transcription factors, which total 304 (184 with experimental evidence and 120 from computational predictions); we updated our position-weight matrices and have included tools for clustering them in evolutionary families. We describe our semiautomatic strategy to accelerate curation, including datasets from high-throughput experiments, a novel coexpression distance to search for 'neighborhood' genes to known operons and regulons, and computational developments.

Subjects

Databases, Genetic , Escherichia coli K12/genetics , Gene Expression Regulation, Bacterial , Regulon , Cluster Analysis , Escherichia coli K12/metabolism , Gene Regulatory Networks , Operon , Position-Specific Scoring Matrices , RNA, Small Untranslated/metabolism , Transcription Factors/classification

10.

COLOMBOS v2.0: an ever expanding collection of bacterial expression compendia.

Meysman, Pieter; Sonego, Paolo; Bianco, Luca; Fu, Qiang; Ledezma-Tejeida, Daniela; Gama-Castro, Socorro; Liebens, Veerle; Michiels, Jan; Laukens, Kris; Marchal, Kathleen; Collado-Vides, Julio; Engelen, Kristof.

Nucleic Acids Res ; 42(Database issue): D649-53, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24214998

ABSTRACT

The COLOMBOS database (http://www.colombos.net) features comprehensive organism-specific cross-platform gene expression compendia of several bacterial model organisms and is supported by a fully interactive web portal and an extensive web API. COLOMBOS was originally published in PLoS One, and COLOMBOS v2.0 includes both an update of the expression data, by expanding the previously available compendia and by adding compendia for several new species, and an update of the surrounding functionality, with improved search and visualization options and novel tools for programmatic access to the database. The scope of the database has also been extended to incorporate RNA-seq data in our compendia by a dedicated analysis pipeline. We demonstrate the validity and robustness of this approach by comparing the same RNA samples measured in parallel using both microarrays and RNA-seq. As far as we know, COLOMBOS currently hosts the largest homogenized gene expression compendia available for seven bacterial model organisms.

Subjects

Bacteria/genetics , Databases, Genetic , Gene Expression , Bacteria/metabolism , Gene Expression Profiling , Internet , Oligonucleotide Array Sequence Analysis , Sequence Analysis, RNA

11.

EcoCyc: fusing model organism databases with systems biology.

Keseler, Ingrid M; Mackie, Amanda; Peralta-Gil, Martin; Santos-Zavaleta, Alberto; Gama-Castro, Socorro; Bonavides-Martínez, César; Fulcher, Carol; Huerta, Araceli M; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Muñiz-Rascado, Luis; Ong, Quang; Paley, Suzanne; Schröder, Imke; Shearer, Alexander G; Subhraveti, Pallavi; Travers, Mike; Weerasinghe, Deepika; Weiss, Verena; Collado-Vides, Julio; Gunsalus, Robert P; Paulsen, Ian; Karp, Peter D.

Nucleic Acids Res ; 41(Database issue): D605-12, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23143106

ABSTRACT

EcoCyc (http://EcoCyc.org) is a model organism database built on the genome sequence of Escherichia coli K-12 MG1655. Expert manual curation of the functions of individual E. coli gene products in EcoCyc has been based on information found in the experimental literature for E. coli K-12-derived strains. Updates to EcoCyc content continue to improve the comprehensive picture of E. coli biology. The utility of EcoCyc is enhanced by new tools available on the EcoCyc web site, and the development of EcoCyc as a teaching tool is increasing the impact of the knowledge collected in EcoCyc.

Subjects

Databases, Genetic , Escherichia coli K12/genetics , Binding Sites , Escherichia coli K12/metabolism , Escherichia coli Proteins/classification , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial , Internet , Membrane Transport Proteins/classification , Membrane Transport Proteins/metabolism , Models, Genetic , Molecular Sequence Annotation , Phenotype , Position-Specific Scoring Matrices , Promoter Regions, Genetic , Systems Biology , Transcription Factors/metabolism , Transcription, Genetic

12.

RegulonDB v8.0: omics data sets, evolutionary conservation, regulatory phrases, cross-validated gold standards and more.

Salgado, Heladia; Peralta-Gil, Martin; Gama-Castro, Socorro; Santos-Zavaleta, Alberto; Muñiz-Rascado, Luis; García-Sotelo, Jair S; Weiss, Verena; Solano-Lira, Hilda; Martínez-Flores, Irma; Medina-Rivera, Alejandra; Salgado-Osorio, Gerardo; Alquicira-Hernández, Shirley; Alquicira-Hernández, Kevin; López-Fuentes, Alejandra; Porrón-Sotelo, Liliana; Huerta, Araceli M; Bonavides-Martínez, César; Balderas-Martínez, Yalbi I; Pannier, Lucia; Olvera, Maricela; Labastida, Aurora; Jiménez-Jacinto, Verónica; Vega-Alvarado, Leticia; Del Moral-Chávez, Victor; Hernández-Alvarez, Alfredo; Morett, Enrique; Collado-Vides, Julio.

Nucleic Acids Res ; 41(Database issue): D203-13, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23203884

ABSTRACT

This article summarizes our progress with RegulonDB (http://regulondb.ccg.unam.mx/) during the past 2 years. We have kept up-to-date the knowledge from the published literature regarding transcriptional regulation in Escherichia coli K-12. We have maintained and expanded our curation efforts to improve the breadth and quality of the encoded experimental knowledge, and we have implemented criteria for the quality of our computational predictions. Regulatory phrases now provide high-level descriptions of regulatory regions. We expanded the assignment of quality to various sources of evidence, particularly for knowledge generated through high-throughput (HT) technology. Based on our analysis of most relevant methods, we defined rules for determining the quality of evidence when multiple independent sources support an entry. With this latest release of RegulonDB, we present a new highly reliable larger collection of transcription start sites, a result of our experimental HT genome-wide efforts. These improvements, together with several novel enhancements (the tracks display, uploading format and curational guidelines), address the challenges of incorporating HT-generated knowledge into RegulonDB. Information on the evolutionary conservation of regulatory elements is also available now. Altogether, RegulonDB version 8.0 is a much better home for integrating knowledge on gene regulation from the sources of information currently available.

Subjects

Databases, Genetic , Escherichia coli K12/genetics , Gene Expression Regulation, Bacterial , Regulatory Elements, Transcriptional , Transcription, Genetic , Bacterial Proteins/metabolism , Databases, Genetic/standards , Evolution, Molecular , Genomics , Internet , Promoter Regions, Genetic , Regulon , Repressor Proteins/metabolism , Sequence Analysis, RNA , Transcription Factors/metabolism , Transcription Initiation Site

13.

In silico identification and experimental characterization of regulatory elements controlling the expression of the Salmonella csrB and csrC genes.

Martínez, Luary C; Martínez-Flores, Irma; Salgado, Heladia; Fernández-Mora, Marcos; Medina-Rivera, Alejandra; Puente, José L; Collado-Vides, Julio; Bustamante, Víctor H.

J Bacteriol ; 196(2): 325-36, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24187088

ABSTRACT

The small RNAs CsrB and CsrC of Salmonella indirectly control the expression of numerous genes encoding widespread cellular functions, including virulence. The expression of csrB and csrC genes, which are located in different chromosomal regions, is coordinated by positive transcriptional control mediated by the two-component regulatory system BarA/SirA. Here, we identified by computational analysis an 18-bp inverted repeat (IR) sequence located far upstream from the promoter of Salmonella enterica serovar Typhimurium csrB and csrC genes. Deletion analysis and site-directed mutagenesis of the csrB and csrC regulatory regions revealed that this IR sequence is required for transcriptional activation of both genes. Protein-DNA and protein-protein interaction assays showed that the response regulator SirA specifically binds to the IR sequence and provide evidence that SirA acts as a dimer. Interestingly, whereas the IR sequence was essential for the SirA-mediated expression of csrB, our results revealed that SirA controls the expression of csrC not only by binding to the IR sequence but also by an indirect mode involving the Csr system. Additional computational, biochemical, and genetic analyses demonstrated that the integration host factor (IHF) global regulator positively controls the expression of csrB, but not of csrC, by interacting with a sequence located between the promoter and the SirA-binding site. These findings contribute to the better understanding of the regulatory mechanism controlling the expression of CsrB and CsrC.

Subjects

Gene Expression Regulation, Bacterial , Genes, Bacterial , RNA, Small Untranslated/biosynthesis , Regulatory Elements, Transcriptional , Salmonella typhimurium/genetics , Bacterial Proteins/metabolism , Computational Biology , DNA Mutational Analysis , DNA, Bacterial/genetics , DNA, Bacterial/metabolism , Mutagenesis, Site-Directed , Protein Binding , Protein Multimerization , RNA, Small Untranslated/genetics , Sequence Deletion , Trans-Activators/metabolism

14.

Flexible gold standards for transcription factor regulatory interactions in Escherichia coli K-12: architecture of evidence types.

Lara, Paloma; Gama-Castro, Socorro; Salgado, Heladia; Rioualen, Claire; Tierrafría, Víctor H; Muñiz-Rascado, Luis J; Bonavides-Martínez, César; Collado-Vides, Julio.

Front Genet ; 15: 1353553, 2024.

Article in English | MEDLINE | ID: mdl-38505828

ABSTRACT

Post-genomic implementations have expanded the experimental strategies to identify elements involved in the regulation of transcription initiation. Here, we present for the first time a detailed analysis of the sources of knowledge supporting the collection of transcriptional regulatory interactions (RIs) of Escherichia coli K-12. An RI groups the transcription factor, its effect (positive or negative) and the regulated target, a promoter, a gene or transcription unit. We improved the evidence codes so that specific methods are incorporated and classified into independent groups. On this basis we updated the computation of confidence levels, weak, strong, or confirmed, for the collection of RIs. These updates enabled us to map the RI set to the current collection of HT TF-binding datasets from ChIP-seq, ChIP-exo, gSELEX and DAP-seq in RegulonDB, enriching in this way the evidence of close to one-quarter (1329) of RIs from the current total 5446 RIs. Based on the new computational capabilities of our improved annotation of evidence sources, we can now analyze the internal architecture of evidence, their categories (experimental, classical, HT, computational), and confidence levels. This is how we know that the joint contribution of HT and computational methods increases the overall fraction of reliable RIs (the sum of confirmed and strong evidence) from 49% to 71%. Thus, the current collection has 3912 reliable RIs, with 2718 or 70% of them with classical evidence which can be used to benchmark novel HT methods. Users can selectively exclude the method they want to benchmark, or keep for instance only the confirmed interactions. The recovery of regulatory sites in RegulonDB by the different HT methods ranges from 33% by ChIP-exo to 76% by ChIP-seq, although, as discussed, many potential confounding factors limit their interpretation. The collection of improvements reported here provides a solid foundation to incorporate new methods and data, and to further integrate the diverse sources of knowledge of the different components of the transcriptional regulatory network. There is no other genomic database that offers this comprehensive high-quality architecture of knowledge supporting a corpus of transcriptional regulatory interactions.
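
The percentages quoted in this abstract can be checked against the counts it reports. The following is a minimal arithmetic sketch in Python; the variable and function names are illustrative assumptions and are not taken from RegulonDB or the paper.

```python
# Counts as reported in the abstract; names are hypothetical labels.
total_ris = 5446           # current total of regulatory interactions (RIs)
ht_enriched_ris = 1329     # RIs whose evidence was enriched by HT datasets
reliable_ris = 3912        # RIs with confirmed or strong evidence
classical_reliable = 2718  # reliable RIs that have classical evidence


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


ht_fraction = pct(ht_enriched_ris, total_ris)            # "close to one-quarter"
reliable_fraction = pct(reliable_ris, total_ris)         # reported as 71%
classical_fraction = pct(classical_reliable, reliable_ris)  # reported as 70%

print(ht_fraction, reliable_fraction, classical_fraction)
```

Under these counts the three ratios come out near 24%, 72% and 69.5%, consistent with the rounded figures in the abstract.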

15.

Predictive Biophysical Neural Network Modeling of a Compendium of in vivo Transcription Factor DNA Binding Profiles for Escherichia coli.

Lally, Patrick; Gómez-Romero, Laura; Tierrafría, Víctor H; Aquino, Patricia; Rioualen, Claire; Zhang, Xiaoman; Kim, Sunyoung; Baniulyte, Gabriele; Plitnick, Jonathan; Smith, Carol; Babu, Mohan; Collado-Vides, Julio; Wade, Joseph T; Galagan, James E.

bioRxiv ; 2024 May 24.

Article in English | MEDLINE | ID: mdl-38826350

ABSTRACT

The DNA binding of most Escherichia coli Transcription Factors (TFs) has not been comprehensively mapped, and few have models that can quantitatively predict binding affinity. We report the global mapping of in vivo DNA binding for 139 E. coli TFs using ChIP-Seq. We used these data to train BoltzNet, a novel neural network that predicts TF binding energy from DNA sequence. BoltzNet mirrors a quantitative biophysical model and provides directly interpretable predictions genome-wide at nucleotide resolution. We used BoltzNet to quantitatively design novel binding sites, which we validated with biophysical experiments on purified protein. We have generated models for 125 TFs that provide insight into global features of TF binding, including clustering of sites, the role of accessory bases, the relevance of weak sites, and the background affinity of the genome. Our paper provides new paradigms for studying TF-DNA binding and for the development of biophysically motivated neural networks.

16.

Theoretical and empirical quality assessment of transcription factor-binding motifs.

Medina-Rivera, Alejandra; Abreu-Goodger, Cei; Thomas-Chollier, Morgane; Salgado, Heladia; Collado-Vides, Julio; van Helden, Jacques.

Nucleic Acids Res ; 39(3): 808-24, 2011 Feb.

Article in English | MEDLINE | ID: mdl-20923783

ABSTRACT

Position-specific scoring matrices (PSSMs) are routinely used to predict transcription factor (TF)-binding sites in genome sequences. However, their reliability to predict novel binding sites can be far from optimum, due to the use of a small number of training sites or the inappropriate choice of parameters when building the matrix or when scanning sequences with it. Measures of matrix quality such as E-value and information content rely on theoretical models, and may fail in the context of full genome sequences. We propose a method, implemented in the program 'matrix-quality', that combines theoretical and empirical score distributions to assess reliability of PSSMs for predicting TF-binding sites. We applied 'matrix-quality' to estimate the predictive capacity of matrices for bacterial, yeast and mouse TFs. The evaluation of matrices from RegulonDB revealed some poorly predictive motifs, and allowed us to quantify the improvements obtained by applying multi-genome motif discovery. Interestingly, the method reveals differences between global and specific regulators. It also highlights the enrichment of binding sites in sequence sets obtained from high-throughput ChIP-chip (bacterial and yeast TFs) and ChIP-seq experiments (mouse TFs). The method presented here has many applications, including: selecting reliable motifs before scanning sequences; improving motif collections in TF databases; evaluating motifs discovered using high-throughput data sets.
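
To make the idea of building a PSSM and scanning a sequence with it concrete, here is a minimal Python sketch. It is a generic log-odds illustration, not the 'matrix-quality' program: the training sites, the pseudocount of 1, the uniform background of 0.25 and the function names `build_pssm` and `scan` are all assumptions chosen for the example.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per column) from aligned, equal-length sites."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    pssm = []
    for i in range(width):
        column = [site[i] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; return a list of (position, score)."""
    width = len(pssm)
    return [
        (pos, sum(pssm[i][sequence[pos + i]] for i in range(width)))
        for pos in range(len(sequence) - width + 1)
    ]


# Toy training sites invented for illustration (not taken from RegulonDB).
sites = ["TGTGA", "TGTGA", "TGCGA", "TTTGA"]
pssm = build_pssm(sites)
hits = scan("AATGTGACC", pssm)
best_pos, best_score = max(hits, key=lambda hit: hit[1])
print(best_pos, round(best_score, 2))
```

With only four training sites, the pseudocount strongly shapes the column scores, which is one way the small-training-set problem mentioned in the abstract shows up in practice.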

Subjects

Position-Specific Scoring Matrices , Promoter Regions, Genetic , Sequence Analysis, DNA , Transcription Factors/metabolism , Animals , Bacterial Proteins/metabolism , Binding Sites , Chromatin Immunoprecipitation , Genomics , Mice , Oligonucleotide Array Sequence Analysis , ROC Curve , Repressor Proteins/metabolism , Serine Endopeptidases/metabolism , Software

17.

EcoCyc: a comprehensive database of Escherichia coli biology.

Keseler, Ingrid M; Collado-Vides, Julio; Santos-Zavaleta, Alberto; Peralta-Gil, Martin; Gama-Castro, Socorro; Muñiz-Rascado, Luis; Bonavides-Martinez, César; Paley, Suzanne; Krummenacker, Markus; Altman, Tomer; Kaipa, Pallavi; Spaulding, Aaron; Pacheco, John; Latendresse, Mario; Fulcher, Carol; Sarker, Malabika; Shearer, Alexander G; Mackie, Amanda; Paulsen, Ian; Gunsalus, Robert P; Karp, Peter D.

Nucleic Acids Res ; 39(Database issue): D583-90, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21097882

ABSTRACT

EcoCyc (http://EcoCyc.org) is a comprehensive model organism database for Escherichia coli K-12 MG1655. From the scientific literature, EcoCyc captures the functions of individual E. coli gene products; their regulation at the transcriptional, post-transcriptional and protein level; and their organization into operons, complexes and pathways. EcoCyc users can search and browse the information in multiple ways. Recent improvements to the EcoCyc Web interface include combined gene/protein pages and a Regulation Summary Diagram displaying a graphical overview of all known regulatory inputs to gene expression and protein activity. The graphical representation of signal transduction pathways has been updated, and the cellular and regulatory overviews were enhanced with new functionality. A specialized undergraduate teaching resource using EcoCyc is being developed.

Subjects

Databases, Genetic , Escherichia coli K12/physiology , Binding Sites , Escherichia coli K12/genetics , Escherichia coli K12/metabolism , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial , Signal Transduction , Software , Transcription Factors/metabolism , Transcription, Genetic , User-Computer Interface

18.

RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units).

Gama-Castro, Socorro; Salgado, Heladia; Peralta-Gil, Martin; Santos-Zavaleta, Alberto; Muñiz-Rascado, Luis; Solano-Lira, Hilda; Jimenez-Jacinto, Verónica; Weiss, Verena; García-Sotelo, Jair S; López-Fuentes, Alejandra; Porrón-Sotelo, Liliana; Alquicira-Hernández, Shirley; Medina-Rivera, Alejandra; Martínez-Flores, Irma; Alquicira-Hernández, Kevin; Martínez-Adame, Ruth; Bonavides-Martínez, César; Miranda-Ríos, Juan; Huerta, Araceli M; Mendoza-Vargas, Alfredo; Collado-Torres, Leonardo; Taboada, Blanca; Vega-Alvarado, Leticia; Olvera, Maricela; Olvera, Leticia; Grande, Ricardo; Morett, Enrique; Collado-Vides, Julio.

Nucleic Acids Res ; 39(Database issue): D98-105, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21051347

ABSTRACT

RegulonDB (http://regulondb.ccg.unam.mx/) is the primary reference database of the best-known regulatory network of any free-living organism, that of Escherichia coli K-12. The major conceptual change since 3 years ago is an expanded biological context so that transcriptional regulation is now part of a unit that initiates with the signal and continues with the signal transduction to the core of regulation, modifying expression of the affected target genes responsible for the response. We call these genetic sensory response units, or Gensor Units. We have initiated their high-level curation, with graphic maps and superreactions with links to other databases. Additional connectivity uses expandable submaps. RegulonDB has summaries for every transcription factor (TF) and TF-binding sites with internal symmetry. Several DNA-binding motifs and their sizes have been redefined and relocated. In addition to data from the literature, we have incorporated our own information on transcription start sites (TSSs) and transcriptional units (TUs), obtained by using high-throughput whole-genome sequencing technologies. A new portable drawing tool for genomic features is also now available, as well as new ways to download the data, including web services, files for several relational database manager systems and text files including BioPAX format.

Subjects

Databases, Genetic , Escherichia coli K12/genetics , Gene Expression Regulation, Bacterial , Gene Regulatory Networks , Transcription Factors/metabolism , Binding Sites , Escherichia coli K12/metabolism , Signal Transduction , Systems Integration , Transcription Initiation Site , Transcription, Genetic

19.

A Gold Standard for Transcription Factor Regulatory Interactions in Escherichia coli K-12: Architecture of Evidence Types.

Lara, Paloma; Gama-Castro, Socorro; Salgado, Heladia; Rioualen, Claire; Tierrafría, Víctor H; Muñiz-Rascado, Luis J; Bonavides-Martínez, César; Collado-Vides, Julio.

bioRxiv ; 2023 Dec 11.

Article in English | MEDLINE | ID: mdl-37163020

ABSTRACT

Post-genomic implementations have expanded the experimental strategies to identify elements involved in the regulation of transcription initiation. As new methodologies emerge, a natural step is to compare their results with those from established methodologies, such as the classic methods of molecular biology used to characterize transcription factor binding sites, promoters, or transcription units. In the case of Escherichia coli K-12, the best-studied microorganism, for the last 30 years we have continuously gathered such knowledge from original scientific publications, and have organized it in two databases, RegulonDB and EcoCyc. Furthermore, since RegulonDB version 11.0 (1), we offer comprehensive datasets of binding sites from chromatin immunoprecipitation combined with sequencing (ChIP-seq), ChIP combined with exonuclease digestion and next-generation sequencing (ChIP-exo), genomic SELEX screening (gSELEX), and DNA affinity purification sequencing (DAP-seq) HT technologies, as well as additional datasets for transcription start sites, transcription units and RNA sequencing (RNA-seq) expression profiles. Here, we present for the first time an analysis of the sources of knowledge supporting the collection of transcriptional regulatory interactions (RIs) of E. coli K-12. An RI consists of a transcription factor and its positive or negative effect on a promoter, a gene or a transcription unit. We improved the evidence codes so that the specific methods are described, and we classified them into seven independent groups. This is the basis for our updated computation of confidence levels (weak, strong, or confirmed) for the collection of RIs. We compare the confidence levels of the RI collection before and after adding HT evidence, illustrating how knowledge will change as more HT data and methods appear in the future. Users can generate subsets filtering out the method they want to benchmark and avoid circularity, or keep for instance only the confirmed interactions. The comparison of different HT methods with the available datasets indicates that ChIP-seq recovers the highest fraction (>70%) of binding sites present in RegulonDB, followed by gSELEX, DAP-seq and ChIP-exo. There is no other genomic database that offers this comprehensive high-quality anatomy of evidence supporting a corpus of transcriptional regulatory interactions.
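
The "fraction of binding sites recovered" reported in the abstract above can be illustrated with a short Python sketch. The abstract does not state how a reference site is matched to an HT dataset, so the interval-overlap criterion used here is an assumption, as are the coordinates and the function names `overlaps` and `fraction_recovered`.

```python
def overlaps(site, peak):
    """True if two closed intervals (start, end) share at least one position."""
    return site[0] <= peak[1] and peak[0] <= site[1]

def fraction_recovered(reference_sites, ht_peaks):
    """Fraction of reference sites overlapped by at least one HT peak."""
    if not reference_sites:
        return 0.0
    recovered = sum(
        1 for site in reference_sites
        if any(overlaps(site, peak) for peak in ht_peaks)
    )
    return recovered / len(reference_sites)

# Hypothetical coordinates for illustration only (not RegulonDB data).
reference = [(100, 120), (300, 318), (900, 915), (1500, 1520)]
peaks = [(90, 150), (310, 360), (1490, 1600)]
recall = fraction_recovered(reference, peaks)
```

Here three of the four hypothetical reference sites are overlapped by a peak, so `recall` is 0.75. To avoid the circularity the abstract warns about, the reference set would first need to exclude sites whose only support comes from the HT method being benchmarked.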

20.

The EcoCyc Database (2023).

Karp, Peter D; Paley, Suzanne; Caspi, Ron; Kothari, Anamika; Krummenacker, Markus; Midford, Peter E; Moore, Lisa R; Subhraveti, Pallavi; Gama-Castro, Socorro; Tierrafria, Victor H; Lara, Paloma; Muñiz-Rascado, Luis; Bonavides-Martinez, César; Santos-Zavaleta, Alberto; Mackie, Amanda; Sun, Gwanggyu; Ahn-Horst, Travis A; Choi, Heejo; Covert, Markus W; Collado-Vides, Julio; Paulsen, Ian.

EcoSal Plus ; 11(1): eesp00022023, 2023 Dec 12.

Article in English | MEDLINE | ID: mdl-37220074

ABSTRACT

EcoCyc is a bioinformatics database available online at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on the regulation of gene expression, E. coli gene essentiality, and nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for the analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed online. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. Data generated from a whole-cell model that is parameterized from the latest data on EcoCyc are also available. This review outlines the data content of EcoCyc and the procedures by which this content is generated.

Subjects

Escherichia coli K12 , Escherichia coli Proteins , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli K12/genetics , Databases, Genetic , Software , Computational Biology , Escherichia coli Proteins/metabolism
