Search | BVS - Ministry of Health

1.

Predictive Biophysical Neural Network Modeling of a Compendium of in vivo Transcription Factor DNA Binding Profiles for Escherichia coli.

Lally, Patrick; Gómez-Romero, Laura; Tierrafría, Víctor H; Aquino, Patricia; Rioualen, Claire; Zhang, Xiaoman; Kim, Sunyoung; Baniulyte, Gabriele; Plitnick, Jonathan; Smith, Carol; Babu, Mohan; Collado-Vides, Julio; Wade, Joseph T; Galagan, James E.

bioRxiv ; 2024 May 24.

Article in English | MEDLINE | ID: mdl-38826350

ABSTRACT

The DNA binding of most Escherichia coli Transcription Factors (TFs) has not been comprehensively mapped, and few have models that can quantitatively predict binding affinity. We report the global mapping of in vivo DNA binding for 139 E. coli TFs using ChIP-Seq. We used these data to train BoltzNet, a novel neural network that predicts TF binding energy from DNA sequence. BoltzNet mirrors a quantitative biophysical model and provides directly interpretable predictions genome-wide at nucleotide resolution. We used BoltzNet to quantitatively design novel binding sites, which we validated with biophysical experiments on purified protein. We have generated models for 125 TFs that provide insight into global features of TF binding, including clustering of sites, the role of accessory bases, the relevance of weak sites, and the background affinity of the genome. Our paper provides new paradigms for studying TF-DNA binding and for the development of biophysically motivated neural networks.

2.

Flexible gold standards for transcription factor regulatory interactions in Escherichia coli K-12: architecture of evidence types.

Lara, Paloma; Gama-Castro, Socorro; Salgado, Heladia; Rioualen, Claire; Tierrafría, Víctor H; Muñiz-Rascado, Luis J; Bonavides-Martínez, César; Collado-Vides, Julio.

Front Genet ; 15: 1353553, 2024.

Article in English | MEDLINE | ID: mdl-38505828

ABSTRACT

Post-genomic implementations have expanded the experimental strategies to identify elements involved in the regulation of transcription initiation. Here, we present for the first time a detailed analysis of the sources of knowledge supporting the collection of transcriptional regulatory interactions (RIs) of Escherichia coli K-12. An RI groups the transcription factor, its effect (positive or negative) and the regulated target: a promoter, a gene or a transcription unit. We improved the evidence codes so that specific methods are incorporated and classified into independent groups. On this basis we updated the computation of confidence levels (weak, strong, or confirmed) for the collection of RIs. These updates enabled us to map the RI set to the current collection of HT TF-binding datasets from ChIP-seq, ChIP-exo, gSELEX and DAP-seq in RegulonDB, enriching in this way the evidence of close to one-quarter (1329) of RIs from the current total of 5446 RIs. Based on the new computational capabilities of our improved annotation of evidence sources, we can now analyze the internal architecture of evidence, their categories (experimental, classical, HT, computational), and confidence levels. This is how we know that the joint contribution of HT and computational methods increases the overall fraction of reliable RIs (the sum of confirmed and strong evidence) from 49% to 71%. Thus, the current collection has 3912 reliable RIs, of which 2718 (70%) have classical evidence that can be used to benchmark novel HT methods. Users can selectively exclude the method they want to benchmark, or keep, for instance, only the confirmed interactions. The recovery of regulatory sites in RegulonDB by the different HT methods ranges from 33% by ChIP-exo to 76% by ChIP-seq, although, as discussed, many potential confounding factors limit their interpretation. The collection of improvements reported here provides a solid foundation to incorporate new methods and data, and to further integrate the diverse sources of knowledge of the different components of the transcriptional regulatory network. There is no other genomic database that offers this comprehensive high-quality architecture of knowledge supporting a corpus of transcriptional regulatory interactions.

3.

RegulonDB v12.0: a comprehensive resource of transcriptional regulation in E. coli K-12.

Salgado, Heladia; Gama-Castro, Socorro; Lara, Paloma; Mejia-Almonte, Citlalli; Alarcón-Carranza, Gabriel; López-Almazo, Andrés G; Betancourt-Figueroa, Felipe; Peña-Loredo, Pablo; Alquicira-Hernández, Shirley; Ledezma-Tejeida, Daniela; Arizmendi-Zagal, Lizeth; Mendez-Hernandez, Francisco; Diaz-Gomez, Ana K; Ochoa-Praxedis, Elizabeth; Muñiz-Rascado, Luis J; García-Sotelo, Jair S; Flores-Gallegos, Fanny A; Gómez, Laura; Bonavides-Martínez, César; Del Moral-Chávez, Víctor M; Hernández-Alvarez, Alfredo J; Santos-Zavaleta, Alberto; Capella-Gutierrez, Salvador; Gelpi, Josep Lluis; Collado-Vides, Julio.

Nucleic Acids Res ; 52(D1): D255-D264, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37971353

ABSTRACT

RegulonDB is a database that contains the most comprehensive corpus of knowledge of the regulation of transcription initiation of Escherichia coli K-12, including data from both classical molecular biology and high-throughput methodologies. Here, we describe biological advances since our last NAR paper of 2019. We explain the changes to satisfy FAIR requirements. We also present a full reconstruction of the RegulonDB computational infrastructure, which has significantly improved data storage, retrieval and accessibility and thus supports a more intuitive and user-friendly experience. The integration of graphical tools provides clear visual representations of genetic regulation data, facilitating data interpretation and knowledge integration. RegulonDB version 12.0 can be accessed at https://regulondb.ccg.unam.mx.

Subjects

Genetic Databases , Escherichia coli K12 , Bacterial Gene Expression Regulation , Computational Biology/methods , Escherichia coli K12/genetics , Internet , Genetic Transcription

4.

The EcoCyc Database (2023).

Karp, Peter D; Paley, Suzanne; Caspi, Ron; Kothari, Anamika; Krummenacker, Markus; Midford, Peter E; Moore, Lisa R; Subhraveti, Pallavi; Gama-Castro, Socorro; Tierrafria, Victor H; Lara, Paloma; Muñiz-Rascado, Luis; Bonavides-Martinez, César; Santos-Zavaleta, Alberto; Mackie, Amanda; Sun, Gwanggyu; Ahn-Horst, Travis A; Choi, Heejo; Covert, Markus W; Collado-Vides, Julio; Paulsen, Ian.

EcoSal Plus ; 11(1): eesp00022023, 2023 Dec 12.

Article in English | MEDLINE | ID: mdl-37220074

ABSTRACT

EcoCyc is a bioinformatics database available online at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on the regulation of gene expression, E. coli gene essentiality, and nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for the analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed online. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. Data generated from a whole-cell model that is parameterized from the latest data on EcoCyc are also available. This review outlines the data content of EcoCyc and of the procedures by which this content is generated.

Subjects

Escherichia coli K12 , Escherichia coli Proteins , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli K12/genetics , Genetic Databases , Software , Computational Biology , Escherichia coli Proteins/metabolism

5.

A Gold Standard for Transcription Factor Regulatory Interactions in Escherichia coli K-12: Architecture of Evidence Types.

Lara, Paloma; Gama-Castro, Socorro; Salgado, Heladia; Rioualen, Claire; Tierrafría, Víctor H; Muñiz-Rascado, Luis J; Bonavides-Martínez, César; Collado-Vides, Julio.

bioRxiv ; 2023 Dec 11.

Article in English | MEDLINE | ID: mdl-37163020

ABSTRACT

Post-genomic implementations have expanded the experimental strategies to identify elements involved in the regulation of transcription initiation. As new methodologies emerge, a natural step is to compare their results with those from established methodologies, such as the classic methods of molecular biology used to characterize transcription factor binding sites, promoters, or transcription units. In the case of Escherichia coli K-12, the best-studied microorganism, for the last 30 years we have continuously gathered such knowledge from original scientific publications, and have organized it in two databases, RegulonDB and EcoCyc. Furthermore, since RegulonDB version 11.0 (1), we offer comprehensive datasets of binding sites from chromatin immunoprecipitation combined with sequencing (ChIP-seq), ChIP combined with exonuclease digestion and next-generation sequencing (ChIP-exo), genomic SELEX screening (gSELEX), and DNA affinity purification sequencing (DAP-seq) HT technologies, as well as additional datasets for transcription start sites, transcription units and RNA sequencing (RNA-seq) expression profiles. Here, we present for the first time an analysis of the sources of knowledge supporting the collection of transcriptional regulatory interactions (RIs) of E. coli K-12. An RI is formed by the transcription factor and its positive or negative effect on a promoter, a gene, or a transcription unit. We improved the evidence codes so that the specific methods are described, and we classified them into seven independent groups. This is the basis for our updated computation of confidence levels (weak, strong, or confirmed) for the collection of RIs. We compare the confidence levels of the RI collection before and after adding HT evidence, illustrating how knowledge will change as more HT data and methods appear in the future. Users can generate subsets that filter out the method they want to benchmark, avoiding circularity, or keep, for instance, only the confirmed interactions. The comparison of different HT methods with the available datasets indicates that ChIP-seq recovers the highest fraction (>70%) of binding sites present in RegulonDB, followed by gSELEX, DAP-seq and ChIP-exo. There is no other genomic database that offers this comprehensive high-quality anatomy of evidence supporting a corpus of transcriptional regulatory interactions.

6.

Regulatory promoter architectures in the hands of thermodynamic modelling.

Collado-Vides, Julio.

Nat Rev Genet ; 24(6): 349, 2023 06.

Article in English | MEDLINE | ID: mdl-36747003

Subjects

Transcription Factors , Genetic Promoter Regions , Transcription Factors/genetics , Thermodynamics

7.

RegulonDB 11.0: Comprehensive high-throughput datasets on transcriptional regulation in Escherichia coli K-12.

Tierrafría, Víctor H; Rioualen, Claire; Salgado, Heladia; Lara, Paloma; Gama-Castro, Socorro; Lally, Patrick; Gómez-Romero, Laura; Peña-Loredo, Pablo; López-Almazo, Andrés G; Alarcón-Carranza, Gabriel; Betancourt-Figueroa, Felipe; Alquicira-Hernández, Shirley; Polanco-Morelos, J Enrique; García-Sotelo, Jair; Gaytan-Nuñez, Estefani; Méndez-Cruz, Carlos-Francisco; Muñiz, Luis J; Bonavides-Martínez, César; Moreno-Hagelsieb, Gabriel; Galagan, James E; Wade, Joseph T; Collado-Vides, Julio.

Microb Genom ; 8(5), 2022 05.

Article in English | MEDLINE | ID: mdl-35584008

ABSTRACT

Genomics has set the basis for a variety of methodologies that produce high-throughput datasets identifying the different players that define gene regulation, particularly regulation of transcription initiation and operon organization. These datasets are available in public repositories, such as the Gene Expression Omnibus, or ArrayExpress. However, accessing and navigating such a wealth of data is not straightforward. No resource currently exists that offers all available high- and low-throughput data on transcriptional regulation in Escherichia coli K-12 for easy use either as whole datasets or as individual interactions and regulatory elements. RegulonDB (https://regulondb.ccg.unam.mx) began gathering high-throughput dataset collections in 2009, starting with transcription start sites, then adding ChIP-seq and gSELEX in 2012, with up to 99 different experimental high-throughput datasets available in 2019. In this paper we present a radical upgrade to more than 2000 high-throughput datasets, processed to facilitate their comparison, introducing up-to-date collections of transcription termination sites, transcription units, as well as transcription factor binding interactions derived from ChIP-seq, ChIP-exo, gSELEX and DAP-seq experiments, besides expression profiles derived from RNA-seq experiments. For ChIP-seq experiments we offer both the data as presented by the authors and data uniformly processed in-house, enhancing their comparability, as well as the traceability of the methods and reproducibility of the results. Furthermore, we have expanded the tools available for browsing and visualization across and within datasets. We include comparisons against previously existing knowledge in RegulonDB from classic experiments, a nucleotide-resolution genome viewer, and an interface that enables users to browse datasets by querying their metadata. A particular effort was made to automatically extract detailed experimental growth conditions by implementing an assisted curation strategy applying natural language processing and machine learning. We provide summaries with the total number of interactions found in each experiment, as well as tools to identify common results among different experiments. This is a long-awaited resource to make use of such a wealth of knowledge and advance our understanding of the biology of the model bacterium E. coli K-12.

Subjects

Escherichia coli K12 , Escherichia coli , Escherichia coli/genetics , Escherichia coli K12/genetics , Escherichia coli K12/metabolism , Bacterial Gene Expression Regulation , Operon/genetics , Reproducibility of Results

8.

Sensory Systems and Transcriptional Regulation in Escherichia coli.

Femerling, Georgette; Gama-Castro, Socorro; Lara, Paloma; Ledezma-Tejeida, Daniela; Tierrafría, Víctor H; Muñiz-Rascado, Luis; Bonavides-Martínez, César; Collado-Vides, Julio.

Front Bioeng Biotechnol ; 10: 823240, 2022.

Article in English | MEDLINE | ID: mdl-35237580

ABSTRACT

In free-living bacteria, the ability to regulate gene expression is at the core of adapting and interacting with the environment. For these systems to have a logic, a signal must trigger a genetic change that helps the cell deal with what the signal's presence in the environment implies; briefly, the response is expected to include a feedback to the signal. Thus, it makes sense to think of genetic sensory mechanisms of gene regulation. Escherichia coli K-12 is the model bacterium for which the largest number of regulatory systems and their sensing capabilities have been studied in detail at the molecular level. In this special issue focused on biomolecular sensing systems, we offer an overview of the transcriptional regulatory corpus of knowledge for E. coli that has been gathered in our database, RegulonDB, from the perspective of sensing regulatory systems. Thus, we start with the beginning of the information flux, which is the signal's chemical or physical elements detected by the cell as changes in the environment; these signals are internally transduced to transcription factors and alter their conformation. Signals transduced to effectors bind allosterically to transcription factors, and this defines the dominant sensing mechanism in E. coli. We offer an updated list of the repertoire of known allosteric effectors, as well as a list of the currently known different mechanisms of this sensing capability. Our previous definition of elementary genetic sensory-response units, GENSOR units for short, that integrate signals, transport, gene regulation, and the biochemical response of the regulated gene products of a given transcription factor, fits perfectly with the purpose of this overview. We summarize the functional heterogeneity of their response, based on our updated collection of GENSORs, and we use them to identify the expected feedback as part of their response. Finally, we address the question of multiple sensing in the regulatory network of E. coli. This overview introduces the architecture of sensing and regulation of native components in E. coli K-12, which might be a source of inspiration for bioengineering applications.

9.

Missing Links Between Gene Function and Physiology in Genomics.

Collado-Vides, Julio; Gaudet, Pascale; de Lorenzo, Víctor.

Front Physiol ; 13: 815874, 2022.

Article in English | MEDLINE | ID: mdl-35295568

ABSTRACT

Knowledge of biological organisms at the molecular level that has been gathered is now organized into databases, often within ontological frameworks. To enable computational comparisons of annotations across different genomes and organisms, controlled vocabularies have been essential, as is the case in the functional annotation classifications used for bacteria, such as MultiFun and the more widely used Gene Ontology. The function of individual gene products as well as the processes in which collections of them participate constitute a wealth of classes that describe the biological role of gene products in a large number of organisms in the three kingdoms of life. In this contribution, we highlight from a qualitative perspective some limitations of these frameworks and discuss challenges that need to be addressed to bridge the gap between annotation as currently captured by ontologies and databases and our understanding of the basic principles in the organization and functioning of organisms; we illustrate these challenges with some examples in bacteria. We hope that raising awareness of these issues will encourage users of Gene Ontology and similar ontologies to be careful about data interpretation and lead to improved data representation.

10.

Lisen&Curate: A platform to facilitate gathering textual evidence for curation of regulation of transcription initiation in bacteria.

Díaz-Rodríguez, Martín; Lithgow-Serrano, Oscar; Guadarrama-García, Francisco; Tierrafría, Víctor H; Gama-Castro, Socorro; Solano-Lira, Hilda; Salgado, Heladia; Rinaldi, Fabio; Méndez-Cruz, Carlos-Francisco; Collado-Vides, Julio.

Biochim Biophys Acta Gene Regul Mech ; 1864(11-12): 194753, 2021.

Article in English | MEDLINE | ID: mdl-34461312

ABSTRACT

The number of published papers in biomedical research makes it nearly impossible for a researcher to keep up to date. This is where manually curated databases contribute by facilitating access to knowledge. However, the structure required by databases strongly limits the type of valuable information that can be incorporated. Here, we present Lisen&Curate, a curation system that facilitates linking sentences or parts of sentences (both considered sources) in articles with their corresponding curated objects, so that rich additional information of these objects is easily available to users. These sources are going to be offered both within RegulonDB and a new database, L-Regulon. To show the relevance of our work, two senior curators performed a curation of 31 articles on the regulation of transcription initiation of E. coli using Lisen&Curate. As a result, 194 objects were curated and 781 sources were recorded. We also found that these sources are useful to develop automatic approaches to detect objects in articles by observing word frequency patterns and by carrying out an open information extraction task. Sources may help to elaborate a controlled vocabulary of experimental methods. Finally, we discuss our ecosystem of interconnected applications, RegulonDB, L-Regulon, and Lisen&Curate, to facilitate the access to knowledge on regulation of transcription initiation in bacteria. We see our proposal as the starting point to change the way experimentalists connect a piece of knowledge with its evidence using RegulonDB.

Subjects

Data Curation/methods , Genetic Databases , Bacterial Gene Expression Regulation , Genetic Transcription Initiation , Escherichia coli/genetics

11.

The EcoCyc Database in 2021.

Keseler, Ingrid M; Gama-Castro, Socorro; Mackie, Amanda; Billington, Richard; Bonavides-Martínez, César; Caspi, Ron; Kothari, Anamika; Krummenacker, Markus; Midford, Peter E; Muñiz-Rascado, Luis; Ong, Wai Kit; Paley, Suzanne; Santos-Zavaleta, Alberto; Subhraveti, Pallavi; Tierrafría, Víctor H; Wolfe, Alan J; Collado-Vides, Julio; Paulsen, Ian T; Karp, Peter D.

Front Microbiol ; 12: 711077, 2021.

Article in English | MEDLINE | ID: mdl-34394059

ABSTRACT

The EcoCyc model-organism database collects and summarizes experimental data for Escherichia coli K-12. EcoCyc is regularly updated by the manual curation of individual database entries, such as genes, proteins, and metabolic pathways, and by the programmatic addition of results from select high-throughput analyses. Updates to the Pathway Tools software that supports EcoCyc and to the web interface that enables user access have continuously improved its usability and expanded its functionality. This article highlights recent improvements to the curated data in the areas of metabolism, transport, DNA repair, and regulation of gene expression. New and revised data analysis and visualization tools include an interactive metabolic network explorer, a circular genome viewer, and various improvements to the speed and usability of existing tools.

12.

Knowledge extraction for assisted curation of summaries of bacterial transcription factor properties.

Méndez-Cruz, Carlos-Francisco; Blanchet, Antonio; Godínez, Alan; Arroyo-Fernández, Ignacio; Gama-Castro, Socorro; Martínez-Luna, Sara Berenice; González-Colín, Cristian; Collado-Vides, Julio.

Database (Oxford) ; 2020, 2020 12 11.

Article in English | MEDLINE | ID: mdl-33306798

ABSTRACT

Transcription factors (TFs) play a main role in transcriptional regulation of bacteria, as they regulate transcription of the genetic information encoded in DNA. Thus, the curation of the properties of these regulatory proteins is essential for a better understanding of transcriptional regulation. However, traditional manual curation of article collections to compile descriptions of TF properties takes significant time and effort due to the overwhelming amount of biomedical literature, which increases every day. The development of automatic approaches for knowledge extraction to assist curation is therefore critical. Here, we show an effective approach for knowledge extraction to assist curation of summaries describing bacterial TF properties based on an automatic text summarization strategy. We were able to recover automatically a median 77% of the knowledge contained in manual summaries describing properties of 177 TFs of Escherichia coli K-12 by processing 5961 scientific articles. For 71% of the TFs, our approach extracted new knowledge that can be used to expand manual descriptions. Furthermore, as we trained our predictive model with manual summaries of E. coli, we also generated summaries for 185 TFs of Salmonella enterica serovar Typhimurium from 3498 articles. According to the manual curation of 10 of these Salmonella typhimurium summaries, 96% of their sentences contained relevant knowledge. Our results demonstrate the feasibility to assist manual curation to expand manual summaries with new knowledge automatically extracted and to create new summaries of bacteria for which these curation efforts do not exist. Database URL: The automatic summaries of the TFs of E. coli and Salmonella and the automatic summarizer are available in GitHub (https://github.com/laigen-unam/tf-properties-summarizer.git).

Subjects

Escherichia coli K12 , Transcription Factors , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli K12/metabolism , Gene Expression Regulation , Transcription Factors/genetics , Transcription Factors/metabolism , Genetic Transcription

13.

Redefining fundamental concepts of transcription initiation in bacteria.

Mejía-Almonte, Citlalli; Busby, Stephen J W; Wade, Joseph T; van Helden, Jacques; Arkin, Adam P; Stormo, Gary D; Eilbeck, Karen; Palsson, Bernhard O; Galagan, James E; Collado-Vides, Julio.

Nat Rev Genet ; 21(11): 699-714, 2020 11.

Article in English | MEDLINE | ID: mdl-32665585

ABSTRACT

Despite enormous progress in understanding the fundamentals of bacterial gene regulation, our knowledge remains limited when compared with the number of bacterial genomes and regulatory systems to be discovered. Derived from a small number of initial studies, classic definitions for concepts of gene regulation have evolved as the number of characterized promoters has increased. Together with discoveries made using new technologies, this knowledge has led to revised generalizations and principles. In this Expert Recommendation, we suggest precise, updated definitions that support a logical, consistent conceptual framework of bacterial gene regulation, focusing on transcription initiation. The resulting concepts can be formalized by ontologies for computational modelling, laying the foundation for improved bioinformatics tools, knowledge-based resources and scientific communication. Thus, this work will help researchers construct better predictive models, with different formalisms, that will be useful in engineering, synthetic biology, microbiology and genetics.

Subjects

Bacteria/genetics , Bacterial Gene Expression Regulation , Genetic Transcription Initiation , Operon , Genetic Promoter Regions , Regulon , Transcription Factors/physiology

14.

PulmonDB: a curated lung disease gene expression database.

Villaseñor-Altamirano, Ana B; Moretto, Marco; Maldonado, Mariel; Zayas-Del Moral, Alejandra; Munguía-Reyes, Adrián; Romero, Yair; García-Sotelo, Jair S; Aguilar, Luis A; Aldana-Assad, Oscar; Engelen, Kristof; Selman, Moisés; Collado-Vides, Julio; Balderas-Martínez, Yalbi I; Medina-Rivera, Alejandra.

Sci Rep ; 10(1): 514, 2020 01 16.

Article in English | MEDLINE | ID: mdl-31949184

ABSTRACT

Chronic Obstructive Pulmonary Disease (COPD) and Idiopathic Pulmonary Fibrosis (IPF) have contrasting clinical and pathological characteristics and interesting whole-genome transcriptomic profiles. However, data from public repositories are difficult to reprocess and reanalyze. Here, we present PulmonDB, a web-based database (http://pulmondb.liigh.unam.mx/) and R library that facilitates exploration of gene expression profiles for these diseases by integrating transcriptomic data and curated annotation from different sources. We demonstrated the value of this resource by presenting the expression of already well-known genes of COPD and IPF across multiple experiments and the results of two differential expression analyses in which we successfully identified differences and similarities. With this first version of PulmonDB, we create a new hypothesis and compare the two diseases from a transcriptomics perspective.

Subjects

Genetic Databases , Gene Regulatory Networks , Idiopathic Pulmonary Fibrosis/genetics , Chronic Obstructive Pulmonary Disease/genetics , Data Curation , Gene Expression Profiling , Gene Expression Regulation , Humans , Internet , Exome Sequencing

15.

Limits to a classic paradigm: most transcription factors in E. coli regulate genes involved in multiple biological processes.

Ledezma-Tejeida, Daniela; Altamirano-Pacheco, Luis; Fajardo, Vicente; Collado-Vides, Julio.

Nucleic Acids Res ; 47(13): 6656-6667, 2019 07 26.

Article in English | MEDLINE | ID: mdl-31194874

ABSTRACT

Transcription factors (TFs) are important drivers of cellular decision-making. When bacteria encounter a change in the environment, TFs alter the expression of a defined set of genes in order to adequately respond. It is commonly assumed that genes regulated by the same TF are involved in the same biological process. Examples of this are methods that rely on coregulation to infer function of not-yet-annotated genes. We have previously shown that only 21% of TFs involved in metabolism regulate functionally homogeneous genes, based on the proximity of the gene products' catalyzed reactions in the metabolic network. Here, we provide more evidence to support the claim that a 1-TF/1-process relationship is not a general property. We show that the observed functional heterogeneity of regulons is not a result of the quality of the annotation of regulatory interactions, nor the absence of protein-metabolite interactions, and that it is also present when function is defined by Gene Ontology terms. Furthermore, the observed functional heterogeneity is different from the one expected by chance, supporting the notion that it is a biological property. To further explore the relationship between transcriptional regulation and metabolism, we analyzed five other types of regulatory groups and identified complex regulons (i.e. genes regulated by the same combination of TFs) as the most functionally homogeneous, and this is supported by coexpression data. Whether higher levels of related functions exist beyond metabolism and current functional annotations remains an open question.

Subjects

Escherichia coli Proteins/physiology , Bacterial Gene Expression Regulation , Gene Regulatory Networks/physiology , Regulon/physiology , Transcription Factors/physiology , Enzymes/genetics , Enzymes/physiology , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Ontology , Gene Regulatory Networks/genetics , Metabolic Networks and Pathways , Regulon/genetics

16.

Similarity corpus on microbial transcriptional regulation.

Lithgow-Serrano, Oscar; Gama-Castro, Socorro; Ishida-Gutiérrez, Cecilia; Mejía-Almonte, Citlalli; Tierrafría, Víctor H; Martínez-Luna, Sara; Santos-Zavaleta, Alberto; Velázquez-Ramírez, David; Collado-Vides, Julio.

J Biomed Semantics ; 10(1): 8, 2019 05 22.

Article in English | MEDLINE | ID: mdl-31118102

ABSTRACT

BACKGROUND: The ability to express the same meaning in different ways is a well-known property of natural language. This amazing property is the source of major difficulties in natural language processing. Given the constant increase in published literature, its curation and information extraction would strongly benefit from efficient automatic processes, for which corpora of sentences evaluated by experts are a valuable resource. RESULTS: Given our interest in applying such approaches to the benefit of curation of the biomedical literature, specifically that about gene regulation in microbial organisms, we decided to build a corpus with graded textual similarity evaluated by curators and designed specifically for our purposes. Based on the predefined statistical power of future analyses, we defined features of the design, including sampling, selection criteria, balance, and size, among others. A non-fully crossed study design was applied. Each pair of sentences was evaluated by 3 annotators from a total of 7; the scale used in the semantic similarity assessment task within the Semantic Evaluation workshop (SEMEVAL) was adapted to our goals in four successive iterative sessions with clear improvements in the agreed guidelines and interrater reliability results. Alternatives for such a corpus evaluation have been widely discussed. CONCLUSIONS: To the best of our knowledge, this is the first similarity corpus (a dataset of pairs of sentences for which human experts rate the semantic similarity of each pair) in this domain of knowledge. We have initiated its incorporation in our research towards high-throughput curation strategies based on natural language processing.

Subjects

Gene Expression Regulation , Microbiology , Natural Language Processing , Genetic Transcription/genetics

17.

Integrating Bacterial ChIP-seq and RNA-seq Data With SnakeChunks.

Rioualen, Claire; Charbonnier-Khamvongsa, Lucie; Collado-Vides, Julio; van Helden, Jacques.

Curr Protoc Bioinformatics ; 66(1): e72, 2019 06.

Article in English | MEDLINE | ID: mdl-30786165

ABSTRACT

Next-generation sequencing (NGS) is becoming a routine approach in most domains of the life sciences. To ensure reproducibility of results, there is a crucial need to improve the automation of NGS data processing and enable forthcoming studies relying on big datasets. Although user-friendly interfaces now exist, there remains a strong need for accessible solutions that allow experimental biologists to analyze and explore their results in an autonomous and flexible way. The protocols here describe a modular system that enables a user to compose and fine-tune workflows based on SnakeChunks, a library of rules for the Snakemake workflow engine. They are illustrated using a study combining ChIP-seq and RNA-seq to identify target genes of the global transcription factor FNR in Escherichia coli, which has the advantage that results can be compared with the most up-to-date collection of existing knowledge about transcriptional regulation in this model organism, extracted from the RegulonDB database. © 2019 by John Wiley & Sons, Inc.

Subjects

Bacteria/genetics , Chromatin Immunoprecipitation Sequencing/methods , RNA-Seq , Software , Base Sequence , Bacterial Genome , Nucleotide Motifs/genetics , User-Computer Interface

18.

RegulonDB v 10.5: tackling challenges to unify classic and high throughput knowledge of gene regulation in E. coli K-12.

Santos-Zavaleta, Alberto; Salgado, Heladia; Gama-Castro, Socorro; Sánchez-Pérez, Mishael; Gómez-Romero, Laura; Ledezma-Tejeida, Daniela; García-Sotelo, Jair Santiago; Alquicira-Hernández, Kevin; Muñiz-Rascado, Luis José; Peña-Loredo, Pablo; Ishida-Gutiérrez, Cecilia; Velázquez-Ramírez, David A; Del Moral-Chávez, Víctor; Bonavides-Martínez, César; Méndez-Cruz, Carlos-Francisco; Galagan, James; Collado-Vides, Julio.

Nucleic Acids Res ; 47(D1): D212-D220, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30395280

ABSTRACT

RegulonDB, first published 20 years ago, is a comprehensive electronic resource about regulation of transcription initiation of Escherichia coli K-12 with decades of knowledge from classic molecular biology experiments, and recently also from high-throughput genomic methodologies. We curated the literature to keep RegulonDB up to date, and initiated curation of ChIP and gSELEX experiments. We estimate that current knowledge describes between 10% and 30% of the expected total number of transcription factor-gene regulatory interactions in E. coli. RegulonDB provides datasets for interactions for which there is no evidence that they affect expression, as well as expression datasets. We developed a proof-of-concept pipeline to merge binding and expression evidence to identify regulatory interactions. These datasets can be visualized in the RegulonDB JBrowse. We developed the Microbial Conditions Ontology with a controlled vocabulary for the minimal properties to reproduce an experiment, which contributes to integrating data from high-throughput and classic literature. At a higher level of integration, we report Genetic Sensory-Response Units for 200 transcription factors, including their regulation at the metabolic level, and include summaries for 70 of them. Finally, we summarize our research with natural language processing strategies to enhance our biocuration work.

Subjects

Computational Biology/methods , Escherichia coli K12/genetics , Bacterial Gene Expression Regulation , Genomics , Gene Ontology , Gene Regulatory Networks , Genomics/methods , High-Throughput Nucleotide Sequencing

19.

The EcoCyc Database.

Karp, Peter D; Ong, Wai Kit; Paley, Suzanne; Billington, Richard; Caspi, Ron; Fulcher, Carol; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Midford, Peter E; Subhraveti, Pallavi; Gama-Castro, Socorro; Muñiz-Rascado, Luis; Bonavides-Martinez, César; Santos-Zavaleta, Alberto; Mackie, Amanda; Collado-Vides, Julio; Keseler, Ingrid M; Paulsen, Ian.

EcoSal Plus ; 8(1), 2018 11.

Article in English | MEDLINE | ID: mdl-30406744

ABSTRACT

EcoCyc is a bioinformatics database available at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on E. coli gene essentiality and on nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed via EcoCyc.org. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. This review outlines the data content of EcoCyc and the procedures by which this content is generated.

Subjects

Genetic Databases , Escherichia coli K12/genetics , Bacterial Genome , Software , Computational Biology , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Bacterial Gene Expression Regulation , Internet , Metabolic Flux Analysis , Metabolic Networks and Pathways/genetics , User-Computer Interface

20.

A unified resource for transcriptional regulation in Escherichia coli K-12 incorporating high-throughput-generated binding data into RegulonDB version 10.0.

Santos-Zavaleta, Alberto; Sánchez-Pérez, Mishael; Salgado, Heladia; Velázquez-Ramírez, David A; Gama-Castro, Socorro; Tierrafría, Víctor H; Busby, Stephen J W; Aquino, Patricia; Fang, Xin; Palsson, Bernhard O; Galagan, James E; Collado-Vides, Julio.

BMC Biol ; 16(1): 91, 2018 08 16.

Article in English | MEDLINE | ID: mdl-30115066

ABSTRACT

BACKGROUND: Our understanding of the regulation of gene expression has benefited from the availability of high-throughput technologies that interrogate the whole genome for the binding of specific transcription factors and gene expression profiles. In the case of widely used model organisms, such as Escherichia coli K-12, the new knowledge gained from these approaches needs to be integrated with the legacy of accumulated knowledge from genetic and molecular biology experiments conducted in the pre-genomic era in order to attain the deepest level of understanding possible based on the available data. RESULTS: In this paper, we describe an expansion of RegulonDB, the database containing the rich legacy of decades of classic molecular biology experiments supporting what we know about gene regulation and operon organization in E. coli K-12, to include the genome-wide dataset collections from 32 ChIP and 19 gSELEX publications, in addition to around 60 genome-wide expression profiles relevant to the functional significance of these datasets and used in their curation. Three essential features for the integration of this information coming from different methodological approaches are: first, a controlled vocabulary within an ontology for precisely defining growth conditions; second, the criteria to separate elements with enough evidence to consider them involved in gene regulation from isolated transcription factor binding sites without such support; and third, an expanded computational model supporting this knowledge. Altogether, this constitutes the basis for adequately gathering and enabling the comparisons and integration needed to manage and access such a wealth of knowledge. CONCLUSIONS: This version 10.0 of RegulonDB is a first step toward what should become the unifying access point for current and future knowledge on gene regulation in E. coli K-12. Furthermore, this model platform and associated methodologies and criteria can be emulated for gathering knowledge on other microbial organisms.

Subjects

Databases as Topic , Escherichia coli K12/genetics , Bacterial Gene Expression Regulation , Genetic Transcription
