ABSTRACT
EcoCyc (http://EcoCyc.org) is a model organism database built on the genome sequence of Escherichia coli K-12 MG1655. Expert manual curation of the functions of individual E. coli gene products in EcoCyc has been based on information found in the experimental literature for E. coli K-12-derived strains. Updates to EcoCyc content continue to improve the comprehensive picture of E. coli biology. The utility of EcoCyc is enhanced by new tools available on the EcoCyc web site, and the development of EcoCyc as a teaching tool is increasing the impact of the knowledge collected in EcoCyc.
Subjects
Databases, Genetic; Escherichia coli K12/genetics; Binding Sites; Escherichia coli K12/metabolism; Escherichia coli Proteins/classification; Escherichia coli Proteins/metabolism; Gene Expression Regulation, Bacterial; Internet; Membrane Transport Proteins/classification; Membrane Transport Proteins/metabolism; Models, Genetic; Molecular Sequence Annotation; Phenotype; Position-Specific Scoring Matrices; Promoter Regions, Genetic; Systems Biology; Transcription Factors/metabolism; Transcription, Genetic
ABSTRACT
The MetaCyc database (http://metacyc.org/) provides a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains more than 1800 pathways derived from more than 30,000 publications, and is the largest curated collection of metabolic pathways currently available. Most reactions in MetaCyc pathways are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes and literature citations. BioCyc (http://biocyc.org/) is a collection of more than 1700 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference database, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs contain additional features, including predicted operons, transport systems and pathway-hole fillers. The BioCyc website and Pathway Tools software offer many tools for querying and analysis of PGDBs, including Omics Viewers and comparative analysis. New developments include a zoomable web interface for diagrams; flux-balance analysis model generation from PGDBs; web services; and a new tool called Web Groups.
Subjects
Databases, Factual; Enzymes/metabolism; Genomics; Metabolic Networks and Pathways; Energy Metabolism; Genome; Internet; Metabolomics; Software
ABSTRACT
BACKGROUND: As more complete genome sequences become available, bioinformatics challenges arise in how to exploit genome sequences to make phenotypic predictions. One type of phenotypic prediction is to determine sets of compounds that will support the growth of a bacterium from the metabolic network inferred from the genome sequence of that organism. RESULTS: We present a method for computationally determining alternative growth media for an organism based on its metabolic network and transporter complement. Our method predicted 787 alternative anaerobic minimal nutrient sets for Escherichia coli K-12 MG1655 from the EcoCyc database. The program automatically partitioned the nutrients within these sets into 21 equivalence classes, most of which correspond to compounds serving as sources of carbon, nitrogen, phosphorus, and sulfur, or combinations of these essential elements. The nutrient sets were predicted with 72.5% accuracy as evaluated by comparison with 91 growth experiments. Novel aspects of our approach include (a) exhaustive consideration of all combinations of nutrients rather than assuming that all element sources can substitute for one another (an assumption that can be invalid in general); (b) leveraging the notion of a machinery-duplicating constraint, namely, that all intermediate metabolites used in active reactions must be produced in increasing concentrations to prevent successive dilution from cell division; (c) the use of Satisfiability Modulo Theories solvers rather than Linear Programming solvers, because our approach cannot be formulated as linear programming; and (d) the use of Binary Decision Diagrams to produce an efficient implementation. CONCLUSIONS: Our method for generating minimal nutrient sets from the metabolic network and transporters of an organism combines linear constraint solving with binary decision diagrams to efficiently produce solution sets to provided growth problems.
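The idea of a minimal nutrient set and of nutrient equivalence classes can be illustrated with a small sketch. This is not the authors' SMT/BDD implementation: it uses brute-force enumeration over a hypothetical five-reaction toy network, ignores the machinery-duplicating constraint, and assumes one plausible definition of equivalence (two nutrients are interchangeable if they appear alongside exactly the same companion nutrients across all minimal sets). All compound names, reactions, and function names below are invented for illustration.

```python
from itertools import combinations

# Hypothetical toy network: each reaction is (substrates, products).
REACTIONS = [
    ({"glucose"}, {"pyruvate"}),
    ({"glycerol"}, {"pyruvate"}),
    ({"pyruvate", "ammonia"}, {"alanine"}),
    ({"pyruvate", "nitrate"}, {"alanine"}),
    ({"alanine", "phosphate"}, {"biomass"}),
]
CANDIDATE_NUTRIENTS = ["glucose", "glycerol", "ammonia", "nitrate", "phosphate"]
TARGETS = {"biomass"}


def producible(nutrients):
    """Expand the available compounds until no reaction adds anything new."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in REACTIONS:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def supports_growth(nutrients):
    """A nutrient set supports growth if every target compound is producible."""
    return TARGETS <= producible(nutrients)


def minimal_nutrient_sets():
    """Enumerate sets by increasing size, skipping supersets of known solutions."""
    minimal = []
    for size in range(1, len(CANDIDATE_NUTRIENTS) + 1):
        for combo in combinations(CANDIDATE_NUTRIENTS, size):
            candidate = frozenset(combo)
            if any(found <= candidate for found in minimal):
                continue  # a smaller growing subset exists, so not minimal
            if supports_growth(candidate):
                minimal.append(candidate)
    return minimal


def equivalence_classes(sets):
    """Group nutrients that occur with identical companions (assumed definition)."""
    by_context = {}
    for nutrient in CANDIDATE_NUTRIENTS:
        context = frozenset(s - {nutrient} for s in sets if nutrient in s)
        if context:
            by_context.setdefault(context, []).append(nutrient)
    return list(by_context.values())


if __name__ == "__main__":
    sets = minimal_nutrient_sets()
    for s in sets:
        print(sorted(s))
    print(equivalence_classes(sets))
```

In this toy network the classes line up with carbon sources, nitrogen sources, and the phosphorus source. Exhaustive enumeration grows exponentially with the number of candidate nutrients, which is one reason a real implementation would rely on constraint solvers and compact set representations instead.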
Subjects
Algorithms; Culture Media; Metabolic Networks and Pathways; Computational Biology/methods; Escherichia coli K12/genetics; Escherichia coli K12/growth & development; Escherichia coli K12/metabolism; Escherichia coli Proteins/metabolism; Genomics; Membrane Transport Proteins/metabolism; Models, Biological
ABSTRACT
EcoCyc (http://EcoCyc.org) is a comprehensive model organism database for Escherichia coli K-12 MG1655. From the scientific literature, EcoCyc captures the functions of individual E. coli gene products; their regulation at the transcriptional, post-transcriptional and protein level; and their organization into operons, complexes and pathways. EcoCyc users can search and browse the information in multiple ways. Recent improvements to the EcoCyc Web interface include combined gene/protein pages and a Regulation Summary Diagram displaying a graphical overview of all known regulatory inputs to gene expression and protein activity. The graphical representation of signal transduction pathways has been updated, and the cellular and regulatory overviews were enhanced with new functionality. A specialized undergraduate teaching resource using EcoCyc is being developed.
Subjects
Databases, Genetic; Escherichia coli K12/physiology; Binding Sites; Escherichia coli K12/genetics; Escherichia coli K12/metabolism; Escherichia coli Proteins/chemistry; Escherichia coli Proteins/metabolism; Gene Expression Regulation, Bacterial; Signal Transduction; Software; Transcription Factors/metabolism; Transcription, Genetic; User-Computer Interface
ABSTRACT
MOTIVATION: Key problems for computational genomics include discovering novel pathways in genome data, and discovering functional interaction partners for genes to define new members of partially elucidated pathways. RESULTS: We propose a novel method for the discovery of subsystems from annotated genomes. For each gene pair, a score measuring the likelihood that the two genes belong to the same subsystem is computed using genome context methods. Genes are then grouped based on these scores, and the resulting groups are filtered to keep only high-confidence groups. Since the method is based on genome context analysis, it relies solely on structural annotation of the genomes. The method can be used to discover new pathways, find missing genes from a known pathway, find new protein complexes or other kinds of functional groups and assign function to genes. We tested the accuracy of our method in Escherichia coli K-12. In one configuration of the system, we find that 31.6% of the candidate groups generated by our method match a known pathway or protein complex closely, and that we rediscover 31.2% of all known pathways and protein complexes of at least 4 genes. We believe that a significant proportion of the candidates that do not match any known group in E. coli K-12 correspond to novel subsystems that may represent promising leads for future laboratory research. We discuss in-depth examples of these findings. AVAILABILITY: Predicted subsystems are available at http://brg.ai.sri.com/pwy-discovery/journal.html. CONTACT: lferrer@ai.sri.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
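The score, group, and filter steps described above can be sketched in a few lines. The abstract does not specify the grouping algorithm or the filtering criteria, so this sketch assumes connected components over strongly scoring pairs, followed by a minimum-size and mean-score filter. The gene names, scores, thresholds, and function names are all hypothetical, and the pairwise scores are given as input rather than computed from genome context.

```python
from collections import defaultdict

# Hypothetical pairwise scores; in practice these would come from
# genome context methods, which are not modeled here.
PAIR_SCORES = {
    ("geneA", "geneB"): 0.92,
    ("geneB", "geneC"): 0.81,
    ("geneA", "geneC"): 0.40,
    ("geneD", "geneE"): 0.88,
    ("geneC", "geneD"): 0.15,
    ("geneF", "geneG"): 0.30,
}


def group_genes(pair_scores, edge_threshold):
    """Connected components over pairs scoring >= edge_threshold (union-find)."""
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for (a, b), score in pair_scores.items():
        if score >= edge_threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return list(groups.values())


def mean_internal_score(group, pair_scores):
    """Average score over all scored pairs that lie entirely inside the group."""
    scores = [s for (a, b), s in pair_scores.items() if a in group and b in group]
    return sum(scores) / len(scores)


def high_confidence_groups(pair_scores, edge_threshold=0.8, min_size=3, min_mean=0.6):
    """Keep groups that are large enough and internally well supported."""
    return [
        group
        for group in group_genes(pair_scores, edge_threshold)
        if len(group) >= min_size
        and mean_internal_score(group, pair_scores) >= min_mean
    ]


if __name__ == "__main__":
    print(high_confidence_groups(PAIR_SCORES))
```

Candidate groups produced this way would then be compared against known pathways and complexes, as in the evaluation reported in the abstract.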
Subjects
Escherichia coli K12/genetics; Genetic Association Studies/methods; Metabolic Networks and Pathways/genetics; Algorithms; Computational Biology; Genome; Software
ABSTRACT
The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic pathways currently available. Pathway reactions are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and literature citations. BioCyc (BioCyc.org) is a collection of more than 500 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs also contain additional features, such as predicted operons, transport systems, and pathway hole-fillers. The BioCyc Web site offers several tools for the analysis of the PGDBs, including Omics Viewers that enable visualization of omics datasets on two different genome-scale diagrams and tools for comparative analysis. The BioCyc PGDBs generated by SRI are offered for adoption by any party interested in curation of metabolic, regulatory, and genome-related information about an organism.
Subjects
Computational Biology/methods; Databases, Genetic; Databases, Nucleic Acid; Animals; Computational Biology/trends; Databases, Protein; Genome, Archaeal; Genome, Bacterial; Genome, Plant; Genome, Viral; Humans; Information Storage and Retrieval/methods; Internet; Models, Biological; Protein Structure, Tertiary; Software
ABSTRACT
MetaCyc (MetaCyc.org) is a universal database of metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are curated from the primary scientific literature, and are experimentally determined small-molecule metabolic pathways. Each reaction in a MetaCyc pathway is annotated with one or more well-characterized enzymes. Because MetaCyc contains only experimentally elucidated knowledge, it provides a uniquely high-quality resource for metabolic pathways and enzymes. BioCyc (BioCyc.org) is a collection of more than 350 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the predicted metabolic network of one organism, including metabolic pathways, enzymes, metabolites and reactions predicted by the Pathway Tools software using MetaCyc as a reference database. BioCyc PGDBs also contain predicted operons and predicted pathway hole fillers (predictions of which enzymes may catalyze pathway reactions that have not been assigned to an enzyme). The BioCyc website offers many tools for computational analysis of PGDBs, including comparative analysis and analysis of omics data in a pathway context. The BioCyc PGDBs generated by SRI are offered for adoption by any interested party for the ongoing integration of metabolic and genome-related information about an organism.
Subjects
Databases, Genetic; Enzymes/metabolism; Genomics; Metabolic Networks and Pathways; Animals; Archaea/enzymology; Archaea/genetics; Bacteria/enzymology; Bacteria/genetics; Computational Biology; Fungi/enzymology; Fungi/genetics; Internet; Metabolic Networks and Pathways/genetics; Plants/enzymology; Plants/genetics; Software; User-Computer Interface
ABSTRACT
Despite advances in sequencing technology, there are still significant numbers of well-characterized enzymatic activities for which there are no known associated sequences. These 'orphan enzymes' represent glaring holes in our biological understanding, and it is a top priority to reunite them with their coding sequences. Here we report a methodology for resolving orphan enzymes through a combination of database search and literature review. Using this method we were able to reconnect over 270 orphan enzymes with their corresponding sequences. This success points toward how we can systematically eliminate the remaining orphan enzymes and prevent the introduction of future orphan enzymes.
Subjects
Base Sequence/genetics; Enzymes/genetics; Open Reading Frames/genetics; Databases, Genetic
ABSTRACT
The power of genome sequencing depends on the ability to understand what the sequenced genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology: "orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass spectrometry-based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.
Subjects
Databases, Protein; Enzymes/chemistry; Enzymes/genetics; Sequence Analysis, Protein/methods; Catalysis; Mass Spectrometry; Molecular Sequence Annotation
ABSTRACT
Cellular quality control requires recognition of common features of misfolding, and so is not typically associated with the specific targeting of individual proteins. However, physiologically regulated degradation of yeast HMG-CoA reductase (Hmg2p) occurs by the HRD endoplasmic reticulum quality control pathway, implying that Hmg2p undergoes a regulated transition to a quality control substrate in response to a sterol pathway molecule. Using in vitro structural assays, we now show that the pathway derivative farnesol causes Hmg2p to undergo a change to a less folded structure. The effect is reversible, biologically relevant by numerous criteria, highly specific for farnesol structure, and requires an intact Hmg2p sterol-sensing domain. This represents a distinct lipid-sensing function for this highly conserved motif that suggests novel approaches to cholesterol management. More generally, our observation of reversible small-molecule-mediated misfolding may herald numerous examples of regulated quality control to be discovered in biology or applied in the clinic.
Subjects
Farnesol/chemistry; Farnesol/metabolism; Hydroxymethylglutaryl CoA Reductases/chemistry; Hydroxymethylglutaryl CoA Reductases/metabolism; Lipid Metabolism; Protein Folding; Hydroxymethylglutaryl CoA Reductases/genetics; Lipids/chemistry; Microsomes/metabolism; Molecular Structure; Protein Denaturation; Protein Structure, Secondary; Recombinant Fusion Proteins/chemistry; Recombinant Fusion Proteins/genetics; Recombinant Fusion Proteins/metabolism; Saccharomyces cerevisiae Proteins/chemistry; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/metabolism; Trypsin/metabolism
ABSTRACT
The endoplasmic reticulum (ER) quality control pathway destroys misfolded and unassembled proteins in the ER. Most substrates of this ER-associated degradation (ERAD) pathway are constitutively targeted for destruction through recognition of poorly understood structural hallmarks of misfolding. However, the normal yeast ER membrane protein 3-hydroxy-3-methylglutaryl-CoA reductase (Hmg2p) undergoes ERAD that is physiologically regulated by sterol pathway signals. We have proposed that Hmg2p ERAD occurs by a regulated transition to an ERAD quality control substrate. Consistent with this, we had previously shown that Hmg2p is strongly stabilized by chemical chaperones such as glycerol, which stabilize misfolded proteins. To understand the features of Hmg2p that permit regulated ERAD, we have thoroughly characterized the effects of chemical chaperones on Hmg2p. These agents caused a reversible, immediate, direct change in Hmg2p degradation consistent with an effect on Hmg2p structure. We devised an in vitro limited proteolysis assay of Hmg2p in its native membranes. In vitro, chemical chaperones caused a dramatic, rapid change in Hmg2p structure to a less accessible form. As in the living cell, the in vitro action of chemical chaperones was highly specific for Hmg2p and completely reversible. To evaluate the physiological relevance of this model behavior, we used the limited proteolysis assay to examine the effects of changing in vivo degradation signals on Hmg2p structure. We found that changes similar to those observed with chemical chaperones were brought about by alteration of the natural degradation signal. Thus, Hmg2p can undergo significant, reversible structural changes that are relevant to the physiological control of Hmg2p ERAD. These findings support the idea that Hmg2p regulation is brought about by regulated alteration of folding state. Considering the ubiquitous nature of quality control pathways in biology, it may be that this strategy of regulation is widespread.