ABSTRACT
MOTIVATION: Biological systems function through dynamic interactions among genes and their products, regulatory circuits and metabolic networks. Our development of the Pathway Tools software was motivated by the need to construct biological knowledge resources that combine these many types of data, and that enable users to find and comprehend data of interest as quickly as possible through query and visualization tools. Further, we sought to support the development of metabolic flux models from pathway databases, and to use pathway information to leverage the interpretation of high-throughput data sets. RESULTS: In the past 4 years we have enhanced the already extensive Pathway Tools software in several respects. It can now support metabolic-model execution through the Web; it provides a more accurate gap filler for metabolic models; it supports development of models for organism communities distributed across a spatial grid; and model results may be visualized graphically. Pathway Tools supports several new omics-data analysis tools including the Omics Dashboard, multi-pathway diagrams called pathway collages, a pathway-covering algorithm for metabolomics data analysis and an algorithm for generating mechanistic explanations of multi-omics data. We have also improved the core pathway/genome database management capabilities of the software, providing new multi-organism search tools for organism communities, improved graphics rendering, faster performance and re-designed gene and metabolite pages. AVAILABILITY: The software is free for academic use; a fee is required for commercial use. See http://pathwaytools.com. CONTACT: pkarp@ai.sri.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.
Subjects
Genomics/methods , Metabolomics/methods , Software/standards , Systems Biology/methods , Animals , Humans
ABSTRACT
MetaCyc (MetaCyc.org) is a comprehensive reference database of metabolic pathways and enzymes from all domains of life. It contains 2749 pathways derived from more than 60 000 publications, making it the largest curated collection of metabolic pathways. The data in MetaCyc are evidence-based and richly curated, resulting in an encyclopedic reference tool for metabolism. MetaCyc is also used as a knowledge base for generating thousands of organism-specific Pathway/Genome Databases (PGDBs), which are available in BioCyc.org and other genomic portals. This article provides an update on the developments in MetaCyc during September 2017 to August 2019, up to version 23.1. Some of the topics that received intensive curation during this period include cobamides biosynthesis, sterol metabolism, fatty acid biosynthesis, lipid metabolism, carotenoid metabolism, protein glycosylation, antibiotics and cytotoxins biosynthesis, siderophore biosynthesis, bioluminescence, vitamin K metabolism, brominated compound metabolism, plant secondary metabolism and human metabolism. Other additions include modifications to the GlycanBuilder software that enable displaying glycans using symbolic representation, improved graphics and fonts for web displays, improvements in the PathoLogic component of Pathway Tools, and the optional addition of regulatory information to pathway diagrams.
Subjects
Databases, Factual , Genomics/methods , Metabolic Networks and Pathways , Metabolomics/methods , Software , Animals , Enzymes/genetics , Enzymes/metabolism , Humans , Plants/genetics , Plants/metabolism
ABSTRACT
BioCyc.org is a microbial genome Web portal that combines thousands of genomes with additional information inferred by computer programs, imported from other databases and curated from the biomedical literature by biologist curators. BioCyc also provides an extensive range of query tools, visualization services and analysis software. Recent advances in BioCyc include an expansion in the content of BioCyc in terms of both the number of genomes and the types of information available for each genome; an expansion in the amount of curated content within BioCyc; and new developments in the BioCyc software tools including redesigned gene/protein pages and metabolite pages; new search tools; a new sequence-alignment tool; a new tool for visualizing groups of related metabolic pathways; and a facility called SmartTables, which enables biologists to perform analyses that previously would have required a programmer's assistance.
Subjects
Genome, Microbial , Metabolic Networks and Pathways , Software , Computational Biology , Databases, Genetic , Escherichia coli/genetics , Escherichia coli/metabolism , Genomics , Internet , Models, Biological , Search Engine
ABSTRACT
MetaCyc (https://MetaCyc.org) is a comprehensive reference database of metabolic pathways and enzymes from all domains of life. It contains more than 2570 pathways derived from >54 000 publications, making it the largest curated collection of metabolic pathways. The data in MetaCyc is strictly evidence-based and richly curated, resulting in an encyclopedic reference tool for metabolism. MetaCyc is also used as a knowledge base for generating thousands of organism-specific Pathway/Genome Databases (PGDBs), which are available in the BioCyc (https://BioCyc.org) and other PGDB collections. This article provides an update on the developments in MetaCyc during the past two years, including the expansion of data and addition of new features.
Subjects
Databases, Factual , Enzymes/metabolism , Metabolic Networks and Pathways , Animals , Archaea/metabolism , Bacteria/metabolism , Data Curation , Databases, Chemical , Databases, Protein , Humans , Internet , Phylogeny , Plants/metabolism , Software , Species Specificity
ABSTRACT
EcoCyc (EcoCyc.org) is a freely accessible, comprehensive database that collects and summarizes experimental data for Escherichia coli K-12, the best-studied bacterial model organism. New experimental discoveries about gene products, their function and regulation, new metabolic pathways, enzymes and cofactors are regularly added to EcoCyc. New SmartTable tools allow users to browse collections of related EcoCyc content. SmartTables can also serve as repositories for user- or curator-generated lists. EcoCyc now supports running and modifying E. coli metabolic models directly on the EcoCyc website.
Subjects
Computational Biology/methods , Databases, Genetic , Escherichia coli K12/genetics , Escherichia coli K12/metabolism , Energy Metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial , Metabolic Networks and Pathways , Signal Transduction , Software , Transcription Factors/metabolism , Web Browser
ABSTRACT
Pathway Tools is a bioinformatics software environment with a broad set of capabilities. The software provides genome-informatics tools such as a genome browser, sequence alignments, a genome-variant analyzer and comparative-genomics operations. It offers metabolic-informatics tools, such as metabolic reconstruction, quantitative metabolic modeling, prediction of reaction atom mappings and metabolic route search. Pathway Tools also provides regulatory-informatics tools, such as the ability to represent and visualize a wide range of regulatory interactions. This article outlines the advances in Pathway Tools in the past 5 years. Major additions include components for metabolic modeling, metabolic route search, computation of atom mappings and estimation of compound Gibbs free energies of formation; addition of editors for signaling pathways, for genome sequences and for cellular architecture; storage of gene essentiality data and phenotype data; display of multiple alignments, and of signaling and electron-transport pathways; and development of Python and web-services application programming interfaces. Scientists around the world have created more than 9800 Pathway/Genome Databases by using Pathway Tools, many of which are curated databases for important model organisms.
Subjects
Genome , Computational Biology , Genomics , Internet , Metabolic Networks and Pathways , Software Design , Systems Biology
ABSTRACT
The MetaCyc database (MetaCyc.org) is a freely accessible comprehensive database describing metabolic pathways and enzymes from all domains of life. The majority of MetaCyc pathways are small-molecule metabolic pathways that have been experimentally determined. MetaCyc contains more than 2400 pathways derived from >46,000 publications, and is the largest curated collection of metabolic pathways. BioCyc (BioCyc.org) is a collection of 5700 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems, and pathway-hole fillers. The BioCyc website offers a variety of tools for querying and analyzing PGDBs, including Omics Viewers and tools for comparative analysis. This article provides an update of new developments in MetaCyc and BioCyc during the last two years, including addition of Gibbs free energy values for compounds and reactions; redesign of the primary gene/protein page; addition of a tool for creating diagrams containing multiple linked pathways; several new search capabilities, including searching for genes based on sequence patterns, searching for databases based on an organism's phenotypes, and a cross-organism search; and a metabolite identifier translation service.
Subjects
Databases, Chemical , Enzymes/metabolism , Metabolic Networks and Pathways , Databases, Genetic , Electron Transport , Genome , Internet , Metabolic Networks and Pathways/genetics , Software
ABSTRACT
The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible database describing metabolic pathways and enzymes from all domains of life. MetaCyc pathways are experimentally determined, mostly small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains >2100 pathways derived from >37,000 publications, and is the largest curated collection of metabolic pathways currently available. BioCyc (BioCyc.org) is a collection of >3000 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems and pathway-hole fillers. Additions to BioCyc over the past 2 years include YeastCyc, a PGDB for Saccharomyces cerevisiae, and 891 new genomes from the Human Microbiome Project. The BioCyc Web site offers a variety of tools for querying and analysis of PGDBs, including Omics Viewers and tools for comparative analysis. New developments include atom mappings in reactions, a new representation of glycan degradation pathways, improved compound structure display, better coverage of enzyme kinetic data, enhancements of the Web Groups functionality, improvements to the Omics viewers, a new representation of the Enzyme Commission system and, for the desktop version of the software, the ability to save display states.
Subjects
Databases, Chemical , Enzymes/metabolism , Metabolic Networks and Pathways , Enzymes/chemistry , Enzymes/classification , Gene Ontology , Genome , Internet , Kinetics , Metabolic Networks and Pathways/genetics , Polysaccharides/metabolism , Software
ABSTRACT
EcoCyc (http://EcoCyc.org) is a model organism database built on the genome sequence of Escherichia coli K-12 MG1655. Expert manual curation of the functions of individual E. coli gene products in EcoCyc has been based on information found in the experimental literature for E. coli K-12-derived strains. Updates to EcoCyc content continue to improve the comprehensive picture of E. coli biology. The utility of EcoCyc is enhanced by new tools available on the EcoCyc web site, and the development of EcoCyc as a teaching tool is increasing the impact of the knowledge collected in EcoCyc.
Subjects
Databases, Genetic , Escherichia coli K12/genetics , Binding Sites , Escherichia coli K12/metabolism , Escherichia coli Proteins/classification , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial , Internet , Membrane Transport Proteins/classification , Membrane Transport Proteins/metabolism , Models, Genetic , Molecular Sequence Annotation , Phenotype , Position-Specific Scoring Matrices , Promoter Regions, Genetic , Systems Biology , Transcription Factors/metabolism , Transcription, Genetic
ABSTRACT
The sets of compounds that can support growth of an organism are defined by the presence of transporters and metabolic pathways that convert nutrient sources into cellular components and energy for growth. A collection of known nutrient sources can therefore serve both as an impetus for investigating new metabolic pathways and transporters and as a reference for computational modeling of known metabolic pathways. To establish such a collection for Escherichia coli K-12, we have integrated data on the growth or nongrowth of E. coli K-12 obtained from published observations using a variety of individual media and from high-throughput phenotype microarrays into the EcoCyc database. The assembled collection revealed a substantial number of discrepancies between the high-throughput data sets, which we investigated where possible using low-throughput growth assays on soft agar and in liquid culture. We also integrated six data sets describing 16,119 observations of the growth of single-gene knockout mutants of E. coli K-12 into EcoCyc, which are relevant to antimicrobial drug design, provide clues regarding the roles of genes of unknown function, and are useful for validating metabolic models. To make this information easily accessible to EcoCyc users, we developed software for capturing, querying, and visualizing cellular growth assays and gene essentiality data.
Subjects
Escherichia coli K12/growth & development , Gene Expression Regulation, Bacterial/physiology , Anti-Bacterial Agents/pharmacology , Databases, Factual , Drug Design , Escherichia coli K12/genetics , Escherichia coli K12/metabolism , Microarray Analysis , Mutation , Nitrogen/metabolism , Software
ABSTRACT
The MetaCyc database (http://metacyc.org/) provides a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains more than 1800 pathways derived from more than 30,000 publications, and is the largest curated collection of metabolic pathways currently available. Most reactions in MetaCyc pathways are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes and literature citations. BioCyc (http://biocyc.org/) is a collection of more than 1700 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference database, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs contain additional features, including predicted operons, transport systems and pathway-hole fillers. The BioCyc website and Pathway Tools software offer many tools for querying and analysis of PGDBs, including Omics Viewers and comparative analysis. New developments include a zoomable web interface for diagrams; flux-balance analysis model generation from PGDBs; web services; and a new tool called Web Groups.
Subjects
Databases, Factual , Enzymes/metabolism , Genomics , Metabolic Networks and Pathways , Energy Metabolism , Genome , Internet , Metabolomics , Software
ABSTRACT
Clinical genetic laboratories must have access to clinically validated biomedical data for precision medicine. A lack of accessibility, normalized structure, and consistency in evaluation complicates interpretation of disease causality, resulting in confusion in assessing the clinical validity of genes and genetic variants for diagnosis. A key goal of the Clinical Genome Resource (ClinGen) is to fill the knowledge gap concerning the strength of evidence supporting the role of a gene in a monogenic disease, which is achieved through a process known as Gene-Disease Validity curation. Here we review the work of ClinGen in developing a curation infrastructure that supports the standardization, harmonization, and dissemination of Gene-Disease Validity data through the creation of frameworks and the utilization of common data standards. This infrastructure is based on several applications, including the ClinGen GeneTracker, Gene Curation Interface, Data Exchange, GeneGraph, and website.
Subjects
Databases, Genetic , Humans , Genetic Diseases, Inborn/genetics , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/classification , Precision Medicine/methods , Genetic Predisposition to Disease
ABSTRACT
BACKGROUND: As more complete genome sequences become available, bioinformatics challenges arise in how to exploit genome sequences to make phenotypic predictions. One type of phenotypic prediction is to determine sets of compounds that will support the growth of a bacterium from the metabolic network inferred from the genome sequence of that organism. RESULTS: We present a method for computationally determining alternative growth media for an organism based on its metabolic network and transporter complement. Our method predicted 787 alternative anaerobic minimal nutrient sets for Escherichia coli K-12 MG1655 from the EcoCyc database. The program automatically partitioned the nutrients within these sets into 21 equivalence classes, most of which correspond to compounds serving as sources of carbon, nitrogen, phosphorus, and sulfur, or combinations of these essential elements. The nutrient sets were predicted with 72.5% accuracy as evaluated by comparison with 91 growth experiments. Novel aspects of our approach include (a) exhaustive consideration of all combinations of nutrients rather than assuming that all element sources can substitute for one another (an assumption that can be invalid in general); (b) leveraging the notion of a machinery-duplicating constraint, namely, that all intermediate metabolites used in active reactions must be produced in increasing concentrations to prevent successive dilution from cell division; (c) the use of Satisfiability Modulo Theory solvers rather than Linear Programming solvers, because our approach cannot be formulated as linear programming; and (d) the use of Binary Decision Diagrams to produce an efficient implementation. CONCLUSIONS: Our method for generating minimal nutrient sets from the metabolic network and transporters of an organism combines linear constraint solving with binary decision diagrams to efficiently produce solution sets to provided growth problems.
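The idea of a minimal nutrient set in the abstract above can be illustrated with a toy sketch. This is not the authors' method: it swaps in a plain brute-force enumeration with a reachability check in place of Satisfiability Modulo Theory solvers and Binary Decision Diagrams, and it ignores the machinery-duplicating constraint. The reactions, nutrient names, and the function names `producible` and `minimal_nutrient_sets` are all hypothetical, chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy network: each reaction is (substrates, products).
REACTIONS = [
    ({"glucose"}, {"pyruvate"}),
    ({"glycerol"}, {"pyruvate"}),
    ({"nitrate"}, {"ammonium"}),
    ({"pyruvate", "ammonium"}, {"alanine"}),
]
NUTRIENTS = ["glucose", "glycerol", "ammonium", "nitrate"]  # candidate medium components
TARGETS = {"alanine"}  # stand-in for the biomass compounds the cell must make


def producible(seed):
    """Return every metabolite reachable from the seed nutrients."""
    have = set(seed)
    changed = True
    while changed:
        changed = False
        for substrates, products in REACTIONS:
            # Fire a reaction once all substrates exist and it adds something new.
            if substrates <= have and not products <= have:
                have |= products
                changed = True
    return have


def minimal_nutrient_sets():
    """Enumerate nutrient subsets by increasing size, keeping only minimal ones."""
    minimal = []
    for size in range(1, len(NUTRIENTS) + 1):
        for combo in combinations(NUTRIENTS, size):
            candidate = frozenset(combo)
            # Skip supersets of a set already known to support growth.
            if any(found <= candidate for found in minimal):
                continue
            if TARGETS <= producible(candidate):
                minimal.append(candidate)
    return minimal


if __name__ == "__main__":
    for nutrient_set in minimal_nutrient_sets():
        print(sorted(nutrient_set))
```

In this toy network, each minimal set pairs one carbon source (glucose or glycerol) with one nitrogen source (ammonium or nitrate), which loosely mirrors the equivalence classes the abstract describes. Exhaustive enumeration scales exponentially with the number of nutrients, which is presumably why the published method relies on solvers and decision diagrams instead.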
Subjects
Algorithms , Culture Media , Metabolic Networks and Pathways , Computational Biology/methods , Escherichia coli K12/genetics , Escherichia coli K12/growth & development , Escherichia coli K12/metabolism , Escherichia coli Proteins/metabolism , Genomics , Membrane Transport Proteins/metabolism , Models, Biological
ABSTRACT
EcoCyc (http://EcoCyc.org) is a comprehensive model organism database for Escherichia coli K-12 MG1655. From the scientific literature, EcoCyc captures the functions of individual E. coli gene products; their regulation at the transcriptional, post-transcriptional and protein level; and their organization into operons, complexes and pathways. EcoCyc users can search and browse the information in multiple ways. Recent improvements to the EcoCyc Web interface include combined gene/protein pages and a Regulation Summary Diagram displaying a graphical overview of all known regulatory inputs to gene expression and protein activity. The graphical representation of signal transduction pathways has been updated, and the cellular and regulatory overviews were enhanced with new functionality. A specialized undergraduate teaching resource using EcoCyc is being developed.
Subjects
Databases, Genetic , Escherichia coli K12/physiology , Binding Sites , Escherichia coli K12/genetics , Escherichia coli K12/metabolism , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial , Signal Transduction , Software , Transcription Factors/metabolism , Transcription, Genetic , User-Computer Interface
ABSTRACT
Objective: In 2021, the Clinical Genome Resource (ClinGen) amyotrophic lateral sclerosis (ALS) spectrum disorders Gene Curation Expert Panel (GCEP) was established to evaluate the strength of evidence for genes previously reported to be associated with ALS. Through this endeavor, we will provide standardized guidance to laboratories on which genes should be included in clinical genetic testing panels for ALS. In this manuscript, we aimed to assess the heterogeneity in the current global landscape of clinical genetic testing for ALS. Methods: We reviewed the National Institutes of Health (NIH) Genetic Testing Registry (GTR) and members of the ALS GCEP to source frequently used testing panels and compare the genes included on the tests. Results: 14 clinical panels specific to ALS from 14 laboratories covered 4 to 54 genes. All panels report on ANG, SOD1, TARDBP, and VAPB; 50% included or offered the option of including C9orf72 hexanucleotide repeat expansion (HRE) analysis. Of the 91 genes included in at least one of the panels, 40 (44.0%) were included on only a single panel. We could not find a direct link to ALS in the literature for 14 (15.4%) included genes. Conclusions: The variability across the surveyed clinical genetic panels is concerning due to the possibility of reduced diagnostic yields in clinical practice and the risk of missed diagnoses for patients. Our results highlight the necessity for consensus regarding the appropriateness of gene inclusions in clinical genetic ALS tests to improve their application for patients living with ALS and their families.
Subjects
Amyotrophic Lateral Sclerosis , Humans , Amyotrophic Lateral Sclerosis/diagnosis , Amyotrophic Lateral Sclerosis/genetics , Mutation , Genetic Testing/methods , C9orf72 Protein/genetics
ABSTRACT
Pathway Tools is a production-quality software environment for creating a type of model-organism database called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc integrates the evolving understanding of the genes, proteins, metabolic network and regulatory network of an organism. This article provides an overview of Pathway Tools capabilities. The software performs multiple computational inferences including prediction of metabolic pathways, prediction of metabolic pathway hole fillers and prediction of operons. It enables interactive editing of PGDBs by DB curators. It supports web publishing of PGDBs, and provides a large number of query and visualization tools. The software also supports comparative analyses of PGDBs, and provides several systems biology analyses of PGDBs including reachability analysis of metabolic networks, and interactive tracing of metabolites through a metabolic network. More than 800 PGDBs have been created using Pathway Tools by scientists around the world, many of which are curated DBs for important model organisms. Those PGDBs can be exchanged using a peer-to-peer DB sharing system called the PGDB Registry.
Subjects
Computational Biology , Genome , Software , Systems Biology , Internet
ABSTRACT
EcoCyc (http://EcoCyc.org) provides a comprehensive encyclopedia of Escherichia coli biology. EcoCyc integrates information about the genome, genes and gene products; the metabolic network; and the regulatory network of E. coli. Recent EcoCyc developments include a new initiative to represent and curate all types of E. coli regulatory processes such as attenuation and regulation by small RNAs. EcoCyc has started to curate Gene Ontology (GO) terms for E. coli and has made a dataset of E. coli GO terms available through the GO Web site. The curation and visualization of electron transfer processes have been significantly improved. Other software and Web site enhancements include the addition of tracks to the EcoCyc genome browser, in particular a type of track designed for the display of ChIP-chip datasets, and the development of a comparative genome browser. A new Genome Omics Viewer enables users to paint omics datasets onto the full E. coli genome for analysis. A new advanced query page guides users in interactively constructing complex database queries against EcoCyc. A Macintosh version of EcoCyc is now available. A series of Webinars is available to instruct users in the use of EcoCyc.
Subjects
Databases, Genetic , Escherichia coli/genetics , Escherichia coli/metabolism , Cell Membrane/enzymology , Electron Transport , Escherichia coli/enzymology , Gene Expression Regulation, Bacterial , Genes, Bacterial , Genome, Bacterial , Genomics , Internet , Software , Transcription, Genetic
ABSTRACT
Updating genome databases to reflect newly published molecular findings for an organism was hard enough when only a single strain of a given organism had been sequenced. With multiple sequenced strains now available for many organisms, the challenge has grown significantly because of the still-limited resources available for the manual curation that corrects errors and captures new knowledge. We have developed a method to automatically propagate multiple types of curated knowledge from genes and proteins in one genome database to their orthologs in uncurated databases for related strains, imposing several quality-control filters to reduce the chances of introducing errors. We have applied this method to propagate information from the highly curated EcoCyc database for Escherichia coli K-12 to databases for 480 other Escherichia coli strains in the BioCyc database collection. The increase in value and utility of the target databases after propagation is considerable. Target databases received updates for an average of 2,535 proteins each. In addition to widespread addition and regularization of gene and protein names, 97% of the target databases were improved by the addition of at least 200 new protein complexes, at least 800 new or updated reaction assignments, and at least 2,400 sets of GO annotations.
ABSTRACT
The EcoCyc model-organism database collects and summarizes experimental data for Escherichia coli K-12. EcoCyc is regularly updated by the manual curation of individual database entries, such as genes, proteins, and metabolic pathways, and by the programmatic addition of results from select high-throughput analyses. Updates to the Pathway Tools software that supports EcoCyc and to the web interface that enables user access have continuously improved its usability and expanded its functionality. This article highlights recent improvements to the curated data in the areas of metabolism, transport, DNA repair, and regulation of gene expression. New and revised data analysis and visualization tools include an interactive metabolic network explorer, a circular genome viewer, and various improvements to the speed and usability of existing tools.
ABSTRACT
The extensive heterogeneity of biological data poses challenges to analysis and interpretation. Construction of a large-scale mechanistic model of Escherichia coli enabled us to integrate and cross-evaluate a massive, heterogeneous dataset based on measurements reported by various groups over decades. We identified inconsistencies with functional consequences across the data, including that the total output of the ribosomes and RNA polymerases described by the data is not sufficient for a cell to reproduce measured doubling times, that measured metabolic parameters are fully compatible neither with each other nor with overall growth, and that essential proteins are absent during the cell cycle (and the cell is robust to this absence). Finally, considering these data as a whole leads to successful predictions of new experimental outcomes, in this case protein half-lives.