Results 1 - 9 of 9
1.
BMC Bioinformatics ; 25(1): 52, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38297220

ABSTRACT

BACKGROUND: Metabolic pathway prediction is one possible approach to address the problem in systems biology of reconstructing an organism's metabolic network from its genome sequence. Recent machine learning-based pathway prediction studies have concluded that machine learning-based approaches perform similarly to the most widely used method, PathoLogic, which is rule-based. One issue is that previous studies evaluated PathoLogic without taxonomic pruning, which decreases its performance. RESULTS: In this study, we update the evaluation results from previous studies to demonstrate that PathoLogic with taxonomic pruning outperforms previous machine learning-based approaches and that further improvements in performance need to be made for them to be competitive. Furthermore, we introduce mlXGPR, an XGBoost-based metabolic pathway prediction method based on the multi-label classification pathway prediction framework introduced by mlLGPR. We also improve on this multi-label framework by utilizing correlations between labels using classifier chains. We propose a ranking method that determines the order of the chain so that lower-performing classifiers are placed later in the chain, where they can make more use of the correlations between labels. We evaluate mlXGPR with and without classifier chains on single-organism and multi-organism benchmarks. Our results indicate that mlXGPR outperforms previous pathway prediction methods, including PathoLogic with taxonomic pruning, in terms of Hamming loss, precision, and F1 score on single-organism benchmarks. CONCLUSIONS: The results from our study indicate that the performance of machine learning-based pathway prediction methods can be substantially improved and can even outperform PathoLogic with taxonomic pruning.
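Two ideas from the abstract above can be illustrated with a minimal Python sketch: the Hamming loss used as an evaluation metric, and an ordering of chain members that places lower-performing classifiers later. The abstract does not state which score the authors rank by, so the per-label validation scores, the label names, and the function names below are all assumptions made for illustration; this is not the mlXGPR implementation.

```python
from typing import Dict, List, Sequence


def hamming_loss(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (sample, label) cells where prediction and truth disagree."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


def chain_order(label_scores: Dict[str, float]) -> List[str]:
    """Order labels so that lower-scoring classifiers come later in the chain."""
    # Sort by score, highest first; ties are broken by label name for determinism.
    return sorted(label_scores, key=lambda label: (-label_scores[label], label))


# Hypothetical per-pathway validation scores (made-up numbers).
scores = {"pathway_A": 0.91, "pathway_B": 0.62, "pathway_C": 0.78}
order = chain_order(scores)

# Toy data: two samples, three pathway labels each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 1]]
loss = hamming_loss(y_true, y_pred)
```

In a classifier chain, each classifier receives the predictions of the classifiers before it as extra features, so a label placed late in `order` sees the most information from the other labels.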


Subjects
Machine Learning , Metabolic Networks and Pathways , Biology , Genome
2.
BMC Genomics ; 22(1): 191, 2021 Mar 16.
Article in English | MEDLINE | ID: mdl-33726670

ABSTRACT

BACKGROUND: Enrichment or over-representation analysis is a common method used in bioinformatics studies of transcriptomics, metabolomics, and microbiome datasets. The key idea behind enrichment analysis is: given a set of significantly expressed genes (or metabolites), use that set to infer a smaller set of perturbed biological pathways or processes, in which those genes (or metabolites) play a role. Enrichment computations rely on collections of defined biological pathways and/or processes, which are usually drawn from pathway databases. Although practitioners of enrichment analysis take great care to employ statistical corrections (e.g., for multiple testing), they appear unaware that enrichment results are quite sensitive to the pathway definitions that the calculation uses. RESULTS: We show that alternative pathway definitions can alter enrichment p-values by up to nine orders of magnitude, whereas statistical corrections typically alter enrichment p-values by only two orders of magnitude. We present multiple examples where the smaller pathway definitions used in the EcoCyc database produce stronger enrichment p-values than the much larger pathway definitions used in the KEGG database; we demonstrate that to attain a given enrichment p-value, KEGG-based enrichment analyses require 1.3-2.0 times as many significantly expressed genes as do EcoCyc-based enrichment analyses. The large pathways in KEGG are problematic for another reason: they blur together multiple (as many as 21) biological processes. When such a KEGG pathway receives a high enrichment p-value, which of its component processes is perturbed is unclear, and thus the biological conclusions drawn from enrichment of large pathways are also in question. CONCLUSIONS: The choice of pathway database used in enrichment analyses can have a much stronger effect on the enrichment results than the statistical corrections used in these analyses.
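The sensitivity to pathway size described above can be seen in a small Python sketch. It assumes the common one-sided hypergeometric test for over-representation; the abstract does not say which test the authors used, and the gene counts below are invented toy values, not figures from the study.

```python
from math import comb


def enrichment_p_value(universe: int, pathway_size: int, hits: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe     -- total number of genes considered
    pathway_size -- number of genes annotated to the pathway
    hits         -- number of significantly expressed genes
    overlap      -- number of significant genes that fall in the pathway
    """
    denom = comb(universe, hits)
    upper = min(pathway_size, hits)
    # math.comb returns 0 when k > n, so impossible terms drop out of the sum.
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, hits - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom


# Toy scenario: the same 8 significant genes land in a small and a large pathway.
p_small = enrichment_p_value(universe=4000, pathway_size=10, hits=100, overlap=8)
p_large = enrichment_p_value(universe=4000, pathway_size=100, hits=100, overlap=8)
```

With the overlap held fixed, the smaller pathway definition yields the smaller (stronger) p-value, which is the direction of the effect the abstract reports for EcoCyc versus KEGG.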


Subjects
Computational Biology , Metabolomics , Databases, Factual
3.
Plant Cell Physiol ; 54(5): 673-85, 2013 May.
Article in English | MEDLINE | ID: mdl-23493402

ABSTRACT

The medicinal plant Madagascar periwinkle (Catharanthus roseus) synthesizes numerous terpenoid indole alkaloids (TIAs), such as the anticancer drugs vinblastine and vincristine. The TIA pathway operates in a complex metabolic network that steers plant growth and survival. Pathway databases and metabolic networks reconstructed from 'omics' sequence data can help to discover missing enzymes, study metabolic pathway evolution and, ultimately, engineer metabolic pathways. To date, such databases have mainly been built for model plant species with sequenced genomes. Although genome sequence data are not available for most medicinal plant species, next-generation sequencing is now extensively employed to create comprehensive medicinal plant transcriptome sequence resources. Here we report on the construction of CathaCyc, a detailed metabolic pathway database, from C. roseus RNA-Seq data sets. CathaCyc (version 1.0) contains 390 pathways with 1,347 assigned enzymes and spans primary and secondary metabolism. Curation of the pathways linked with the synthesis of TIAs and triterpenoids, their primary metabolic precursors, and their elicitors, the jasmonate hormones, demonstrated that RNA-Seq resources are suitable for the construction of pathway databases. CathaCyc is accessible online (http://www.cathacyc.org) and offers a range of tools for the visualization and analysis of metabolic networks and 'omics' data. Overlay with expression data from publicly available RNA-Seq resources demonstrated that two well-characterized C. roseus terpenoid pathways, those of TIAs and triterpenoids, are subject to distinct regulation by both developmental and environmental cues. We anticipate that databases such as CathaCyc will become key to the study and exploitation of the metabolism of medicinal plants.


Subjects
Catharanthus/metabolism , Databases as Topic , Metabolic Networks and Pathways , RNA, Plant/metabolism , Sequence Analysis, RNA , Catharanthus/genetics , Cluster Analysis , Cyclopentanes/metabolism , Gene Expression Profiling , Gene Expression Regulation, Plant , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation , Oxylipins/metabolism , RNA, Plant/genetics , Secologanin Tryptamine Alkaloids/chemistry , Secologanin Tryptamine Alkaloids/metabolism , Transcriptome/genetics
4.
J Comput Biol ; 28(11): 1075-1103, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34520674

ABSTRACT

Machine learning provides a probabilistic framework for metabolic pathway inference from genomic sequence information at different levels of complexity and completion. However, several challenges, including pathway feature engineering, multiple mapping of enzymatic reactions, and emergent or distributed metabolism within populations or communities of cells, can limit prediction performance. In this article, we present triUMPF (triple non-negative matrix factorization [NMF] with community detection for metabolic pathway inference), which combines three stages of NMF to capture myriad relationships between enzymes and pathways within a graph network. This is followed by community detection to extract a higher-order structure based on the clustering of vertices that share similar statistical properties. We evaluated triUMPF performance by using experimental datasets manifesting diverse multi-label properties, including Tier 1 genomes from the BioCyc collection of organismal Pathway/Genome Databases and low-complexity microbial communities. Resulting performance metrics equaled or exceeded those of other prediction methods on organismal genomes, with improved precision on multi-organismal datasets.


Subjects
Bacteria/genetics , Computational Biology/methods , Metabolic Networks and Pathways , Algorithms , Bacterial Proteins/genetics , Cluster Analysis , Machine Learning , Microbiota
5.
Pathogens ; 9(9), 2020 Sep 12.
Article in English | MEDLINE | ID: mdl-32932580

ABSTRACT

The class 1 carcinogen, Helicobacter pylori, is one of the World Health Organization's high priority pathogens for antimicrobial development. We applied three subtractive proteomics approaches to protein pools retrieved from chokepoint reactions in the BioCyc database, the Kyoto Encyclopedia of Genes and Genomes, and the Database of Essential Genes (DEG) to find putative drug targets and to identify inhibitors of those targets through drug repurposing. The subtractive channels included non-homology to the human proteome, essentiality analysis, sub-cellular localization prediction, conservation, lack of similarity to gut flora, druggability, and broad-spectrum activity. The minimum inhibitory concentration (MIC) of three selected ligands was determined to confirm anti-helicobacter activity. Seventeen protein targets were retrieved. They are involved in motility, cell wall biosynthesis, processing of environmental and genetic information, and synthesis and metabolism of secondary metabolites, amino acids, vitamins, and cofactors. The DEG protein pool approach was superior, as it retrieved all drug targets identified by the other two approaches. Binding ligands (n = 42) were mostly small non-antibiotic compounds. Citric, dipicolinic, and pyrophosphoric acids inhibited H. pylori at an MIC of 1.5-2.5 mg/mL. In conclusion, we identified potential drug targets in H. pylori, and repurposed their binding ligands as possible anti-helicobacter agents, saving the time and effort required for the development of new antimicrobial compounds.

6.
Metabolites ; 9(5), 2019 May 02.
Article in English | MEDLINE | ID: mdl-31052521

ABSTRACT

Interpreting changes in metabolite abundance in response to experimental treatments or disease states remains a major challenge in metabolomics. Pathway Covering is a new algorithm that takes a list of metabolites (compounds) and determines a minimum-cost set of metabolic pathways in an organism that includes (covers) all the metabolites in the list. We used five functions for assigning costs to pathways: one that assigns a constant to all pathways, which yields a solution with the smallest pathway count; two that penalize large pathways; one that prefers pathways based on the pathway's assigned function; and one that loosely corresponds to metabolic flux. The pathway covering set computed by the algorithm can be displayed as a multi-pathway diagram ("pathway collage") that highlights the covered metabolites. We investigated the pathway covering algorithm by using several datasets from the Metabolomics Workbench. The algorithm is best applied to a list of metabolites with significant statistics and fold-changes with a specified direction of change for each metabolite. The pathway covering algorithm is now available within the Pathway Tools software and BioCyc website.
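Covering a metabolite list with a low-cost set of pathways is an instance of weighted set cover. The Python sketch below uses the standard greedy heuristic, which only approximates the minimum-cost cover; the abstract does not say how Pathway Covering solves the problem, so the heuristic, the pathway names (`P1` to `P3`), and the two cost functions are illustrative assumptions.

```python
from typing import Dict, List, Set


def greedy_pathway_cover(
    metabolites: Set[str],
    pathways: Dict[str, Set[str]],
    cost: Dict[str, float],
) -> List[str]:
    """Greedy weighted set cover: repeatedly pick the pathway with the lowest
    cost per newly covered metabolite (an approximation, not an exact optimum)."""
    uncovered = set(metabolites)
    chosen: List[str] = []
    while uncovered:
        best = None
        best_ratio = float("inf")
        for name in sorted(pathways):  # sorted for deterministic tie-breaking
            gain = len(pathways[name] & uncovered)
            if gain == 0:
                continue
            ratio = cost[name] / gain
            if ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break  # the remaining metabolites appear in no pathway
        chosen.append(best)
        uncovered -= pathways[best]
    return chosen


# Hypothetical pathways and metabolite list.
pathways = {
    "P1": {"a", "b", "d"},
    "P2": {"b", "c"},
    "P3": {"a", "b", "c", "d", "e", "f"},
}
query = {"a", "b", "c"}

# Constant cost favors the fewest pathways; size-based cost penalizes large ones.
constant_cost = {name: 1.0 for name in pathways}
size_cost = {name: float(len(members)) for name, members in pathways.items()}

cover_constant = greedy_pathway_cover(query, pathways, constant_cost)
cover_by_size = greedy_pathway_cover(query, pathways, size_cost)
```

On this toy input the constant cost selects the single large pathway, while the size-based cost selects two smaller pathways instead, which mirrors the contrast between the cost functions listed in the abstract.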

7.
Microbiome ; 7(1): 89, 2019 Jun 07.
Article in English | MEDLINE | ID: mdl-31174602

ABSTRACT

BACKGROUND: Microbiomes are complex aggregates of organisms, each of which has its own extensive metabolic network. A variety of metabolites are exchanged between the microbes. The challenge we address is understanding the overall metabolic capabilities of a microbiome: through what series of metabolic transformations can a microbiome convert a starting compound to an ending compound? RESULTS: We developed an efficient software tool to search for metabolic routes that include metabolic reactions from multiple organisms. The metabolic network for each organism is obtained from BioCyc, where the network was inferred from the annotated genome. The tool searches for optimal metabolic routes that minimize the number of reactions in each route, maximize the number of atoms conserved between the starting and ending compounds, and minimize the number of organism switches. The tool pre-computes the reaction sets found in each organism from BioCyc to facilitate fast computation of the reactions defined in a researcher-specified organism set. The generated routes are depicted graphically, and for each reaction in a route, the tool lists the organisms that can catalyze that reaction. We present solutions for three route-finding problems in the human gut microbiome: (1) production of indoxyl sulfate, (2) production of trimethylamine N-oxide (TMAO), and (3) synthesis and degradation of autoinducers. The optimal routes computed by our multi-organism route-search (MORS) tool for indoxyl sulfate and TMAO were the same as routes reported in the literature. CONCLUSIONS: Our tool quickly found plausible routes for the discussed multi-organism route-finding problems. The routes shed light on how diverse organisms cooperate to perform multi-step metabolic transformations. Our tool enables scientists to consider multiple alternative routes and identifies the organisms responsible for each reaction.
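The trade-off between route length and organism switches described above can be sketched as a shortest-path search in Python. This is a simplified illustration, not the MORS tool: it uses Dijkstra's algorithm over (compound, organism) states, charges one unit per reaction plus a hypothetical `switch_penalty` per organism change, and leaves out the atom-conservation criterion entirely. All compound and organism names are placeholders.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# (substrate, product, organism) triples; every name here is hypothetical.
Reaction = Tuple[str, str, str]


def find_route(
    reactions: List[Reaction],
    start: str,
    goal: str,
    switch_penalty: float = 0.5,
) -> Optional[Tuple[float, List[Reaction]]]:
    """Return (cost, route) minimizing reaction count plus organism-switch penalties."""
    by_substrate: Dict[str, List[Reaction]] = {}
    for rxn in reactions:
        by_substrate.setdefault(rxn[0], []).append(rxn)

    counter = 0  # tie-breaker so the heap never has to compare routes
    heap = [(0.0, counter, start, "", [])]  # "" means no organism used yet
    best: Dict[Tuple[str, str], float] = {}
    while heap:
        cost, _, compound, organism, path = heapq.heappop(heap)
        if compound == goal:
            return cost, path
        state = (compound, organism)
        if state in best and best[state] <= cost:
            continue  # already reached this state at equal or lower cost
        best[state] = cost
        for rxn in by_substrate.get(compound, []):
            _, product, rxn_org = rxn
            step = 1.0
            if organism and rxn_org != organism:
                step += switch_penalty  # route hands off to another organism
            counter += 1
            heapq.heappush(heap, (cost + step, counter, product, rxn_org, path + [rxn]))
    return None


reactions = [
    ("A", "B", "org1"),
    ("B", "D", "org2"),
    ("A", "C", "org1"),
    ("C", "E", "org1"),
    ("E", "D", "org1"),
]

cheap_switch = find_route(reactions, "A", "D", switch_penalty=0.5)
costly_switch = find_route(reactions, "A", "D", switch_penalty=2.0)
```

With a low penalty the search prefers the two-reaction route that crosses organisms; with a high penalty it prefers the three-reaction route that stays within one organism.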


Subjects
Computational Biology/methods , Gastrointestinal Microbiome , Metabolic Networks and Pathways , Software , Databases, Genetic , Humans , Indican/biosynthesis , Metagenome , Methylamines/metabolism
8.
Methods Mol Biol ; 1533: 241-256, 2017.
Article in English | MEDLINE | ID: mdl-27987175

ABSTRACT

The species-specific plant Pathway Genome Databases (PGDBs) based on the BioCyc platform provide a conceptual model of the cellular metabolic network of an organism. Such frameworks allow analysis of genome-scale expression data to understand changes in the overall metabolism of an organism (or of its organs, tissues, and cells) in response to various intrinsic signals (e.g. developmental and differentiation) and/or extrinsic signals (e.g. pathogens and abiotic stresses) from the surrounding environment. Using FragariaCyc, a pathway database for the diploid strawberry Fragaria vesca, we show (1) the basic navigation across a PGDB; (2) a case study of pathway comparison across plant species; and (3) an example of RNA-Seq data analysis using the Omics Viewer tool. The protocols described here generally apply to other Pathway Tools-based PGDBs.


Subjects
Computational Biology/methods , Databases, Genetic , Genome, Plant , Genomics , Plants/genetics , Plants/metabolism , Software , Gene Regulatory Networks , Genomics/methods , Metabolic Networks and Pathways , Search Engine , Signal Transduction , Species Specificity , Web Browser
9.
BMC Syst Biol ; 10(1): 129, 2016 Nov 29.
Article in English | MEDLINE | ID: mdl-27899149

ABSTRACT

BACKGROUND: As metabolic pathway resources become more commonly available, researchers have unprecedented access to information about their organism of interest. Despite efforts to ensure consistency between various resources, information content and quality can vary widely. Two maize metabolic pathway resources for the B73 inbred line, CornCyc 4.0 and MaizeCyc 2.2, are based on the same gene model set and were developed using Pathway Tools software. These resources differ in their initial enzymatic function assignments and in the extent of manual curation. We present an in-depth comparison between CornCyc and MaizeCyc to demonstrate the effect of initial computational enzymatic function assignments on the quality and content of metabolic pathway resources. RESULTS: These two resources differ in their content. MaizeCyc contains GO annotations for over 21,000 genes that CornCyc is missing. CornCyc contains on average 1.6 transcripts per gene, while MaizeCyc contains almost no alternative splicing. MaizeCyc also does not match CornCyc's breadth in representing the metabolic domain; MaizeCyc has fewer compounds, reactions, and pathways than CornCyc. CornCyc's computational predictions are more accurate than those in MaizeCyc when compared to experimentally determined function assignments, demonstrating the relative strength of the enzymatic function assignment pipeline used to generate CornCyc. CONCLUSIONS: Our results show that the quality of initial enzymatic function assignments primarily determines the quality of the final metabolic pathway resource. Therefore, biologists should pay close attention to the methods and information sources used to develop a metabolic pathway resource to gauge the utility of using such functional assignments to construct hypotheses for experimental studies.


Subjects
Computational Biology , Zea mays/metabolism , Molecular Sequence Annotation , Plant Proteins/metabolism , Zea mays/enzymology