1.
BMC Bioinformatics ; 11: 493, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20920312

ABSTRACT

BACKGROUND: Genome context methods have been introduced in the last decade as automatic methods to predict functional relatedness between genes in a target genome using the patterns of existence and relative locations of the homologs of those genes in a set of reference genomes. Much work has been done in the application of these methods to different bioinformatics tasks, but few papers present a systematic study of the methods and their combination necessary for their optimal use. RESULTS: We present a thorough study of the four main families of genome context methods found in the literature: phylogenetic profile, gene fusion, gene cluster, and gene neighbor. We find that for most organisms the gene neighbor method outperforms the phylogenetic profile method by as much as 40% in sensitivity, being competitive with the gene cluster method at low sensitivities. Gene fusion is generally the worst performing of the four methods. A thorough exploration of the parameter space for each method is performed and results across different target organisms are presented. We propose the use of normalization procedures like those used on microarray data for the genome context scores. We show that substantial gains can be achieved from the use of a simple normalization technique. In particular, the sensitivity of the phylogenetic profile method is improved by around 25% after normalization, resulting, to our knowledge, in the best-performing phylogenetic profile system in the literature. Finally, we show results from combining the various genome context methods into a single score. When using a cross-validation procedure to train the combiners, with both original and normalized scores as input, a decision tree combiner results in gains of up to 20% with respect to the gene neighbor method. Overall, this represents a gain of around 15% over what can be considered the state of the art in this area: the four original genome context methods combined using a procedure like that used in the STRING database. Unfortunately, we find that these gains disappear when the combiner is trained only with organisms that are phylogenetically distant from the target organism. CONCLUSIONS: Our experiments indicate that gene neighbor is the best individual genome context method and that gains from the combination of individual methods are very sensitive to the training data used to obtain the combiner's parameters. If adequate training data is not available, using the gene neighbor score by itself instead of a combined score might be the best choice.
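The phylogenetic profile idea named in this abstract can be illustrated with a minimal sketch. It assumes each gene is represented by a binary vector recording whether a homolog exists in each reference genome, and it scores gene pairs by Jaccard similarity. The scoring function, the function name, and the toy profiles are assumptions made for illustration; the abstract does not specify the authors' scoring or normalization.

```python
def jaccard_profile_similarity(profile_a, profile_b):
    """Jaccard similarity of two binary presence/absence profiles.

    Each profile lists, per reference genome, whether a homolog of the
    gene is present (1) or absent (0). This is an illustrative score,
    not the one used in the paper.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same reference genomes")
    # Genomes where both genes have a homolog
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    # Genomes where at least one gene has a homolog
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical toy profiles over five reference genomes
gene_x = [1, 1, 0, 1, 0]
gene_y = [1, 1, 0, 0, 0]
gene_z = [0, 0, 1, 0, 1]

score_xy = jaccard_profile_similarity(gene_x, gene_y)  # similar profiles
score_xz = jaccard_profile_similarity(gene_x, gene_z)  # disjoint profiles
```

In this toy setting, gene pairs with similar profiles score higher and would be ranked as more likely to be functionally related.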


Subjects
Genomics/methods , Calibration , Genetic Databases , Gene Expression Profiling/methods , Genome , Genomics/standards , Multigene Family , Phylogeny
2.
BMC Bioinformatics ; 11: 15, 2010 Jan 08.
Article in English | MEDLINE | ID: mdl-20064214

ABSTRACT

BACKGROUND: A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. RESULTS: To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. CONCLUSIONS: ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.
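The two evaluation measures quoted in this abstract can be computed from confusion-matrix counts, as in the sketch below. It assumes "F-measure" means the balanced F1 score (harmonic mean of precision and recall), which the abstract does not state explicitly. The counts in the example are hypothetical and are not taken from the paper's gold standard.

```python
def accuracy_and_f_measure(tp, fp, tn, fn):
    """Return (accuracy, F1) from confusion-matrix counts.

    tp/fp: pathways predicted present that are truly present/absent.
    tn/fn: pathways predicted absent that are truly absent/present.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f_measure


# Hypothetical counts, chosen only to make the arithmetic easy to follow
acc, f1 = accuracy_and_f_measure(tp=80, fp=20, tn=880, fn=20)
```

When absent pathways greatly outnumber present ones, accuracy can stay high while the F-measure is noticeably lower, which is one reason to report both.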


Subjects
Artificial Intelligence , Metabolic Networks and Pathways , Computational Biology/methods , Factual Databases , Genome , Software
3.
Nucleic Acids Res ; 38(Database issue): D473-9, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19850718

ABSTRACT

The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic pathways currently available. Pathway reactions are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and literature citations. BioCyc (BioCyc.org) is a collection of more than 500 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs also contain additional features, such as predicted operons, transport systems, and pathway hole-fillers. The BioCyc Web site offers several tools for the analysis of the PGDBs, including Omics Viewers that enable visualization of omics datasets on two different genome-scale diagrams and tools for comparative analysis. The BioCyc PGDBs generated by SRI are offered for adoption by any party interested in curation of metabolic, regulatory, and genome-related information about an organism.


Subjects
Computational Biology/methods , Genetic Databases , Nucleic Acid Databases , Animals , Computational Biology/trends , Protein Databases , Archaeal Genome , Bacterial Genome , Plant Genome , Viral Genome , Humans , Information Storage and Retrieval/methods , Internet , Biological Models , Tertiary Protein Structure , Software
4.
Brief Bioinform ; 11(1): 40-79, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19955237

ABSTRACT

Pathway Tools is a production-quality software environment for creating a type of model-organism database called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc integrates the evolving understanding of the genes, proteins, metabolic network and regulatory network of an organism. This article provides an overview of Pathway Tools capabilities. The software performs multiple computational inferences including prediction of metabolic pathways, prediction of metabolic pathway hole fillers and prediction of operons. It enables interactive editing of PGDBs by DB curators. It supports web publishing of PGDBs, and provides a large number of query and visualization tools. The software also supports comparative analyses of PGDBs, and provides several systems biology analyses of PGDBs including reachability analysis of metabolic networks, and interactive tracing of metabolites through a metabolic network. More than 800 PGDBs have been created using Pathway Tools by scientists around the world, many of which are curated DBs for important model organisms. Those PGDBs can be exchanged using a peer-to-peer DB sharing system called the PGDB Registry.
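Reachability analysis of a metabolic network, mentioned in this abstract, can be sketched as a forward expansion: starting from a set of seed metabolites, repeatedly fire every reaction whose substrates are all available and add its products. This is a generic illustration under that assumption, not the Pathway Tools implementation; the function name, the reaction representation, and the toy network are hypothetical.

```python
def reachable_metabolites(seeds, reactions):
    """Return all metabolites producible from the seed set.

    reactions: iterable of (substrates, products) pairs, each a tuple of
    metabolite names. A reaction fires only when every one of its
    substrates is already reachable.
    """
    reachable = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            can_fire = set(substrates) <= reachable
            adds_new = not set(products) <= reachable
            if can_fire and adds_new:
                reachable.update(products)
                changed = True
    return reachable


# Hypothetical toy network: A + B -> C, C -> D, E -> F
toy_reactions = [
    (("A", "B"), ("C",)),
    (("C",), ("D",)),
    (("E",), ("F",)),
]

result = reachable_metabolites({"A", "B"}, toy_reactions)
```

Here F stays unreachable because its only producing reaction needs E, which is neither a seed nor a product of any reaction that can fire.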


Subjects
Computational Biology , Genome , Software , Systems Biology , Internet
5.
Pac Symp Biocomput ; : 322-33, 2005.
Article in English | MEDLINE | ID: mdl-15759638

ABSTRACT

The limitations of homology-based methods for prediction of protein molecular function are well known; differences in domain structure, gene duplication events and errors in existing database annotations complicate this process. In this paper we present a method to detect and model protein subfamilies, which can be used in high-throughput, genome-scale phylogenomic inference of protein function. We demonstrate the method on a set of nine PFAM families, and show that subfamily HMMs provide greater separation of homologs and non-homologs than is possible with a single HMM for each family. We also show that subfamily HMMs can be used for functional classification with a very low expected error rate. The BETE method for identifying functional subfamilies is illustrated on a set of serotonin receptors.


Subjects
Genomics , Animals , Bayes Theorem , Biological Evolution , Nucleic Acid Databases , Protein Databases , Enzymes/genetics , Gene Duplication , Markov Chains , Genetic Models , Phylogeny , Proteins/chemistry , Proteins/genetics , Sequence Alignment
6.
Proc Natl Acad Sci U S A ; 101(41): 14978-83, 2004 Oct 12.
Article in English | MEDLINE | ID: mdl-15466695

ABSTRACT

Auxin modulates diverse plant developmental pathways through direct transcriptional regulation and cooperative signaling with other plant hormones. Genetic and biochemical approaches have clarified several aspects of the auxin-regulated networks; however, the mechanisms of perception and subsequent signaling events remain largely uncharacterized. To elucidate unidentified intermediates, we have developed a high-throughput screen for identifying small molecule inhibitors of auxin signaling in Arabidopsis. Analysis of 10,000 compounds revealed several potent lead structures that abrogate transcription of an auxin-inducible reporter gene. Three compounds were found to interfere with auxin-regulated proteolysis of an auxin/indole-3-acetic acid transcription factor, and two impart phenotypes indicative of an altered auxin response, including impaired root development. Microarray analysis was used to demonstrate the mechanistic similarities of the two most potent molecules. This strategy promises to yield powerful tools for the discovery of unidentified components of the auxin-signaling networks and the study of auxin's participation in various stages of plant development.


Subjects
Arabidopsis/genetics , Plant Gene Expression Regulation/genetics , Indoleacetic Acids/genetics , Abscisic Acid/pharmacology , Arabidopsis/drug effects , Gene Expression Profiling , Developmental Gene Expression Regulation/genetics , Phenotype
7.
Science ; 302(5646): 842-6, 2003 Oct 31.
Article in English | MEDLINE | ID: mdl-14593172

ABSTRACT

Functional analysis of a genome requires accurate gene structure information and a complete gene inventory. A dual experimental strategy was used to verify and correct the initial genome sequence annotation of the reference plant Arabidopsis. Sequencing full-length cDNAs and hybridizations using RNA populations from various tissues to a set of high-density oligonucleotide arrays spanning the entire genome allowed the accurate annotation of thousands of gene structures. We identified 5817 novel transcription units, including a substantial amount of antisense gene transcription, and 40 genes within the genetically defined centromeres. This approach resulted in completion of approximately 30% of the Arabidopsis ORFeome as a resource for global functional experimentation of the plant proteome.


Subjects
Arabidopsis/genetics , Plant Genome , Messenger RNA/genetics , Plant RNA/genetics , Genetic Transcription , Chromosome Mapping , Plant Chromosomes/genetics , Molecular Cloning , Computational Biology , Complementary DNA/genetics , Intergenic DNA , Expressed Sequence Tags , Gene Expression Profiling , Plant Genes , Genomics , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis , Open Reading Frames , Reverse Transcriptase Polymerase Chain Reaction