Search | BVS - Ministry of Health

An expanded evaluation of protein function prediction methods shows an improvement in accuracy.

Jiang, Yuxiang; Oron, Tal Ronnen; Clark, Wyatt T; Bankapur, Asma R; D'Andrea, Daniel; Lepore, Rosalba; Funk, Christopher S; Kahanda, Indika; Verspoor, Karin M; Ben-Hur, Asa; Koo, Da Chen Emily; Penfold-Brown, Duncan; Shasha, Dennis; Youngs, Noah; Bonneau, Richard; Lin, Alexandra; Sahraeian, Sayed M E; Martelli, Pier Luigi; Profiti, Giuseppe; Casadio, Rita; Cao, Renzhi; Zhong, Zhaolong; Cheng, Jianlin; Altenhoff, Adrian; Skunca, Nives; Dessimoz, Christophe; Dogan, Tunca; Hakala, Kai; Kaewphan, Suwisa; Mehryary, Farrokh; Salakoski, Tapio; Ginter, Filip; Fang, Hai; Smithers, Ben; Oates, Matt; Gough, Julian; Törönen, Petri; Koskinen, Patrik; Holm, Liisa; Chen, Ching-Tai; Hsu, Wen-Lian; Bryson, Kevin; Cozzetto, Domenico; Minneci, Federico; Jones, David T; Chapman, Samuel; Bkc, Dukka; Khan, Ishita K; Kihara, Daisuke; Ofer, Dan.

Genome Biol; 17(1): 184, 2016 Sep 07.

Article in English | MEDLINE | ID: mdl-27604469

ABSTRACT

BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured an expanded analysis compared with CAFA1 with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology-specific, that different performance metrics can be used to probe the nature of accurate predictions, and that predictions in the biological process and human phenotype ontologies are relatively diverse. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and the usefulness of individual methods remain context-dependent.

Subjects

Computational Biology, Proteins/chemistry, Software, Structure-Activity Relationship, Algorithms, Databases, Protein, Gene Ontology, Humans, Molecular Sequence Annotation, Proteins/genetics
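The CAFA2 abstract refers to expanded assessment metrics without naming them. One headline metric in the CAFA assessments is the protein-centric maximum F-measure (Fmax): precision and recall are computed at each score threshold, and the best harmonic mean over all thresholds is reported. The Python below is a minimal sketch under stated assumptions, not the official CAFA evaluation code: the function name `fmax`, the dictionary input format, the 0.01 threshold grid, and the premise that predicted and true terms are already propagated to their ontology ancestors are all illustrative choices. Precision is averaged only over proteins with at least one prediction at a given threshold, while recall is averaged over every benchmark protein.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Protein-centric maximum F-measure over thresholds 1/steps, ..., 1.0.

    Illustrative sketch (hypothetical helper, not the official CAFA code).
    predictions: protein -> {term: score in [0, 1]}
    truth: benchmark protein -> non-empty set of experimentally supported terms
    Both are assumed to be propagated to ontology ancestors already.
    """
    best = 0.0
    for k in range(1, steps + 1):
        tau = k / steps
        precision_sum, covered, recall_sum = 0.0, 0, 0.0
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {t for t, s in scores.items() if s >= tau}
            hits = len(predicted & true_terms)
            if predicted:  # precision only over proteins predicted at this threshold
                precision_sum += hits / len(predicted)
                covered += 1
            recall_sum += hits / len(true_terms)  # recall over all benchmark proteins
        if covered == 0:
            continue  # no predictions survive this threshold
        pr = precision_sum / covered
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


if __name__ == "__main__":
    truth = {"P1": {"GO:a", "GO:b"}, "P2": {"GO:c"}}
    preds = {"P1": {"GO:a": 0.9, "GO:b": 0.4, "GO:x": 0.3}, "P2": {"GO:c": 0.8}}
    print(fmax(preds, truth))  # 1.0, reached for thresholds in (0.3, 0.4]
```

Averaging recall over all benchmark proteins means a method is penalized for proteins it leaves unpredicted, so coverage matters alongside raw precision.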

Negative example selection for protein function prediction: the NoGO database.

Youngs, Noah; Penfold-Brown, Duncan; Bonneau, Richard; Shasha, Dennis.

PLoS Comput Biol; 10(6): e1003644, 2014 Jun.

Article in English | MEDLINE | ID: mdl-24922051

ABSTRACT

Negative examples (genes that are known not to carry out a given protein function) are rarely recorded in genome and proteome annotation databases such as the Gene Ontology database. Negative examples are, however, required by several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for choosing negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, because the Open World Assumption, as applied to gene annotations, rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, using a temporal holdout and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text classification, which are the current state-of-the-art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability and the other applying topic modeling to genes and annotations. We demonstrate, using multiple benchmarks covering multiple organisms, that our algorithms make significantly fewer incorrect negative example predictions than the current state of the art. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms for general use (the NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html).

Subjects

Algorithms, Databases, Genetic, Gene Ontology, Proteins/genetics, Proteins/physiology, Animals, Arabidopsis Proteins/genetics, Arabidopsis Proteins/physiology, Artificial Intelligence, Computational Biology, Genome, Humans, Mice, Molecular Sequence Annotation, Proteome, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae Proteins/physiology
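The NoGO abstract names a conditional-probability approach to choosing negatives and a temporal-holdout evaluation, but gives no formulas. The sketch below is a hypothetical illustration of those two general ideas, not the authors' published algorithm: `select_negatives` scores each gene lacking the target term by the highest empirical probability P(target | s) over the terms s it already carries and keeps genes at or below a cutoff, and `incorrect_negative_rate` checks the chosen negatives against a later annotation release, counting any negative that has since gained the target term as an error. The function names, the max aggregation, the cutoff parameter, and the toy term labels are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Annotations = Dict[str, Set[str]]  # gene -> set of annotated terms (hypothetical format)


def cooccurrence_counts(annotations: Annotations):
    """Count how many genes carry each term and each ordered pair of terms."""
    term_count: Dict[str, int] = defaultdict(int)
    pair_count: Dict[Tuple[str, str], int] = defaultdict(int)
    for terms in annotations.values():
        for s in terms:
            term_count[s] += 1
            for t in terms:
                if s != t:
                    pair_count[(s, t)] += 1
    return term_count, pair_count


def select_negatives(annotations: Annotations, target: str, cutoff: float) -> List[str]:
    """Hypothetical heuristic: genes whose existing terms make `target` unlikely."""
    term_count, pair_count = cooccurrence_counts(annotations)
    negatives = []
    for gene, terms in annotations.items():
        # Skip known positives and unannotated genes (nothing to condition on).
        if target in terms or not terms:
            continue
        # Empirical P(target | s) = #genes with s and target / #genes with s.
        score = max(pair_count[(s, target)] / term_count[s] for s in terms)
        if score <= cutoff:
            negatives.append(gene)
    return sorted(negatives)


def incorrect_negative_rate(negatives: List[str], later: Annotations, target: str) -> float:
    """Temporal holdout: fraction of chosen negatives annotated with `target` later."""
    if not negatives:
        return 0.0
    errors = sum(1 for g in negatives if target in later.get(g, set()))
    return errors / len(negatives)


if __name__ == "__main__":
    old = {"g1": {"A", "T"}, "g2": {"A", "B"}, "g3": {"C"}, "g4": {"C", "D"}}
    later = {"g3": {"C"}, "g4": {"C", "D", "T"}}
    negs = select_negatives(old, "T", cutoff=0.1)
    print(negs)  # ['g3', 'g4']
    print(incorrect_negative_rate(negs, later, "T"))  # 0.5
```

Under the Open World Assumption, a gene that never gains the target term in the later release is still not proven negative, so a holdout like this can only expose errors; a low rate is evidence for, not proof of, correct negatives.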

Parametric Bayesian priors and better choice of negative examples improve protein function prediction.

Youngs, Noah; Penfold-Brown, Duncan; Drew, Kevin; Shasha, Dennis; Bonneau, Richard.

Bioinformatics; 29(9): 1190-8, 2013 May 01.

Article in English | MEDLINE | ID: mdl-23511543

ABSTRACT

MOTIVATION: Computational biologists have demonstrated the utility of machine learning methods for predicting protein function from an integration of multiple genome-wide data types. Yet even the best-performing function prediction algorithms rely on heuristics for important components, such as choosing negative examples (proteins without a given function) or determining key parameters. The improper choice of negative examples, in particular, can hamper the accuracy of protein function prediction. RESULTS: We present a novel approach for choosing negative examples, using a parameterizable Bayesian prior computed from all observed annotation data, which also generates priors used during function prediction. We incorporate this new method into the GeneMANIA function prediction algorithm and demonstrate that our algorithm is more accurate than current top-performing function prediction methods on the yeast and mouse proteomes across all metrics tested. AVAILABILITY: Code and data are available at: http://bonneaulab.bio.nyu.edu/funcprop.html

Subjects

Algorithms, Proteins/physiology, Animals, Artificial Intelligence, Bayes Theorem, Gene Regulatory Networks, Genome, Mice, Molecular Sequence Annotation, Protein Interaction Mapping, Proteins/genetics, Proteins/metabolism, Proteome/metabolism, Yeasts/genetics, Yeasts/metabolism

The mRNA-bound proteome and its global occupancy profile on protein-coding transcripts.

Baltz, Alexander G; Munschauer, Mathias; Schwanhäusser, Björn; Vasile, Alexandra; Murakawa, Yasuhiro; Schueler, Markus; Youngs, Noah; Penfold-Brown, Duncan; Drew, Kevin; Milek, Miha; Wyler, Emanuel; Bonneau, Richard; Selbach, Matthias; Dieterich, Christoph; Landthaler, Markus.

Mol Cell; 46(5): 674-90, 2012 Jun 08.

Article in English | MEDLINE | ID: mdl-22681889

ABSTRACT

Protein-RNA interactions are fundamental to core biological processes such as mRNA splicing, localization, degradation, and translation. We developed a photoreactive nucleotide-enhanced UV crosslinking and oligo(dT) purification approach to identify the mRNA-bound proteome using quantitative proteomics and to display protein occupancy on mRNA transcripts by next-generation sequencing. Application to a human embryonic kidney cell line identified close to 800 proteins. To our knowledge, nearly one-third of these were not previously annotated as RNA binding, and about 15% were not predictable as RNA interactors by computational methods. Protein occupancy profiling provides a transcriptome-wide catalog of potential cis-regulatory regions on mammalian mRNAs and shows that large stretches in 3' UTRs can be contacted by the mRNA-bound proteome, with numerous putative binding sites in regions harboring disease-associated nucleotide polymorphisms. Our observations indicate the presence of a large number of mRNA binders with diverse molecular functions participating in combinatorial posttranscriptional gene-expression networks.

Subjects

Proteomics/methods, RNA, Messenger/metabolism, RNA-Binding Proteins/metabolism, Binding Sites, Cell Line, Humans, Mass Spectrometry, RNA-Binding Proteins/chemistry, Sequence Analysis, RNA
