Search | Secretaria de Estado da Saúde

Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice.

Aberer, Andre J; Krompass, Denis; Stamatakis, Alexandros.

Syst Biol ; 62(1): 162-6, 2013 Jan 01.

Article in English | MEDLINE | ID: mdl-22962004

ABSTRACT

The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive webservice implementing this algorithm. Compared with our previous method, the new algorithm is up to 4 orders of magnitude faster, while returning qualitatively identical results. Because of this significant improvement in scalability, the new algorithm can now identify substantially more complex and compute-intensive rogue taxon constellations. On a large and diverse collection of real-world data sets, we show that our method yields better supported reduced/pruned consensus trees than any competing rogue taxon identification method. Using the parallel version of our open-source code, we successfully identified rogue taxa in a set of 100 trees with 116 334 taxa each. For simulated data sets, we show that when removing/pruning rogue taxa with our method from a tree set, we consistently obtain bootstrap consensus trees as well as maximum-likelihood trees that are topologically closer to the respective true trees.
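The effect described in this abstract can be illustrated with a minimal brute-force sketch. This is not the graph-based algorithm of the paper: it assumes rooted trees written as nested Python tuples, scores a tree set by the summed support of all clades found in more than half of the trees, and tries pruning one taxon at a time. The function names, the scoring rule, and the toy trees are illustrative assumptions.

```python
from collections import Counter


def leaves(tree):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def prune(tree, taxon):
    """Remove one taxon and collapse any node left with a single child."""
    if isinstance(tree, str):
        return None if tree == taxon else tree
    kept = [p for p in (prune(child, taxon) for child in tree) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return tuple(kept)


def clades(tree):
    """Leaf sets of all internal nodes except the root."""
    found = set()

    def visit(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            found.add(leaves(node))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return found


def consensus_support(trees, threshold=0.5):
    """Assumed score: summed support of clades present in > threshold of trees."""
    counts = Counter(c for t in trees for c in clades(t))
    n = len(trees)
    return sum(k / n for k in counts.values() if k / n > threshold)


def best_single_prune(trees):
    """Try pruning each taxon in turn; return (taxon, score before, score after)."""
    taxa = sorted(leaves(trees[0]))
    base = consensus_support(trees)
    scores = {x: consensus_support([prune(t, x) for t in trees]) for x in taxa}
    best = max(taxa, key=lambda x: scores[x])
    return best, base, scores[best]


# Toy bootstrap set: the backbone ((A,B),(C,(D,E))) is stable, "R" jumps around.
trees = [
    ((("A", "R"), "B"), ("C", ("D", "E"))),
    (("A", "B"), (("C", "R"), ("D", "E"))),
    (("A", "B"), ("C", (("D", "R"), "E"))),
    ((("A", "B"), "R"), ("C", ("D", "E"))),
]
print(best_single_prune(trees))
```

In this toy set every tree becomes identical once "R" is removed, so the score rises from 1.5 to 3.0. Real rogue constellations can involve several taxa at once, which is where exhaustive search of this kind stops being feasible.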

Subjects

Algorithms; Classification/methods; Internet; Phylogeny; Software; Computer Simulation; Reproducibility of Results

Homology-based inference sets the bar high for protein function prediction.

Hamp, Tobias; Kassner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian; Achten, Dominik; Auer, Florian; Boehm, Ariane; Braun, Tatjana; Hecht, Maximilian; Heron, Mark; Hönigschmid, Peter; Hopf, Thomas A; Kaufmann, Stefanie; Kiening, Michael; Krompass, Denis; Landerer, Cedric; Mahlich, Yannick; Roos, Manfred; Rost, Burkhard.

BMC Bioinformatics ; 14 Suppl 3: S7, 2013.

Article in English | MEDLINE | ID: mdl-23514582

ABSTRACT

BACKGROUND: Any method that de novo predicts protein function should do better than random. More challenging, it also ought to outperform simple homology-based inference.

METHODS: Here, we describe a few methods that predict protein function exclusively through homology. Together, they set the bar or lower limit for future improvements.

RESULTS AND CONCLUSIONS: During the development of these methods, we faced two surprises. Firstly, our most successful implementation for the baseline ranked very high at CAFA1. In fact, our best combination of homology-based methods fared only slightly worse than the top-of-the-line prediction method from the Jones group. Secondly, although the concept of homology-based inference is simple, this work revealed that the precise details of the implementation are crucial: not only did the methods span from top to bottom performers at CAFA, but also the reasons for these differences were unexpected. In this work, we also propose a new rigorous measure to compare predicted and experimental annotations. It puts more emphasis on the details of protein function than the other measures employed by CAFA and may best reflect the expectations of users. Clearly, the definition of proper goals remains one major objective for CAFA.

Subjects

Proteins/physiology; Sequence Homology, Amino Acid; Algorithms; Proteins/genetics

Performance, accuracy, and Web server for evolutionary placement of short sequence reads under maximum likelihood.

Berger, Simon A; Krompass, Denis; Stamatakis, Alexandros.

Syst Biol ; 60(3): 291-302, 2011 May.

Article in English | MEDLINE | ID: mdl-21436105

ABSTRACT

We present an evolutionary placement algorithm (EPA) and a Web server for the rapid assignment of sequence fragments (short reads) to edges of a given phylogenetic tree under the maximum-likelihood model. The accuracy of the algorithm is evaluated on several real-world data sets and compared with placement by pair-wise sequence comparison, using edit distances and BLAST. We introduce a slow and accurate as well as a fast and less accurate placement algorithm. For the slow algorithm, we develop additional heuristic techniques that yield almost the same run times as the fast version with only a small loss of accuracy. When those additional heuristics are employed, the run time of the more accurate algorithm is comparable with that of a simple BLAST search for data sets with a high number of short query sequences. Moreover, the accuracy of the EPA is significantly higher, in particular when the sample of taxa in the reference topology is sparse or inadequate. Our algorithm, which has been integrated into RAxML, therefore provides an equally fast but more accurate alternative to BLAST for tree-based inference of the evolutionary origin and composition of short sequence reads. We are also actively developing a Web server that offers a freely available service for computing read placements on trees using the EPA.
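The pairwise-comparison baseline mentioned in this abstract can be sketched as follows. This is not the EPA and not the authors' code: it is a minimal nearest-neighbour assignment that gives a read to the reference sequence with the smallest edit distance. The choice of a "fitting" edit distance (the read may match any substring of the reference at no extra cost), the function names, and the toy sequences are all assumptions made for illustration.

```python
def fitting_edit_distance(read, reference):
    """Minimum number of edits turning `read` into some substring of `reference`."""
    # Row 0 is all zeros: the alignment may start anywhere in the reference.
    prev = [0] * (len(reference) + 1)
    for i, r in enumerate(read, start=1):
        cur = [i]  # aligning i read characters against nothing costs i deletions
        for j, c in enumerate(reference, start=1):
            cur.append(min(
                prev[j] + 1,             # delete a read character
                cur[j - 1] + 1,          # insert a reference character
                prev[j - 1] + (r != c),  # match or substitute
            ))
        prev = cur
    # The alignment may also end anywhere in the reference.
    return min(prev)


def nearest_reference(read, references):
    """Name of the reference with the smallest distance (ties broken by name)."""
    return min(sorted(references),
               key=lambda name: fitting_edit_distance(read, references[name]))


# Hypothetical reference sequences and one short read.
references = {
    "taxon_a": "ACGTACGTGGCA",
    "taxon_b": "ACGTTCGAGGCA",
    "taxon_c": "TTGTACCTGGAA",
}
read = "TCGAGG"
print(nearest_reference(read, references))
```

A baseline like this can only assign a read to a tip of the tree, whereas the abstract describes placement on edges under maximum likelihood, which also allows placements on internal branches.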

Subjects

Algorithms; Evolution, Molecular; Likelihood Functions; Phylogeny; Sequence Alignment/methods; Amino Acid Sequence; Base Sequence; Computer Simulation; Internet; Sequence Analysis, DNA/methods; Sequence Analysis, Protein/methods; Sequence Analysis, RNA/methods; Software
