A phylogeny-based benchmarking test for orthology inference reveals the limitations of function-based validation.

Trachana, Kalliopi; Forslund, Kristoffer; Larsson, Tomas; Powell, Sean; Doerks, Tobias; von Mering, Christian; Bork, Peer

Affiliation

Trachana K; Institute for Systems Biology, Seattle, WA, United States of America.
Forslund K; Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
Larsson T; Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
Powell S; Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
Doerks T; Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
von Mering C; Institute of Molecular Life Sciences, University of Zurich and Swiss Institute of Bioinformatics, Zurich, Switzerland.
Bork P; Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; Max-Delbruck-Centre for Molecular Medicine, Berlin, Germany.

PLoS One; 9(11): e111122, 2014.

Article in English | MEDLINE | ID: mdl-25369365

ABSTRACT

Accurate orthology prediction is crucial for many applications in the post-genomic era. The lack of broadly accepted benchmark tests precludes a comprehensive analysis of orthology inference. So far, the conservation of functional annotation between orthologs has served as a proxy for performance. However, this practice contradicts the evolutionary definition of orthology, and it is often not applicable because experimental evidence is limited for most species. We therefore constructed high-quality "gold standard" orthologous groups that can serve as a benchmark set for orthology inference in bacterial species. Here, we use this dataset to demonstrate 1) why a manually curated, phylogeny-based dataset is more appropriate for benchmarking orthology than other popular practices and 2) how it guides database design and parameterization through careful error quantification. Specifically, we illustrate how function-based tests often fail to identify false assignments and thereby misjudge the true performance of orthology inference methods. We also examine how our dataset can guide the selection of a "core" species repertoire to improve detection accuracy. We conclude that including more genomes at the proper evolutionary distances can influence the overall quality of orthology detection. The curated gene families, called Reference Orthologous Groups, are publicly available at http://eggnog.embl.de/orthobench2.

Subject(s)

Computational Biology; Phylogeny; Bacteria/classification; Computational Biology/standards; Genomics; Internet; User-Computer Interface

Database: MEDLINE | Main subject: Phylogeny / Computational Biology | Type of study: Prognostic studies | Language: English | Journal: PLoS One | Journal subject: Science / Medicine | Year: 2014 | Document type: Article | Affiliation country: United States | Country of publication: United States