Benchmarking the next generation of homology inference tools.
Saripella, Ganapathi Varma; Sonnhammer, Erik L L; Forslund, Kristoffer.
Affiliation
  • Saripella GV; Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Stockholm SE-10691, Sweden.
  • Sonnhammer EL; Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Stockholm SE-10691, Sweden.
  • Forslund K; European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg 69117, Germany.
Bioinformatics ; 32(17): 2636-41, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27256311
ABSTRACT
MOTIVATION:

Over the last decades, vast numbers of sequences were deposited in public databases. Bioinformatics tools allow homology and consequently functional inference for these sequences. New profile-based homology search tools have been introduced, allowing reliable detection of remote homologs, but have not been systematically benchmarked. To provide such a comparison, which can guide bioinformatics workflows, we extend and apply our previously developed benchmark approach to evaluate the 'next generation' of profile-based approaches, including CS-BLAST, HHSEARCH and PHMMER, in comparison with the non-profile based search tools NCBI-BLAST, USEARCH, UBLAST and FASTA.

METHOD:

We generated challenging benchmark datasets based on protein domain architectures within either the PFAM + Clan, SCOP/Superfamily or CATH/Gene3D domain definition schemes. From each dataset, homologous and non-homologous protein pairs were aligned using each tool, and standard performance metrics calculated. We further measured congruence of domain architecture assignments in the three domain databases.
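As an illustration of the evaluation step described above, the following minimal sketch scores labelled protein pairs at a fixed cutoff. The abstract does not state which metrics were used, so sensitivity and specificity are assumed here as typical examples. The `ScoredPair` type, the function names and the "higher score is a stronger hit" convention are hypothetical and are not taken from the authors' scripts.

```python
from typing import Iterable, NamedTuple, Tuple


class ScoredPair(NamedTuple):
    """One aligned protein pair from a benchmark set (hypothetical layout)."""
    score: float       # assumed tool score; higher means a more confident hit
    is_homolog: bool   # benchmark label derived from the domain architectures


def confusion_counts(pairs: Iterable[ScoredPair],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when pairs scoring >= threshold are called homologous."""
    tp = fp = tn = fn = 0
    for pair in pairs:
        predicted = pair.score >= threshold
        if predicted and pair.is_homolog:
            tp += 1
        elif predicted:
            fp += 1
        elif pair.is_homolog:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(pairs: Iterable[ScoredPair],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(pairs, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy data for illustration only; these are not results from the paper.
toy_pairs = [
    ScoredPair(50.0, True),
    ScoredPair(30.0, True),
    ScoredPair(10.0, True),
    ScoredPair(40.0, False),
    ScoredPair(5.0, False),
]
sens, spec = sensitivity_specificity(toy_pairs, threshold=20.0)
```

Tools that report E-values, where lower is better, would need the comparison inverted or the score negated before applying the same counting.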

RESULTS:

CS-BLAST and PHMMER had the highest overall accuracy. FASTA, UBLAST and USEARCH showed large trade-offs of accuracy for speed optimization.

CONCLUSION:

Profile methods are superior at inferring remote homologs but the difference in accuracy between methods is relatively small. PHMMER and CS-BLAST stand out with the highest accuracy, yet still at a reasonable computational cost. Additionally, we show that less than 0.1% of Swiss-Prot protein pairs considered homologous by one database are considered non-homologous by another, implying that these classifications represent equivalent underlying biological phenomena, differing mostly in coverage and granularity.

AVAILABILITY AND IMPLEMENTATION:

Benchmark datasets and all scripts are placed at http://sonnhammer.org/download/Homology_benchmark.

CONTACT:

forslund@embl.de

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Sequence Homology / Benchmarking / Databases, Protein Language: En Journal: Bioinformatics Journal subject: MEDICAL INFORMATICS Year: 2016 Document type: Article Affiliation country: Sweden

...