Iterative Spaced Seed Hashing: Closing the Gap Between Spaced Seed Hashing and k-mer Hashing.
Petrucci, Enrico; Noé, Laurent; Pizzi, Cinzia; Comin, Matteo.
Affiliation
  • Petrucci E; Department of Information Engineering, University of Padova, Padova, Italy.
  • Noé L; CRIStAL UMR9189, Université de Lille, Lille, France.
  • Pizzi C; Department of Information Engineering, University of Padova, Padova, Italy.
  • Comin M; Department of Information Engineering, University of Padova, Padova, Italy.
J Comput Biol ; 27(2): 223-233, 2020 Feb.
Article in En | MEDLINE | ID: mdl-31800307
ABSTRACT
Alignment-free classification of sequences has enabled high-throughput processing of sequencing data in many bioinformatics pipelines. Much work has been done to speed up the indexing of k-mers through hash tables and other data structures. These efforts have led to very fast indexes, but because they are k-mer based, they often lack sensitivity in the presence of sequencing errors or polymorphisms. Spaced seeds are a special type of pattern that accounts for errors or mutations. They improve sensitivity and are now routinely used instead of k-mers in many applications. The major drawback of spaced seeds is that they cannot be hashed efficiently, so their use substantially increases computation time. In this article we address the problem of efficient spaced seed hashing. We propose an iterative algorithm that combines multiple spaced seed hashes, exploiting the similarity of adjacent hash values to compute the next hash efficiently. We report a series of experiments on hashing HTS reads with several spaced seeds. Our algorithm computes the hash values of spaced seeds with a speedup in the range of 3.5× to 7×, outperforming previous methods. Software and data sets are available at Iterative Spaced Seed Hashing.
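The efficiency gap described in the abstract can be illustrated with a minimal sketch. It is not the authors' iterative algorithm; it only contrasts a rolling k-mer hash with a naive spaced seed hash. The 2-bit base encoding, the function names, and the seed notation ('1' for a position that is read, '0' for a position that is skipped) are assumptions made for this illustration.

```python
# Illustrative sketch only: this is NOT the algorithm proposed in the article.
# Assumed 2-bit encoding of nucleotides (a common convention, not taken from the source).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_hashes(seq, k):
    """Hash every k-mer of seq, deriving each hash from the previous one.

    Each step shifts in one new base and masks off the oldest, so the cost
    per position is constant regardless of k.
    """
    mask = (1 << (2 * k)) - 1
    h = 0
    out = []
    for i, base in enumerate(seq):
        h = ((h << 2) | ENCODE[base]) & mask
        if i >= k - 1:
            out.append(h)
    return out


def spaced_seed_hashes(seq, seed):
    """Hash every window of seq under a spaced seed, recomputing from scratch.

    seed is a string of '1' (position is read) and '0' (position is skipped).
    The skipped positions break the simple shift-and-mask update, so this
    naive version touches every '1' position at every window.
    """
    care = [j for j, c in enumerate(seed) if c == "1"]
    out = []
    for i in range(len(seq) - len(seed) + 1):
        h = 0
        for j in care:
            h = (h << 2) | ENCODE[seq[i + j]]
        out.append(h)
    return out
```

With a seed made only of '1' symbols the two functions return the same values, which shows that a k-mer is the special case of a spaced seed with no skipped positions. The naive version pays a cost proportional to the number of '1' positions at every window; that per-position cost is what the abstract says the proposed method reduces by reusing adjacent hash values.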
Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: J Comput Biol Journal subject: MOLECULAR BIOLOGY / MEDICAL INFORMATICS Year: 2020 Document type: Article Affiliation country: Italy
