Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index.
Pandey, Prashant; Almodaresi, Fatemeh; Bender, Michael A; Ferdman, Michael; Johnson, Rob; Patro, Rob.
Affiliation
  • Pandey P; Computer Science Department, Stony Brook University, 100 Nicolls Rd, Stony Brook, NY 11794, USA.
  • Almodaresi F; Computer Science Department, Stony Brook University, 100 Nicolls Rd, Stony Brook, NY 11794, USA.
  • Bender MA; Computer Science Department, Stony Brook University, 100 Nicolls Rd, Stony Brook, NY 11794, USA.
  • Ferdman M; Computer Science Department, Stony Brook University, 100 Nicolls Rd, Stony Brook, NY 11794, USA.
  • Johnson R; Computer Science Department, Stony Brook University, 100 Nicolls Rd, Stony Brook, NY 11794, USA; VMware Research, 3425 Hillview Ave, Palo Alto, CA 94304, USA.
  • Patro R; Computer Science Department, Stony Brook University, 100 Nicolls Rd, Stony Brook, NY 11794, USA. Electronic address: rob.patro@cs.stonybrook.edu.
Cell Syst; 7(2): 201-207.e4, 2018 Aug 22.
Article in English | MEDLINE | ID: mdl-29936185
ABSTRACT
Sequence-level searches on large collections of RNA sequencing experiments, such as the NCBI Sequence Read Archive (SRA), would enable one to ask many questions about the expression or variation of a given transcript in a population. Existing approaches, such as the sequence Bloom tree, suffer from fundamental limitations of the Bloom filter, resulting in slow build and query times, less-than-optimal space usage, and potentially large numbers of false-positives. This paper introduces Mantis, a space-efficient system that uses new data structures to index thousands of raw-read experiments and facilitates large-scale sequence searches. In our evaluation, index construction with Mantis is 6× faster and yields a 20% smaller index than the state-of-the-art split sequence Bloom tree (SSBT). For queries, Mantis is 6-108× faster than SSBT and has no false-positives or -negatives. For example, Mantis was able to search for all 200,400 known human transcripts in an index of 2,652 RNA sequencing experiments in 82 min; SSBT took close to 4 days.

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Software / RNA / Sequence Analysis, RNA Limits: Animals / Humans Language: English Journal: Cell Syst Year: 2018 Document type: Article Country of affiliation: United States

...