Search | VHL Regional Portal

Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries.

Mehringer, Svenja; Seiler, Enrico; Droop, Felix; Darvish, Mitra; Rahn, René; Vingron, Martin; Reinert, Knut.

Genome Biol ; 24(1): 131, 2023 05 31.

Article in English | MEDLINE | ID: mdl-37259161

ABSTRACT

We present a novel data structure for searching sequences in large databases: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space efficient, yet so general that it could serve as the underlying engine for many applications. We show that the HIBF is superior in build time, index size, and search time while achieving a comparable or better accuracy compared to other state-of-the-art tools. The HIBF builds an index up to 211 times faster, using up to 14 times less space, and can answer approximate membership queries faster by a factor of up to 129.

Subject(s)

Algorithms; Software
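
The abstract above describes the HIBF as answering approximate membership queries. As a minimal sketch of that underlying idea only, the following is a plain Bloom filter over k-mers, not the hierarchical or interleaved structure from the paper. The class name, the BLAKE2b-based hashing, and all parameter values are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: a bit array plus several hash functions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting BLAKE2b with the hash index
        # (an illustrative choice, not the hashing used by any real tool).
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


def kmers(sequence, k):
    """All overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


reference = "ACGTTGCAACGTAGCT"  # made-up sequence
bloom = BloomFilter(num_bits=1024, num_hashes=3)
for kmer in kmers(reference, 4):
    bloom.add(kmer)

# Every inserted k-mer is reported as present (no false negatives).
print(all(kmer in bloom for kmer in kmers(reference, 4)))
```

A query for a k-mer that was never inserted usually returns False but may return True; that false-positive possibility is what makes the membership test approximate.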

Needle: a fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments.

Darvish, Mitra; Seiler, Enrico; Mehringer, Svenja; Rahn, René; Reinert, Knut.

Bioinformatics ; 38(17): 4100-4108, 2022 09 02.

Article in English | MEDLINE | ID: mdl-35801930

ABSTRACT

MOTIVATION: The ever-growing size of sequencing data is a major bottleneck in bioinformatics as the advances of hardware development cannot keep up with the data growth. Therefore, an enormous amount of data is collected but rarely ever reused, because it is nearly impossible to find meaningful experiments in the stream of raw data. RESULTS: As a solution, we propose Needle, a fast and space-efficient index which can be built for thousands of experiments in <2 h and can estimate the quantification of a transcript in these experiments in seconds, thereby outperforming its competitors. The basic idea of the Needle index is to create multiple interleaved Bloom filters that each store a set of representative k-mers depending on their multiplicity in the raw data. This is then used to quantify the query. AVAILABILITY AND IMPLEMENTATION: https://github.com/seqan/needle. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms; Software; Sequence Analysis, DNA
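
The Needle abstract above mentions multiple interleaved Bloom filters, each holding k-mers according to their multiplicity. The sketch below is one simplified reading of that idea, not Needle's actual implementation. The thresholds, the rule of inserting a k-mer into every level at or below its count, the estimation rule, and all names are assumptions made for illustration.

```python
import hashlib


class InterleavedBloomFilter:
    """One Bloom filter per bin, stored so that a single hash position
    yields the bits of all bins at once."""

    def __init__(self, num_bins, bits_per_bin, num_hashes=2):
        self.num_bins = num_bins
        self.bits_per_bin = bits_per_bin
        self.num_hashes = num_hashes
        # rows[p] is an integer bitmask: bit b is set if bin b set position p.
        self.rows = [0] * bits_per_bin

    def _positions(self, kmer):
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.bits_per_bin

    def insert(self, kmer, bin_index):
        for pos in self._positions(kmer):
            self.rows[pos] |= 1 << bin_index

    def query(self, kmer):
        """Bitmask of bins that possibly contain kmer."""
        mask = (1 << self.num_bins) - 1
        for pos in self._positions(kmer):
            mask &= self.rows[pos]
        return mask


THRESHOLDS = [1, 4, 16]  # hypothetical multiplicity levels


def build_index(experiments, bits_per_bin=1024):
    """experiments: one dict per experiment mapping k-mer -> count."""
    index = {
        t: InterleavedBloomFilter(len(experiments), bits_per_bin)
        for t in THRESHOLDS
    }
    for bin_index, counts in enumerate(experiments):
        for kmer, count in counts.items():
            for t in THRESHOLDS:
                if count >= t:  # simplification: insert into every level <= count
                    index[t].insert(kmer, bin_index)
    return index


def estimate(index, query_kmers, bin_index):
    """Highest threshold at which all query k-mers are reported for the bin."""
    best = 0
    for t in THRESHOLDS:
        if all((index[t].query(k) >> bin_index) & 1 for k in query_kmers):
            best = t
    return best


# Made-up k-mer counts for two experiments.
experiments = [{"ACG": 20, "CGT": 17}, {"ACG": 2, "CGT": 5}]
index = build_index(experiments)
print(estimate(index, ["ACG", "CGT"], 0))
print(estimate(index, ["ACG", "CGT"], 1))
```

Because Bloom filters can report false positives, an estimate from this sketch can be too high but never too low relative to the levels at which the k-mers were inserted.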

Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences.

Seiler, Enrico; Mehringer, Svenja; Darvish, Mitra; Turc, Etienne; Reinert, Knut.

iScience ; 24(7): 102782, 2021 Jul 23.

Article in English | MEDLINE | ID: mdl-34337360

ABSTRACT

We present Raptor, a system for approximately searching many queries such as next-generation sequencing reads or transcripts in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the interleaved Bloom filters (IBFs) as a set membership data structure and probabilistic thresholding for minimizers. Our approach allows compression and partitioning of the IBF to enable the effective use of secondary memory. We test and show the performance and limitations of the new features using simulated and real datasets. Our data structure can be used to accelerate various core bioinformatics applications. We show this by re-implementing the distributed read mapping tool DREAM-Yara.
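
The Raptor abstract above says winnowing minimizers define a set of representative k-mers. As a rough sketch of that technique, the function below keeps the smallest k-mer from each window of consecutive k-mers. Lexicographic ordering, the window being counted in k-mers, and the function names are assumptions for illustration; real tools typically order k-mers by a hash value and also consider reverse complements.

```python
def kmers(sequence, k):
    """All overlapping substrings of length k, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def minimizers(sequence, k, window):
    """Set of minimizers: the smallest k-mer in every run of `window`
    consecutive k-mers (lexicographic order, an illustrative choice)."""
    all_kmers = kmers(sequence, k)
    if not all_kmers:
        return set()
    if len(all_kmers) <= window:
        # Sequence too short for a full window: keep its single smallest k-mer.
        return {min(all_kmers)}
    return {
        min(all_kmers[i:i + window])
        for i in range(len(all_kmers) - window + 1)
    }


sequence = "ACGTACGT"  # made-up sequence
selected = minimizers(sequence, k=3, window=2)
print(sorted(selected))  # prints ['ACG', 'CGT', 'GTA']
print(len(set(kmers(sequence, 3))), "distinct k-mers,", len(selected), "minimizers")
```

Only the selected minimizers would need to be stored in a membership structure, which is how the representative set can reduce index size compared with storing every k-mer.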

Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits.

Beyter, Doruk; Ingimundardottir, Helga; Oddsson, Asmundur; Eggertsson, Hannes P; Bjornsson, Eythor; Jonsson, Hakon; Atlason, Bjarni A; Kristmundsdottir, Snaedis; Mehringer, Svenja; Hardarson, Marteinn T; Gudjonsson, Sigurjon A; Magnusdottir, Droplaug N; Jonasdottir, Aslaug; Jonasdottir, Adalbjorg; Kristjansson, Ragnar P; Sverrisson, Sverrir T; Holley, Guillaume; Palsson, Gunnar; Stefansson, Olafur A; Eyjolfsson, Gudmundur; Olafsson, Isleifur; Sigurdardottir, Olof; Torfason, Bjarni; Masson, Gisli; Helgason, Agnar; Thorsteinsdottir, Unnur; Holm, Hilma; Gudbjartsson, Daniel F; Sulem, Patrick; Magnusson, Olafur T; Halldorsson, Bjarni V; Stefansson, Kari.

Nat Genet ; 53(6): 779-786, 2021 06.

Article in English | MEDLINE | ID: mdl-33972781

ABSTRACT

Long-read sequencing (LRS) promises to improve the characterization of structural variants (SVs). We generated LRS data from 3,622 Icelanders and identified a median of 22,636 SVs per individual (a median of 13,353 insertions and 9,474 deletions). We discovered a set of 133,886 reliably genotyped SV alleles and imputed them into 166,281 individuals to explore their effects on diseases and other traits. We discovered an association of a rare deletion in PCSK9 with lower low-density lipoprotein (LDL) cholesterol levels, compared to the population average. We also discovered an association of a multiallelic SV in ACAN with height; we found 11 alleles that differed in the number of a 57-bp-motif repeat and observed a linear relationship between the number of repeats carried and height. These results show that SVs can be accurately characterized at the population scale using LRS data in a genome-wide non-targeted approach and demonstrate how SVs impact phenotypes.

Subject(s)

Disease/genetics; Genomic Structural Variation; High-Throughput Nucleotide Sequencing; Quantitative Trait, Heritable; Alleles; Cholesterol, LDL/metabolism; Chromosomes, Human/genetics; Female; Gene Frequency/genetics; Humans; Iceland; Linear Models; Male; Proprotein Convertase 9/genetics; Recombination, Genetic/genetics; Sequence Deletion/genetics

The SeqAn C++ template library for efficient sequence analysis: A resource for programmers.

Reinert, Knut; Dadi, Temesgen Hailemariam; Ehrhardt, Marcel; Hauswedell, Hannes; Mehringer, Svenja; Rahn, René; Kim, Jongkyu; Pockrandt, Christopher; Winkler, Jörg; Siragusa, Enrico; Urgese, Gianvito; Weese, David.

J Biotechnol ; 261: 157-168, 2017 Nov 10.

Article in English | MEDLINE | ID: mdl-28888961

ABSTRACT

BACKGROUND: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome (Venter et al., 2001) would not have been possible without advanced assembly algorithms, and the development of practical BWT-based read mappers has been instrumental for NGS analysis. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there was a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use. We previously addressed this by introducing the SeqAn library of efficient data types and algorithms in 2008 (Döring et al., 2008). RESULTS: The SeqAn library has matured considerably since its first publication 9 years ago. In this article we review its status as an established resource for programmers in the field of sequence analysis and its contributions to many analysis tools. CONCLUSIONS: We anticipate that SeqAn will continue to be a valuable resource, especially since it has started to actively support various hardware acceleration techniques in a systematic manner.

Subject(s)

Databases, Genetic; Genomics/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Sequence Alignment
