Needle: a fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments.

Darvish, Mitra; Seiler, Enrico; Mehringer, Svenja; Rahn, René; Reinert, Knut.

Bioinformatics; 38(17): 4100-4108, 2022 Sep 2.

Article in English | MEDLINE | ID: mdl-35801930

ABSTRACT

MOTIVATION: The ever-growing volume of sequencing data is a major bottleneck in bioinformatics, because advances in hardware cannot keep up with the growth of data. As a result, enormous amounts of data are collected but rarely reused, since it is nearly impossible to find meaningful experiments in the stream of raw data. RESULTS: As a solution, we propose Needle, a fast and space-efficient index that can be built for thousands of experiments in under 2 h and can estimate the quantification of a transcript in these experiments within seconds, outperforming its competitors. The basic idea of the Needle index is to create multiple interleaved Bloom filters, each of which stores a set of representative k-mers selected according to their multiplicity in the raw data. These filters are then used to estimate the quantification of the query. AVAILABILITY AND IMPLEMENTATION: https://github.com/seqan/needle. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms, Software, DNA Sequence Analysis
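The Needle abstract describes its core idea: several interleaved Bloom filters, each holding the k-mers that pass a different multiplicity threshold. The toy Python sketch below illustrates that idea under loudly stated assumptions: plain k-mers stand in for whatever representative k-mers Needle actually selects, the thresholds are hand-picked, hashing is SHA-256-based, and a query is assigned the highest level at which at least half of its k-mers are found. All names (`InterleavedBloomFilter`, `build_levels`, `estimate`) are hypothetical; this is neither Needle's API nor its actual quantification rule.

```python
import hashlib
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class InterleavedBloomFilter:
    """Toy IBF: one Bloom filter per bin (here: one bin per experiment).

    The bit for hash value h in bin b is stored at h * num_bins + b, so the
    bits of all bins for the same hash value sit next to each other and one
    lookup answers membership for every bin at once."""

    def __init__(self, num_bins, bin_size, num_hashes=2):
        self.num_bins = num_bins
        self.bin_size = bin_size
        self.num_hashes = num_hashes
        # One byte per bit to keep the sketch readable (assumption, not a real layout).
        self.bits = bytearray(num_bins * bin_size)

    def _hashes(self, kmer):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.bin_size

    def insert(self, kmer, b):
        for h in self._hashes(kmer):
            self.bits[h * self.num_bins + b] = 1

    def contains(self, kmer):
        """One boolean per bin: True if kmer may be in that bin."""
        result = [True] * self.num_bins
        for h in self._hashes(kmer):
            row = self.bits[h * self.num_bins:(h + 1) * self.num_bins]
            result = [r and bool(x) for r, x in zip(result, row)]
        return result


def build_levels(experiments, k, thresholds, bin_size=1024):
    """One IBF per multiplicity threshold (ascending). A k-mer is inserted into
    bin b of level i if it occurs at least thresholds[i] times in experiment b."""
    levels = [InterleavedBloomFilter(len(experiments), bin_size) for _ in thresholds]
    for b, reads in enumerate(experiments):
        counts = Counter(km for read in reads for km in kmers(read, k))
        for km, c in counts.items():
            for level, t in zip(levels, thresholds):
                if c >= t:
                    level.insert(km, b)
    return levels


def estimate(levels, thresholds, query, k, min_fraction=0.5):
    """Per experiment, the highest threshold at which at least min_fraction of
    the query's k-mers are found (0 if no level qualifies)."""
    query_kmers = kmers(query, k)
    num_bins = levels[0].num_bins
    estimates = [0] * num_bins
    if not query_kmers:
        return estimates  # query shorter than k: nothing to look up
    for level, t in zip(levels, thresholds):
        hits = [0] * num_bins
        for km in query_kmers:
            for b, present in enumerate(level.contains(km)):
                hits[b] += present
        for b in range(num_bins):
            if hits[b] >= min_fraction * len(query_kmers):
                estimates[b] = t  # thresholds ascend, so the last qualifying level wins
    return estimates


experiments = [
    ["ACGTACGTAC"] * 5,  # experiment 0: five copies of the transcript
    ["ACGTACGTAC"],      # experiment 1: one copy
    ["TTTTGGGGCC"] * 5,  # experiment 2: unrelated sequence
]
thresholds = [1, 3, 5]
levels = build_levels(experiments, k=4, thresholds=thresholds)
print(estimate(levels, thresholds, "ACGTACGTAC", k=4))
```

Here experiment 0 should be estimated at 5 and experiment 1 at 1; experiment 2 should come out as 0 unless the toy filter happens to report Bloom-filter false positives. Storing one bin per experiment is what lets a single lookup per k-mer serve every experiment at once.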

Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries.

Mehringer, Svenja; Seiler, Enrico; Droop, Felix; Darvish, Mitra; Rahn, René; Vingron, Martin; Reinert, Knut.

Genome Biol; 24(1): 131, 2023 May 31.

Article in English | MEDLINE | ID: mdl-37259161

ABSTRACT

We present a novel data structure for searching sequences in large databases: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space efficient, yet so general that it could serve as the underlying engine for many applications. We show that the HIBF is superior in build time, index size, and search time while achieving a comparable or better accuracy compared to other state-of-the-art tools. The HIBF builds an index up to 211 times faster, using up to 14 times less space, and can answer approximate membership queries faster by a factor of up to 129.

Subjects

Algorithms, Software

Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences.

Seiler, Enrico; Mehringer, Svenja; Darvish, Mitra; Turc, Etienne; Reinert, Knut.

iScience; 24(7): 102782, 2021 Jul 23.

Article in English | MEDLINE | ID: mdl-34337360

ABSTRACT

We present Raptor, a system for approximately searching many queries, such as next-generation sequencing reads or transcripts, in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of interleaved Bloom filters (IBFs) as its set-membership data structure, and probabilistic thresholding for minimizers. Our approach allows the IBF to be compressed and partitioned, enabling effective use of secondary memory. We evaluate the performance and limitations of these new features on simulated and real datasets. Our data structure can be used to accelerate various core bioinformatics applications, which we demonstrate by re-implementing the distributed read-mapping tool DREAM-Yara.
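The Raptor abstract names winnowing minimizers as its way of choosing representative k-mers. A minimal Python sketch of winnowing follows, assuming a window of `w` consecutive k-mers and plain lexicographic ordering (practical tools usually order k-mers by a hash instead, to avoid favoring k-mers that start with runs of A). The functions `minimizers` and `minimizer_set` are hypothetical names for illustration, not Raptor's interface or its exact window definition.

```python
def minimizers(seq: str, k: int, w: int, key=lambda kmer: kmer) -> list[tuple[int, str]]:
    """Winnowing: in every window of w consecutive k-mers, keep the k-mer with
    the smallest key (leftmost on ties). Returns the picks as (position, k-mer)
    pairs, each position reported once, in increasing order."""
    kms = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picks: list[tuple[int, str]] = []
    for start in range(len(kms) - w + 1):
        # Index of the smallest k-mer in this window; ties go to the leftmost.
        best = min(range(start, start + w), key=lambda i: (key(kms[i]), i))
        # Adjacent windows often share their minimizer; record it only once.
        if not picks or picks[-1][0] != best:
            picks.append((best, kms[best]))
    return picks


def minimizer_set(seq: str, k: int, w: int) -> set[str]:
    """Distinct minimizer k-mers: the representative set that would be stored."""
    return {kmer for _, kmer in minimizers(seq, k, w)}


print(minimizers("GATTACAGATTACA", k=3, w=4))
print(sorted(minimizer_set("GATTACAGATTACA", k=3, w=4)))
```

For `GATTACAGATTACA` with `k=3` and `w=4`, the 12 k-mer positions reduce to five picks, `[(1, 'ATT'), (4, 'ACA'), (6, 'AGA'), (8, 'ATT'), (11, 'ACA')]`, so only three distinct k-mers (`ACA`, `AGA`, `ATT`) would need to be inserted into the filter. Because every window contributes a pick, two sequences sharing a stretch of at least `w + k - 1` bases are guaranteed to share at least one minimizer.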

Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits.

Beyter, Doruk; Ingimundardottir, Helga; Oddsson, Asmundur; Eggertsson, Hannes P; Bjornsson, Eythor; Jonsson, Hakon; Atlason, Bjarni A; Kristmundsdottir, Snaedis; Mehringer, Svenja; Hardarson, Marteinn T; Gudjonsson, Sigurjon A; Magnusdottir, Droplaug N; Jonasdottir, Aslaug; Jonasdottir, Adalbjorg; Kristjansson, Ragnar P; Sverrisson, Sverrir T; Holley, Guillaume; Palsson, Gunnar; Stefansson, Olafur A; Eyjolfsson, Gudmundur; Olafsson, Isleifur; Sigurdardottir, Olof; Torfason, Bjarni; Masson, Gisli; Helgason, Agnar; Thorsteinsdottir, Unnur; Holm, Hilma; Gudbjartsson, Daniel F; Sulem, Patrick; Magnusson, Olafur T; Halldorsson, Bjarni V; Stefansson, Kari.

Nat Genet; 53(6): 779-786, 2021 Jun.

Article in English | MEDLINE | ID: mdl-33972781

ABSTRACT

Long-read sequencing (LRS) promises to improve the characterization of structural variants (SVs). We generated LRS data from 3,622 Icelanders and identified a median of 22,636 SVs per individual (a median of 13,353 insertions and 9,474 deletions). We discovered a set of 133,886 reliably genotyped SV alleles and imputed them into 166,281 individuals to explore their effects on diseases and other traits. We discovered an association of a rare deletion in PCSK9 with lower low-density lipoprotein (LDL) cholesterol levels, compared to the population average. We also discovered an association of a multiallelic SV in ACAN with height; we found 11 alleles that differed in the number of a 57-bp-motif repeat and observed a linear relationship between the number of repeats carried and height. These results show that SVs can be accurately characterized at the population scale using LRS data in a genome-wide non-targeted approach and demonstrate how SVs impact phenotypes.

Subjects

Disease/genetics, Genomic Structural Variation, High-Throughput Nucleotide Sequencing, Heritable Quantitative Trait, Alleles, LDL Cholesterol/metabolism, Human Chromosomes/genetics, Female, Gene Frequency/genetics, Humans, Iceland, Linear Models, Male, Proprotein Convertase 9/genetics, Genetic Recombination/genetics, Sequence Deletion/genetics

The SeqAn C++ template library for efficient sequence analysis: A resource for programmers.

Reinert, Knut; Dadi, Temesgen Hailemariam; Ehrhardt, Marcel; Hauswedell, Hannes; Mehringer, Svenja; Rahn, René; Kim, Jongkyu; Pockrandt, Christopher; Winkler, Jörg; Siragusa, Enrico; Urgese, Gianvito; Weese, David.

J Biotechnol; 261: 157-168, 2017 Nov 10.

Article in English | MEDLINE | ID: mdl-28888961

ABSTRACT

BACKGROUND: The use of novel algorithmic techniques is pivotal to many important problems in the life sciences. For example, the sequencing of the human genome (Venter et al., 2001) would not have been possible without advanced assembly algorithms, and the development of practical BWT-based read mappers has been instrumental for NGS analysis. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, a widening gap emerged between state-of-the-art algorithmic techniques and the algorithmic components actually used in widespread tools. We previously addressed this by introducing the SeqAn library of efficient data types and algorithms in 2008 (Döring et al., 2008). RESULTS: The SeqAn library has matured considerably since its first publication 9 years ago. In this article, we review its status as an established resource for programmers in the field of sequence analysis and its contributions to many analysis tools. CONCLUSIONS: We anticipate that SeqAn will continue to be a valuable resource, especially since it has begun to support various hardware acceleration techniques systematically.

Subjects

Genetic Databases, Genomics/methods, DNA Sequence Analysis/methods, Software, Algorithms, Human Genome, High-Throughput Nucleotide Sequencing, Humans, Sequence Alignment
