Search | BVS Enfermagem Search Portal

Linear: a framework to enable existing software to resolve structural variants in long reads with flexible and efficient alignment-free statistical models.

Pan, Chenxu; Rahn, René; Heller, David; Reinert, Knut.

Brief Bioinform ; 24(2), 2023 03 19.

Article in English | MEDLINE | ID: mdl-36869850

ABSTRACT

Alignment is the cornerstone of many long-read pipelines and plays an essential role in resolving structural variants (SVs). However, forced alignment of SVs embedded in long reads, the inflexibility of integrating novel SV models and computational inefficiency remain problems. Here, we investigate the feasibility of resolving long-read SVs with alignment-free algorithms. We ask: (1) Is it possible to resolve long-read SVs with alignment-free approaches? and (2) Does doing so provide an advantage over existing approaches? To this end, we implemented a framework named Linear, which can flexibly integrate alignment-free algorithms, such as the generative model for long-read SV detection. Furthermore, Linear addresses the problem of compatibility of alignment-free approaches with existing software: it takes long reads as input and outputs standardized results that existing software can process directly. We conducted large-scale assessments in this work, and the results show that the sensitivity and flexibility of Linear outperform those of alignment-based pipelines. Moreover, Linear is computationally orders of magnitude faster.

Subjects

Human Genome, Software, Humans, Algorithms, Sequence Analysis, Statistical Models, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing

Needle: a fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments.

Darvish, Mitra; Seiler, Enrico; Mehringer, Svenja; Rahn, René; Reinert, Knut.

Bioinformatics ; 38(17): 4100-4108, 2022 09 02.

Article in English | MEDLINE | ID: mdl-35801930

ABSTRACT

MOTIVATION: The ever-growing size of sequencing data is a major bottleneck in bioinformatics, as advances in hardware cannot keep up with the growth of data. Therefore, an enormous amount of data is collected but rarely reused, because it is nearly impossible to find meaningful experiments in the stream of raw data. RESULTS: As a solution, we propose Needle, a fast and space-efficient index that can be built for thousands of experiments in <2 h and can estimate the quantification of a transcript in these experiments in seconds, thereby outperforming its competitors. The basic idea of the Needle index is to create multiple interleaved Bloom filters, each storing a set of representative k-mers selected according to their multiplicity in the raw data. These filters are then used to quantify the query. AVAILABILITY AND IMPLEMENTATION: https://github.com/seqan/needle. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
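The idea in the abstract above (several interleaved Bloom filters, each holding the k-mers that reach a given multiplicity in each experiment) can be illustrated with a toy Python sketch. It is not Needle's implementation: the seeded SHA-256 hashing, the filter sizes, the thresholds and the rule that turns k-mer hits into an estimate (`min_fraction`) are all assumptions chosen for illustration, and the names `InterleavedBloomFilter`, `build_levels` and `estimate` are hypothetical.

```python
import hashlib
from collections import Counter


class InterleavedBloomFilter:
    """Toy IBF: one Bloom filter per bin (experiment); the bits of all bins
    for the same hash position are stored next to each other."""

    def __init__(self, num_bins, bits_per_bin, num_hashes=2):
        self.num_bins = num_bins
        self.bits_per_bin = bits_per_bin
        self.num_hashes = num_hashes
        # Index pos * num_bins + bin_id holds bit `pos` of bin `bin_id`.
        # One byte per bit keeps the sketch readable; a real index packs bits.
        self.bits = bytearray(num_bins * bits_per_bin)

    def _positions(self, kmer):
        # Seeded SHA-256 is an assumed stand-in for fast k-mer hash functions.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.bits_per_bin

    def insert(self, kmer, bin_id):
        for pos in self._positions(kmer):
            self.bits[pos * self.num_bins + bin_id] = 1

    def query(self, kmer):
        """Bins that may contain kmer (false positives possible, no false negatives)."""
        hits = set(range(self.num_bins))
        for pos in self._positions(kmer):
            # One contiguous slice yields this position's bit for every bin.
            block = self.bits[pos * self.num_bins:(pos + 1) * self.num_bins]
            hits &= {b for b in range(self.num_bins) if block[b]}
        return hits


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_levels(experiments, k, thresholds, bits_per_bin=1024):
    """One IBF per multiplicity threshold (ascending): bin b of level l holds
    the k-mers occurring at least thresholds[l] times in experiment b."""
    levels = [InterleavedBloomFilter(len(experiments), bits_per_bin) for _ in thresholds]
    for bin_id, reads in enumerate(experiments):
        counts = Counter(km for read in reads for km in kmers(read, k))
        for ibf, t in zip(levels, thresholds):
            for km, c in counts.items():
                if c >= t:
                    ibf.insert(km, bin_id)
    return levels


def estimate(levels, thresholds, transcript, k, min_fraction=0.5):
    """Per experiment, the highest threshold at which at least min_fraction
    of the transcript's k-mers are found (0 if no level qualifies)."""
    query = kmers(transcript, k)
    result = [0] * levels[0].num_bins
    if not query:
        return result
    for ibf, t in zip(levels, thresholds):
        found = Counter(b for km in query for b in ibf.query(km))
        for b in range(ibf.num_bins):
            if found[b] >= min_fraction * len(query):
                result[b] = t  # ascending thresholds: higher levels overwrite
    return result


if __name__ == "__main__":
    experiments = [["ACGTACGTAC"] * 5, ["ACGTACGTAC"], ["TTTTGGGGCC"] * 5]
    levels = build_levels(experiments, k=4, thresholds=[1, 3, 5])
    print(estimate(levels, [1, 3, 5], "ACGTACGTAC", k=4))
```

In the usage example, the first experiment reports 5 because every k-mer of the transcript occurs at least five times there, and the second reports 1 because no k-mer reaches a count of 3; the third should report 0 unless a Bloom filter false positive occurs. Interleaving the bins is what lets a single slice of the bit vector answer a k-mer query for all experiments at once.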

Subjects

Algorithms, Software, DNA Sequence Analysis

Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.

Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut.

Bioinformatics ; 34(20): 3437-3445, 2018 10 15.

Article in English | MEDLINE | ID: mdl-29726911

ABSTRACT

Motivation: Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable to a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (single instruction multiple data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we (a) distribute many independent alignments on multiple threads and (b) inherently parallelize a single alignment computation using a work-stealing approach producing a dynamic wavefront progressing along the minor diagonal. Results: We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors, and on different use cases. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. Availability and implementation: The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2 and AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. Supplementary information: Supplementary data are available at Bioinformatics online.
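Pure Python cannot issue SIMD instructions, so the following sketch only mimics the inter-sequence layout described above: each list index plays the role of one SIMD lane (one sequence pair), and a single sweep over the dynamic programming matrix applies the same operations to every lane. It assumes global alignment with a linear gap penalty and simple match/mismatch scores; the function names and default scores are hypothetical, and the module described in the abstract is written in C++ and uses real vector instructions.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Scalar Needleman-Wunsch score with a linear gap penalty (reference)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def batch_global_scores(pairs, match=1, mismatch=-1, gap=-1):
    """Same recurrence for many pairs in one sweep; each DP cell is a list
    with one entry per pair, mimicking the lanes of a SIMD register."""
    if not pairs:
        return []
    lanes = range(len(pairs))
    max_m = max(len(a) for a, _ in pairs)
    max_n = max(len(b) for _, b in pairs)
    # Pad to a common length. Cell (i, j) depends only on a[:i] and b[:j],
    # so padded cells never influence a lane's own cell (len(a), len(b)).
    A = [a.ljust(max_m, "\0") for a, _ in pairs]
    B = [b.ljust(max_n, "\0") for _, b in pairs]
    scores = [0] * len(pairs)

    def collect(i, row):
        # Read each lane's final score once the sweep reaches its last row.
        for lane, (a, b) in enumerate(pairs):
            if len(a) == i:
                scores[lane] = row[len(b)][lane]

    prev = [[j * gap for _ in lanes] for j in range(max_n + 1)]
    collect(0, prev)
    for i in range(1, max_m + 1):
        cur = [[i * gap for _ in lanes]]
        for j in range(1, max_n + 1):
            # The same operations run on every lane, like one vector instruction.
            sub = [match if A[lane][i - 1] == B[lane][j - 1] else mismatch for lane in lanes]
            cur.append([max(prev[j - 1][lane] + sub[lane],
                            prev[j][lane] + gap,
                            cur[j - 1][lane] + gap) for lane in lanes])
        prev = cur
        collect(i, prev)
    return scores


if __name__ == "__main__":
    pairs = [("ACGT", "ACGT"), ("ACGT", "AGT"), ("", "AC"), ("GATTACA", "GCATGCT")]
    print(batch_global_scores(pairs))
    print([global_score(a, b) for a, b in pairs])
```

Both printed lists should agree. Padding is the design choice that makes pairs of different lengths share one sweep: the extra cells cost work but never change a lane's result, which is read at that lane's own final cell.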

Subjects

Sequence Alignment, Software, Algorithms

Journaled string tree: a scalable data structure for analyzing thousands of similar genomes on your laptop.

Rahn, René; Weese, David; Reinert, Knut.

Bioinformatics ; 30(24): 3499-505, 2014 Dec 15.

Article in English | MEDLINE | ID: mdl-25028723

ABSTRACT

MOTIVATION: Next-generation sequencing (NGS) has revolutionized biomedical research in the past decade and led to a continuous stream of developments in bioinformatics, addressing the need for fast and space-efficient solutions for analyzing NGS data. Often researchers need to analyze a set of genomic sequences that stem from closely related species or are indeed individuals of the same species. Hence, the analyzed sequences are similar. For analyses where local changes in the examined sequence induce only local changes in the results, it is obviously desirable to avoid examining identical or similar regions repeatedly. RESULTS: In this work, we provide a datatype that exploits the data parallelism inherent in a set of similar sequences by analyzing shared regions only once. In real-world experiments, we show that algorithms that would otherwise scan each reference sequentially can be sped up by a factor of 115.
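The principle of analyzing shared regions only once can be illustrated with a small Python sketch of exact pattern search. This is a simplified stand-in, not the journaled string tree: it assumes every sequence differs from a shared reference only by single-base substitutions, stored as a hypothetical `{position: base}` journal so that coordinates never shift, and the names `search_journaled` and `window` are illustrative. The reference is scanned once, and each sequence re-examines only the windows that overlap one of its own substitutions.

```python
def find_all(text, pattern):
    """Start positions of all (possibly overlapping) occurrences of pattern."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def window(reference, snvs, p, w):
    """Characters p..p+w-1 of a journaled sequence, built without materializing it."""
    return "".join(snvs.get(i, reference[i]) for i in range(p, p + w))


def search_journaled(reference, journals, pattern):
    """journals: one hypothetical {position: base} dict of substitutions per
    sequence. Returns sorted match positions for every sequence."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    w = len(pattern)
    ref_hits = find_all(reference, pattern)  # shared regions are scanned once
    results = []
    for snvs in journals:
        # A reference hit stays valid if its window holds none of this sequence's SNVs.
        hits = {p for p in ref_hits if not any(p <= s < p + w for s in snvs)}
        # Only windows overlapping an SNV are examined for this sequence.
        for s in snvs:
            for p in range(max(0, s - w + 1), min(s, len(reference) - w) + 1):
                if window(reference, snvs, p, w) == pattern:
                    hits.add(p)
        results.append(sorted(hits))
    return results


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    journals = [{}, {1: "T"}, {7: "A", 8: "C", 9: "G"}]
    print(search_journaled(reference, journals, "ACG"))  # [[0, 4], [4], [0, 4, 7]]
```

The work per sequence is proportional to its number of substitutions times the pattern length rather than to the genome length, which is the kind of saving the abstract describes for sets of similar sequences.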

Subjects

Genomics/methods, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Algorithms, Computers

Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries.

Mehringer, Svenja; Seiler, Enrico; Droop, Felix; Darvish, Mitra; Rahn, René; Vingron, Martin; Reinert, Knut.

Genome Biol ; 24(1): 131, 2023 05 31.

Article in English | MEDLINE | ID: mdl-37259161

ABSTRACT

We present a novel data structure for searching sequences in large databases: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space efficient, yet so general that it could serve as the underlying engine for many applications. We show that the HIBF is superior in build time, index size, and search time while achieving comparable or better accuracy than other state-of-the-art tools. The HIBF builds an index up to 211 times faster, uses up to 14 times less space, and can answer approximate membership queries up to 129 times faster.

Subjects

Algorithms, Software

The de.NBI / ELIXIR-DE training platform - Bioinformatics training in Germany and across Europe within ELIXIR.

Wibberg, Daniel; Batut, Bérénice; Belmann, Peter; Blom, Jochen; Glöckner, Frank Oliver; Grüning, Björn; Hoffmann, Nils; Kleinbölting, Nils; Rahn, René; Rey, Maja; Scholz, Uwe; Sharan, Malvika; Tauch, Andreas; Trojahn, Ulrike; Usadel, Björn; Kohlbacher, Oliver.

F1000Res ; 8, 2019.

Article in English | MEDLINE | ID: mdl-33163154

ABSTRACT

The German Network for Bioinformatics Infrastructure (de.NBI) is a national and academic infrastructure funded by the German Federal Ministry of Education and Research (BMBF). The de.NBI provides (i) service, (ii) training, and (iii) cloud computing to users in life sciences research and biomedicine in Germany and Europe and (iv) fosters the cooperation of the German bioinformatics community with international network structures. The de.NBI members also run the German node (ELIXIR-DE) within the European ELIXIR infrastructure. The de.NBI / ELIXIR-DE training platform, also known as special interest group 3 (SIG 3) 'Training & Education', coordinates the bioinformatics training of de.NBI and the German ELIXIR node. The network provides a high-quality, coherent, timely, and impactful training program across its eight service centers. Life scientists learn how to handle and analyze biological big data more effectively by applying tools, standards and compute services provided by de.NBI. Since 2015, more than 300 training courses have been carried out with about 6,000 participants, and these courses received recommendation rates of almost 90% (status as of July 2020). In addition to face-to-face training courses, online training was introduced on the de.NBI website in 2016, and guidelines for the preparation of e-learning material were established in 2018. In 2016, ELIXIR-DE joined the ELIXIR training platform. Here, the de.NBI / ELIXIR-DE training platform collaborates with ELIXIR on training activities, on advertising training courses via TeSS, and in discussions on exchanging training-event data, which is essential for quality assessment at both the technical and administrative levels. The de.NBI training program has trained thousands of scientists from Germany and beyond in many different areas of bioinformatics.

Subjects

Computational Biology/education, Europe, Germany, Humans

KNIME for reproducible cross-domain analysis of life science data.

Fillbrunn, Alexander; Dietz, Christian; Pfeuffer, Julianus; Rahn, René; Landrum, Gregory A; Berthold, Michael R.

J Biotechnol ; 261: 149-156, 2017 Nov 10.

Article in English | MEDLINE | ID: mdl-28757290

ABSTRACT

Experiments in the life sciences often involve tools from a variety of domains such as mass spectrometry, next-generation sequencing, or image processing. Passing data between those tools often involves complex scripts for controlling data flow, data transformation, and statistical analysis. Such scripts are not only prone to being platform dependent; they also tend to grow as the experiment progresses and are seldom well documented, which hinders the reproducibility of the experiment. Workflow systems such as the KNIME Analytics Platform aim to solve these problems by providing a platform for connecting tools graphically and guaranteeing the same results on different operating systems. As open-source software, KNIME allows scientists and programmers to provide their own extensions to the scientific community. In this review paper, we present selected extensions from the life sciences that simplify data exploration, analysis, and visualization and are interoperable due to KNIME's unified data model. Additionally, we name other workflow systems that are commonly used in the life sciences and highlight their similarities to and differences from KNIME.

Subjects

Computational Biology, Software, Biological Science Disciplines, High-Throughput Nucleotide Sequencing, Computer-Assisted Image Processing, Mass Spectrometry

The SeqAn C++ template library for efficient sequence analysis: A resource for programmers.

Reinert, Knut; Dadi, Temesgen Hailemariam; Ehrhardt, Marcel; Hauswedell, Hannes; Mehringer, Svenja; Rahn, René; Kim, Jongkyu; Pockrandt, Christopher; Winkler, Jörg; Siragusa, Enrico; Urgese, Gianvito; Weese, David.

J Biotechnol ; 261: 157-168, 2017 Nov 10.

Article in English | MEDLINE | ID: mdl-28888961

ABSTRACT

BACKGROUND: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome (Venter et al., 2001) would not have been possible without advanced assembly algorithms, and the development of practical BWT-based read mappers has been instrumental for NGS analysis. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there was a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use. We previously addressed this by introducing the SeqAn library of efficient data types and algorithms in 2008 (Döring et al., 2008). RESULTS: The SeqAn library has matured considerably since its first publication 9 years ago. In this article, we review its status as an established resource for programmers in the field of sequence analysis and its contributions to many analysis tools. CONCLUSIONS: We anticipate that SeqAn will continue to be a valuable resource, especially since it has started to actively support various hardware acceleration techniques in a systematic manner.

Subjects

Genetic Databases, Genomics/methods, DNA Sequence Analysis/methods, Software, Algorithms, Human Genome, High-Throughput Nucleotide Sequencing, Humans, Sequence Alignment
