Pesquisa | Biblioteca Virtual em Saúde

IAMBEE: a web-service for the identification of adaptive pathways from parallel evolved clonal populations.

Perez-Romero, Camilo Andres; Weytjens, Bram; Decap, Dries; Swings, Toon; Michiels, Jan; De Maeyer, Dries; Marchal, Kathleen.

Nucleic Acids Res ; 47(W1): W151-W157, 2019 07 02.

Artigo em Inglês | MEDLINE | ID: mdl-31127271

RESUMO

IAMBEE is a web server designed for the Identification of Adaptive Mutations in Bacterial Evolution Experiments (IAMBEE). Input data consist of genotype information obtained from independently evolved clonal populations or strains that show the same adapted behavior (phenotype). To distinguish adaptive from passenger mutations, IAMBEE searches for neighborhoods in an organism-specific interaction network that are recurrently mutated in the adapted populations. This search for recurrently mutated network neighborhoods, as proxies for pathways is driven by additional information on the functional impact of the observed genetic changes and their dynamics during adaptive evolution. In addition, the search explicitly accounts for the differences in mutation rate between the independently evolved populations. Using this approach, IAMBEE allows exploiting parallel evolution to identify adaptive pathways. The web-server is freely available at http://bioinformatics.intec.ugent.be/iambee/ with no login requirement.

Assuntos

Adaptação Biológica/genética , Bactérias/genética , Evolução Clonal/genética , Bases de Dados Genéticas , Software , Genótipo , Mutação/genética , Taxa de Mutação , Fenótipo , Navegador

Halvade: scalable sequence analysis with MapReduce.

Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan.

Bioinformatics ; 31(15): 2482-8, 2015 Aug 01.

Artigo em Inglês | MEDLINE | ID: mdl-25819078

RESUMO

MOTIVATION: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine. RESULTS: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50× coverage) in <3 h with very high parallel efficiency. Even on a single, multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading.

Assuntos

Análise de Sequência de DNA/métodos , Software , Genoma Humano , Humanos

BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements.

De Witte, Dieter; Van de Velde, Jan; Decap, Dries; Van Bel, Michiel; Audenaert, Pieter; Demeester, Piet; Dhoedt, Bart; Vandepoele, Klaas; Fostier, Jan.

Bioinformatics ; 31(23): 3758-66, 2015 Dec 01.

Artigo em Inglês | MEDLINE | ID: mdl-26254488

RESUMO

MOTIVATION: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. RESULTS: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O.sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z.mays. AVAILABILITY AND IMPLEMENTATION: BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller CONTACT: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Algoritmos , Genoma de Planta , Regiões Promotoras Genéticas , Análise de Sequência de DNA/métodos , Sequência de Bases , Sítios de Ligação , Sequência Conservada , DNA de Plantas/química , Motivos de Nucleotídeos , Alinhamento de Sequência , Software , Fatores de Transcrição/metabolismo

BLSSpeller to discover novel regulatory motifs in maize.

Rahmani, Razgar Seyed; Decap, Dries; Fostier, Jan; Marchal, Kathleen.

DNA Res ; 29(4)2022 Jun 25.

Artigo em Inglês | MEDLINE | ID: mdl-35904558

RESUMO

With the decreasing cost of sequencing and availability of larger numbers of sequenced genomes, comparative genomics is becoming increasingly attractive to complement experimental techniques for the task of transcription factor (TF) binding site identification. In this study, we redesigned BLSSpeller, a motif discovery algorithm, to cope with larger sequence datasets. BLSSpeller was used to identify novel motifs in Zea mays in a comparative genomics setting with 16 monocot lineages. We discovered 61 motifs of which 20 matched previously described motif models in Arabidopsis. In addition, novel, yet uncharacterized motifs were detected, several of which are supported by available sequence-based and/or functional data. Instances of the predicted motifs were enriched around transcription start sites and contained signatures of selection. Moreover, the enrichment of the predicted motif instances in open chromatin and TF binding sites indicates their functionality, supported by the fact that genes carrying instances of these motifs were often found to be co-expressed and/or enriched in similar GO functions. Overall, our study unveiled several novel candidate motifs that might help our understanding of the genotype to phenotype association in crops.

Assuntos

Arabidopsis , Zea mays , Algoritmos , Arabidopsis/genética , Sítios de Ligação , Genômica/métodos , Motivos de Nucleotídeos , Ligação Proteica , Zea mays/genética

Halvade somatic: Somatic variant calling with Apache Spark.

Decap, Dries; de Schaetzen van Brienen, Louise; Larmuseau, Maarten; Costanza, Pascal; Herzeel, Charlotte; Wuyts, Roel; Marchal, Kathleen; Fostier, Jan.

Gigascience ; 11(1)2022 01 12.

Artigo em Inglês | MEDLINE | ID: mdl-35022699

RESUMO

BACKGROUND: The accurate detection of somatic variants from sequencing data is of key importance for cancer treatment and research. Somatic variant calling requires a high sequencing depth of the tumor sample, especially when the detection of low-frequency variants is also desired. In turn, this leads to large volumes of raw sequencing data to process and hence, large computational requirements. For example, calling the somatic variants according to the GATK best practices guidelines requires days of computing time for a typical whole-genome sequencing sample. FINDINGS: We introduce Halvade Somatic, a framework for somatic variant calling from DNA sequencing data that takes advantage of multi-node and/or multi-core compute platforms to reduce runtime. It relies on Apache Spark to provide scalable I/O and to create and manage data streams that are processed on different CPU cores in parallel. Halvade Somatic contains all required steps to process the tumor and matched normal sample according to the GATK best practices recommendations: read alignment (BWA), sorting of reads, preprocessing steps such as marking duplicate reads and base quality score recalibration (GATK), and, finally, calling the somatic variants (Mutect2). Our approach reduces the runtime on a single 36-core node to 19.5 h compared to a runtime of 84.5 h for the original pipeline, a speedup of 4.3 times. Runtime can be further decreased by scaling to multiple nodes, e.g., we observe a runtime of 1.36 h using 16 nodes, an additional speedup of 14.4 times. Halvade Somatic supports variant calling from both whole-genome sequencing and whole-exome sequencing data and also supports Strelka2 as an alternative or complementary variant calling tool. We provide a Docker image to facilitate single-node deployment. Halvade Somatic can be executed on a variety of compute platforms, including Amazon EC2 and Google Cloud. CONCLUSIONS: To our knowledge, Halvade Somatic is the first somatic variant calling pipeline that leverages Big Data processing platforms and provides reliable, scalable performance. Source code is freely available.

Assuntos

Sequenciamento de Nucleotídeos em Larga Escala , Software , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Polimorfismo de Nucleotídeo Único , Análise de Sequência de DNA/métodos , Sequenciamento do Exoma , Sequenciamento Completo do Genoma

Multithreaded variant calling in elPrep 5.

Herzeel, Charlotte; Costanza, Pascal; Decap, Dries; Fostier, Jan; Wuyts, Roel; Verachtert, Wilfried.

PLoS One ; 16(2): e0244471, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-33539352

RESUMO

We present elPrep 5, which updates the elPrep framework for processing sequencing alignment/map files with variant calling. elPrep 5 can now execute the full pipeline described by the GATK Best Practices for variant calling, which consists of PCR and optical duplicate marking, sorting by coordinate order, base quality score recalibration, and variant calling using the haplotype caller algorithm. elPrep 5 produces identical BAM and VCF output as GATK4 while significantly reducing the runtime by parallelizing and merging the execution of the pipeline steps. Our benchmarks show that elPrep 5 speeds up the runtime of the variant calling pipeline by a factor 8-16x on both whole-exome and whole-genome data while using the same hardware resources as GATK4. This makes elPrep 5 a suitable drop-in replacement for GATK4 when faster execution times are needed.

Assuntos

Exoma , Genoma Humano , Análise de Sequência de DNA/métodos , Software , Algoritmos , Humanos , Sequenciamento do Exoma

elPrep 4: A multithreaded framework for sequence analysis.

Herzeel, Charlotte; Costanza, Pascal; Decap, Dries; Fostier, Jan; Verachtert, Wilfried.

PLoS One ; 14(2): e0209523, 2019.

Artigo em Inglês | MEDLINE | ID: mdl-30759172

RESUMO

We present elPrep 4, a reimplementation from scratch of the elPrep framework for processing sequence alignment map files in the Go programming language. elPrep 4 includes multiple new features allowing us to process all of the preparation steps defined by the GATK Best Practice pipelines for variant calling. This includes new and improved functionality for sorting, (optical) duplicate marking, base quality score recalibration, BED and VCF parsing, and various filtering options. The implementations of these options in elPrep 4 faithfully reproduce the outcomes of their counterparts in GATK 4, SAMtools, and Picard, even though the underlying algorithms are redesigned to take advantage of elPrep's parallel execution framework to vastly improve the runtime and resource use compared to these tools. Our benchmarks show that elPrep executes the preparation steps of the GATK Best Practices up to 13x faster on WES data, and up to 7.4x faster for WGS data compared to running the same pipeline with GATK 4, while utilizing fewer compute resources.

Assuntos

Análise de Sequência/métodos , Algoritmos , Biologia Computacional/economia , Biologia Computacional/métodos , Custos e Análise de Custo , Exoma , Análise de Sequência/economia , Software

Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce.

Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan.

PLoS One ; 12(3): e0174575, 2017.

Artigo em Inglês | MEDLINE | ID: mdl-28358893

RESUMO

Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires â¼28h, Halvade-RNA reduces this runtime to â¼2h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.

Assuntos

Sequenciamento de Nucleotídeos em Larga Escala/métodos , RNA/genética , Software , Transcriptoma/genética , Algoritmos , Biologia Computacional , Genômica , Polimorfismo de Nucleotídeo Único , Análise de Sequência de DNA

elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling.

Herzeel, Charlotte; Costanza, Pascal; Decap, Dries; Fostier, Jan; Reumers, Joke.

PLoS One ; 10(7): e0132868, 2015.

Artigo em Inglês | MEDLINE | ID: mdl-26182406

RESUMO

elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost.

Assuntos

Algoritmos , Exoma , Genoma Humano , Alinhamento de Sequência/economia , Software , Benchmarking , Mapeamento de Sequências Contíguas , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Polimorfismo de Nucleotídeo Único , Alinhamento de Sequência/métodos , Alinhamento de Sequência/estatística & dados numéricos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA