Results 1 - 9 of 9
1.
Bioinformatics ; 35(18): 3279-3286, 2019 Sep 15.
Article in English | MEDLINE | ID: mdl-30689725

ABSTRACT

SUMMARY: Haplotype assembly of polyploids is an open issue in plant genomics. Recent experimental studies on highly heterozygous autotetraploid potato have shown that available methods do not deliver satisfying results in practice. We propose an optimal method to assemble haplotypes of highly heterozygous polyploids from Illumina short-sequencing reads. Our method is based on a generalization of the existing minimum fragment removal model to the polyploid case and on new integer linear programs to reconstruct optimal haplotypes. We validate our methods experimentally by means of a combined evaluation on simulated and experimental data based on 83 previously sequenced autotetraploid potato cultivars. Results on simulated data show that our methods produce highly accurate haplotype assemblies, while results on experimental data confirm a noticeable improvement over the state of the art. AVAILABILITY AND IMPLEMENTATION: Executables for Linux at http://github.com/Computational Genomics/HaplotypeAssembler. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
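To make the minimum fragment removal (MFR) idea concrete, here is a brute-force Python sketch of one natural polyploid reading of it, assuming fragments are strings over '0', '1' and '-' (SNP not covered): remove the fewest fragments so that the rest split into k groups with no allele disagreement inside any group, which is the same as asking for a proper k-coloring of the fragment conflict graph. The function names and encoding are hypothetical, the search is exponential, and the paper itself solves its models with integer linear programs rather than this enumeration.

```python
from itertools import combinations


def conflicts(f, g):
    """True if fragments f and g cover a common SNP with different alleles."""
    return any(a != "-" and b != "-" and a != b for a, b in zip(f, g))


def k_colorable(nodes, edges, k):
    """Backtracking test: can `nodes` be split into k conflict-free groups?"""
    color = {}

    def assign(i):
        if i == len(nodes):
            return True
        v = nodes[i]
        for c in range(k):
            # Only neighbours that are already colored can block color c.
            if all(color.get(u) != c for u in edges[v]):
                color[v] = c
                if assign(i + 1):
                    return True
                del color[v]
        return False

    return assign(0)


def min_fragment_removal(fragments, k):
    """Smallest set of fragment indices whose removal leaves a k-colorable
    conflict graph (exhaustive search, tiny inputs only)."""
    n = len(fragments)
    edges = {
        i: {j for j in range(n) if j != i and conflicts(fragments[i], fragments[j])}
        for i in range(n)
    }
    for r in range(n + 1):  # r == n always succeeds (empty graph)
        for removed in combinations(range(n), r):
            kept = [i for i in range(n) if i not in removed]
            if k_colorable(kept, edges, k):
                return set(removed)
    return set(range(n))
```

With k = 2 the test reduces to bipartiteness of the conflict graph, which is the classical diploid MFR setting.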


Subject(s)
Solanum tuberosum , Algorithms , Haplotypes , Programming, Linear , Sequence Analysis, DNA , Software
2.
Bioinformatics ; 34(17): i766-i772, 2018 Sep 01.
Article in English | MEDLINE | ID: mdl-30423080

ABSTRACT

Motivation: Mapping-based approaches have become limited in their application to very large sets of references, since computing an FM-index for very large databases (e.g. >10 GB) has become a bottleneck. This affects many analyses that need such an index as an essential step for approximate matching of NGS reads to reference databases. For instance, in typical metagenomics analyses, the reference sequences have become too large to compute a single full-text index on standard machines. Even on large-memory machines, computing such an index takes about 1 day of computing time. As a result, indices are rarely updated. Hence, it is desirable to create an alternative way of indexing while preserving fast search times. Results: To solve the index construction and update problem, we propose the DREAM (Dynamic seaRchablE pArallel coMpressed index) framework and provide an implementation. The first main contribution is an approximate search distributor based on a novel use of Bloom filters. We combine several Bloom filters into an interleaved Bloom filter and use this new data structure to quickly exclude, for each read, the parts of the databases where it cannot match. This allows us to keep the databases in several indices, which can easily be rebuilt when parts are updated, while maintaining a fast search time. The second main contribution is DREAM-Yara, a distributed version of a fully sensitive read mapper implemented under the DREAM framework. Availability and implementation: https://gitlab.com/pirovc/dream_yara/.
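The interleaved Bloom filter (IBF) described above can be sketched as a toy Python model; this is not DREAM's implementation. Here `num_bins` Bloom filters share one table in which row `pos` stores bit `pos` of every bin as a single integer bitmask, so each hash lookup returns the candidate bins all at once. The SHA-256-based hashing, the class and function names, and the fixed k-mer count `threshold` are assumptions; a real mapper would derive the threshold from the allowed number of errors, for example via the q-gram lemma.

```python
import hashlib


class InterleavedBloomFilter:
    """Toy IBF: `num_bins` Bloom filters of `size` bits each, interleaved so
    that rows[pos] holds bit `pos` of every bin's filter as one bitmask."""

    def __init__(self, num_bins, size, num_hashes=3):
        self.num_bins = num_bins
        self.size = size
        self.num_hashes = num_hashes
        self.rows = [0] * size  # bit b of rows[pos] == bit pos of bin b

    def _positions(self, kmer):
        # Hypothetical hash family: seeded SHA-256 reduced modulo the size.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.size

    def insert(self, kmer, bin_id):
        for pos in self._positions(kmer):
            self.rows[pos] |= 1 << bin_id

    def query(self, kmer):
        """Bitmask of bins that may contain `kmer` (false positives possible)."""
        mask = (1 << self.num_bins) - 1
        for pos in self._positions(kmer):
            mask &= self.rows[pos]
        return mask


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def candidate_bins(ibf, read, k, threshold):
    """Bins in which at least `threshold` of the read's k-mers are (probably) present."""
    counts = [0] * ibf.num_bins
    for km in kmers(read, k):
        hits = ibf.query(km)
        for b in range(ibf.num_bins):
            if (hits >> b) & 1:
                counts[b] += 1
    return [b for b, c in enumerate(counts) if c >= threshold]
```

Only the bins returned by `candidate_bins` would then be searched with a full index, which is what lets each bin's index be rebuilt independently.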


Subject(s)
Databases, Factual , Software , Humans , Time Factors
4.
Nucleic Acids Res ; 41(7): e78, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23358824

ABSTRACT

We present Masai, a read mapper representing the state-of-the-art in terms of speed and accuracy. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, 2-4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared with exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined together, these two methods significantly speed up approximate search on genomic data sets. Masai is implemented in C++ using the SeqAn library. The source code is distributed under the BSD license and binaries for Linux, Mac OS X and Windows can be freely downloaded from http://www.seqan.de/projects/masai.
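The filtration idea behind approximate seeds follows from the pigeonhole principle: if a read occurs with at most e mismatches and is cut into s seeds, at least one seed occurs with at most floor(e/s) mismatches. The Python sketch below illustrates only that principle, assuming Hamming distance and a naive scan of the text; Masai itself searches seeds in an index and adds multiple backtracking, neither of which is modeled here, and all names are hypothetical.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def split_seeds(read, s):
    """Split `read` into `s` consecutive seeds; returns (offset, seed) pairs."""
    n = len(read)
    bounds = [i * n // s for i in range(s + 1)]
    return [(bounds[i], read[bounds[i]:bounds[i + 1]]) for i in range(s)]


def map_read(text, read, e, s):
    """Positions where `read` matches `text` with at most `e` mismatches.

    Filtration: every such occurrence contains some seed with at most
    e // s mismatches, so approximate seed hits yield all candidates.
    """
    seed_errors = e // s
    candidates = set()
    for offset, seed in split_seeds(read, s):
        for j in range(len(text) - len(seed) + 1):
            if hamming(text[j:j + len(seed)], seed) <= seed_errors:
                start = j - offset
                if 0 <= start <= len(text) - len(read):
                    candidates.add(start)
    # Verification: check each candidate against the whole read.
    return sorted(p for p in candidates if hamming(text[p:p + len(read)], read) <= e)
```

Allowing errors in the seeds means fewer, longer seeds are needed than with exact seeds (s = e + 1), which is the specificity gain the abstract refers to.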


Subject(s)
Chromosome Mapping/methods , Software , Algorithms , Animals , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Escherichia coli/genetics , Genetic Variation , Genomics/methods , Humans
5.
iScience ; 23(4): 100988, 2020 Apr 24.
Article in English | MEDLINE | ID: mdl-32248063

ABSTRACT

Increasingly available microbial reference data make it possible to interpret the composition and function of previously uncharacterized microbial communities in detail via high-throughput sequencing analysis. However, efficient methods for read classification are required, because the best database matches for short sequence reads are often shared among multiple reference sequences. Here, we take advantage of the fact that microbial sequences can be annotated relative to established tree structures, and we develop a highly scalable read classifier, PRROMenade, by enhancing the generalized Burrows-Wheeler transform with a labeling step to directly assign reads to the corresponding lowest taxonomic unit in an annotation tree. PRROMenade solves the multi-matching problem while allowing fast variable-size sequence classification for phylogenetic or functional annotation. Our simulations with 5% added differences from the reference indicated an error rate of only 1.5% for PRROMenade functional classification. On metatranscriptomic data, PRROMenade highlighted biologically relevant functional pathways related to diet-induced changes in the human gut microbiome.
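Assigning a read to the lowest taxonomic unit amounts to taking the lowest common ancestor (LCA) of all references that contain the match. The Python sketch below shows only that labeling rule on a toy tree, assuming exact whole-read matches; it swaps in a naive substring scan where PRROMenade uses a labeled generalized Burrows-Wheeler transform, and the tree, labels and function names are hypothetical.

```python
def ancestors(node, parent):
    """Path from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(nodes, parent):
    """Lowest common ancestor of a non-empty collection of tree nodes."""
    nodes = list(nodes)
    common = set(ancestors(nodes[0], parent))
    for n in nodes[1:]:
        common &= set(ancestors(n, parent))
    # The first common node on the path from any member to the root is the LCA.
    return next(a for a in ancestors(nodes[0], parent) if a in common)


def classify(read, references, parent):
    """Label of the lowest node covering every reference that contains `read`."""
    hits = [label for label, seq in references.items() if read in seq]
    return lca(hits, parent) if hits else None
```

A read shared by two sibling references is thus reported at their parent, which is how the multi-matching problem is resolved without picking one reference arbitrarily.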

6.
IEEE/ACM Trans Comput Biol Bioinform ; 16(4): 1132-1142, 2019.
Article in English | MEDLINE | ID: mdl-28991752

ABSTRACT

Computer simulations can be used to study population genetic methods, models, and parameters, as well as to predict potential outcomes. For example, in plant populations, the outcome of breeding operations can be predicted using simulations. In-silico construction of populations with pre-specified characteristics is an important task in breeding optimization and other population genetic studies. We present two linear-time Simulation using Best-fit Algorithms (SimBA) for two classes of problems, each of which co-fits two distributions: SimBA-LD fits linkage disequilibrium and minimum allele frequency distributions, while SimBA-hap fits founder-haplotype and polyploid allele dosage distributions. An incremental gap-filling version of the previously introduced SimBA-LD is demonstrated here to accurately fit the target distributions, allowing efficient large-scale simulations. The accuracy and efficiency of SimBA-hap are demonstrated by simulating tetraploid populations with varying numbers of founder haplotypes; we evaluate both a linear-time greedy algorithm and an optimal solution based on mixed-integer programming. SimBA is available at http://researcher.watson.ibm.com/project/5669.
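For reference, the per-locus statistics that SimBA-LD co-fits, and the dosage counts that SimBA-hap targets, can be computed from haplotypes as below. This Python sketch assumes biallelic SNPs coded 0/1, measures linkage disequilibrium as r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)) with D = p_AB - p_A p_B, and reads the abstract's "minimum allele frequency" as the usual minor allele frequency min(p, 1 - p); it does not implement SimBA's fitting algorithms, and the function names are hypothetical.

```python
def allele_freq(haplotypes, j):
    """Frequency of allele 1 at SNP j across 0/1-coded haplotypes."""
    return sum(h[j] for h in haplotypes) / len(haplotypes)


def maf(haplotypes, j):
    """Minor allele frequency at SNP j."""
    p = allele_freq(haplotypes, j)
    return min(p, 1 - p)


def r_squared(haplotypes, i, j):
    """LD between SNPs i and j as r^2 = D^2 / (pA(1-pA) pB(1-pB))."""
    pa, pb = allele_freq(haplotypes, i), allele_freq(haplotypes, j)
    pab = sum(h[i] * h[j] for h in haplotypes) / len(haplotypes)
    d = pab - pa * pb
    denom = pa * (1 - pa) * pb * (1 - pb)
    return d * d / denom if denom > 0 else 0.0  # monomorphic SNP: no LD


def dosage_distribution(individuals, founders, j):
    """Count tetraploid individuals with dosage 0..4 at SNP j.

    Each individual is a tuple of four indices into `founders`, a list of
    founder haplotypes (hypothetical encoding)."""
    counts = [0] * 5
    for ind in individuals:
        counts[sum(founders[f][j] for f in ind)] += 1
    return counts
```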


Subject(s)
Algorithms , Computational Biology/methods , Genomics , Alleles , Computer Simulation , DNA, Plant/genetics , Gene Dosage , Gene Frequency , Genes, Plant , Haplotypes , Humans , Linear Models , Linkage Disequilibrium , Models, Genetic , Polymorphism, Single Nucleotide
7.
J Biotechnol ; 261: 157-168, 2017 Nov 10.
Article in English | MEDLINE | ID: mdl-28888961

ABSTRACT

BACKGROUND: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome (Venter et al., 2001) would not have been possible without advanced assembly algorithms, and the development of practical BWT-based read mappers has been instrumental for NGS analysis. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, a widening gap opened between state-of-the-art algorithmic techniques and the algorithmic components actually used in widespread tools. We previously addressed this by introducing the SeqAn library of efficient data types and algorithms in 2008 (Döring et al., 2008). RESULTS: The SeqAn library has matured considerably since its first publication 9 years ago. In this article we review its status as an established resource for programmers in the field of sequence analysis and its contributions to many analysis tools. CONCLUSIONS: We anticipate that SeqAn will continue to be a valuable resource, especially since it has started to support various hardware acceleration techniques actively and systematically.


Subject(s)
Databases, Genetic , Genomics/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Sequence Alignment
8.
Nat Protoc ; 10(12): 2004-15, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26562621

ABSTRACT

Exomiser is an application that prioritizes genes and variants in next-generation sequencing (NGS) projects for novel disease-gene discovery or differential diagnostics of Mendelian disease. Exomiser comprises a suite of algorithms for prioritizing exome sequences using random-walk analysis of protein interaction networks, clinical relevance and cross-species phenotype comparisons, as well as a wide range of other computational filters for variant frequency, predicted pathogenicity and pedigree analysis. In this protocol, we provide a detailed explanation of how to install Exomiser and use it to prioritize exome sequences in a number of scenarios. Exomiser requires ∼3 GB of RAM and roughly 15-90 s of computing time on a standard desktop computer to analyze a variant call format (VCF) file. Exomiser is freely available for academic use from http://www.sanger.ac.uk/science/tools/exomiser.
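Random-walk analysis of a protein interaction network is commonly done with a random walk with restart (RWR), which scores genes by how often a walker that keeps jumping back to known disease genes visits them. The Python sketch below computes RWR scores by power iteration on a toy network; the gene names, the restart probability of 0.7 and the convergence tolerance are arbitrary assumptions, and this is not Exomiser's code or its exact scoring.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that, at each step,
    jumps back to a uniformly chosen seed gene with probability `restart`
    and otherwise moves to a random neighbour.

    `adjacency` maps every gene to its neighbour list (undirected graph,
    every neighbour also a key, no isolated genes)."""
    nodes = sorted(adjacency)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            # Spread the non-restart mass of u evenly over its neighbours.
            share = (1 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        if sum(abs(nxt[v] - p[v]) for v in nodes) < tol:
            return nxt
        p = nxt
    return p
```

Candidate genes close to the seeds in the network receive higher scores, which can then be combined with the variant-level filters the abstract lists.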


Subject(s)
Exome , High-Throughput Nucleotide Sequencing/methods , Genetic Testing/methods , Humans , Sequence Analysis, DNA/methods , Software
9.
Algorithms Mol Biol ; 2: 10, 2007 Sep 18.
Article in English | MEDLINE | ID: mdl-17877802

ABSTRACT

This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks: local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. It also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they had not been implemented in a single, consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible, easy-to-use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.
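Global alignment with affine gap costs, one of the tasks listed for BATS, is classically computed with Gotoh's three-matrix dynamic program. The Python sketch below returns only the optimal score, assuming a gap of length L costs gap_open + L * gap_extend and using arbitrary default scores; it does not reflect BATS's C/C++ or Perl API.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Gotoh's O(len(a) * len(b)) global alignment score with affine gaps:
    a gap of length L scores gap_open + L * gap_extend (both negative)."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend  # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend  # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gap_open once; extending pays only gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open + gap_extend,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

For example, aligning "ACGT" with "AT" scores best as two matches around a single gap of length 2 (1 + 1 - 5 = -3), rather than two separate gaps that would each pay the opening cost.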
