Search | Secretaria de Estado da Saúde

Optimized Next-Generation Sequencing Genotype-Haplotype Calling for Genome Variability Analysis.

Navarro, Javier; Nevado, Bruno; Hernández, Porfidio; Vera, Gonzalo; Ramos-Onsins, Sebastián E.

Evol Bioinform Online ; 13: 1176934317723884, 2017.

Article in English | MEDLINE | ID: mdl-28894353

ABSTRACT

The accurate estimation of nucleotide variability from next-generation sequencing data is challenged by the high number of sequencing errors produced by new sequencing technologies. This is especially true for nonmodel species, where reference sequences may not be available and read depth may be low because of limited budgets. The most popular single-nucleotide polymorphism (SNP) callers are designed to achieve high SNP recovery and a low false discovery rate, but they are not designed to account appropriately for the frequency of the variants. In contrast, algorithms designed to account for SNP frequency give precise estimates of the levels and patterns of variability; these algorithms focus on the unbiased estimation of variability rather than on high SNP recovery. Here, we implemented a fast, optimized parallel algorithm that includes the method developed by Roesti et al and Lynch, which estimates the genotype of each individual at each site, allowing for the possibility of calling both bases of the genotype, only one, or none. This algorithm does not consider the reference and is therefore independent of biases related to the specified reference nucleotide. The pipeline starts from a BAM file converted to pileup or mpileup format, and the software outputs a FASTA file. The new program not only reduces running times but also, thanks to its improved use of resources, can run on both small computers and large parallel computers, extending its benefits to a wider range of researchers. The output file can be analyzed with population genetics software such as the R library PopGenome, the software VariScan, and the program mstatspop, the latter for analyses that take positions with missing data into account.
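The abstract names the input (a pileup/mpileup file) and the idea of calling two, one, or zero bases per individual genotype, but not the formulas. The following Python sketch illustrates that idea under simple, stated assumptions; it is not the Roesti et al/Lynch estimator and not the program's code. It assumes one diploid individual per read-bases column, a uniform per-base error rate `err` (base qualities ignored), no genotype priors (maximum likelihood), and a hypothetical likelihood-ratio cutoff `min_lr`. The function names `parse_pileup_bases`, `genotype_log_likelihoods` and `call_site` are illustrative only.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"

def parse_pileup_bases(ref: str, bases: str) -> dict:
    """Count A/C/G/T in one sample's read-bases column of an mpileup line."""
    counts = {b: 0 for b in BASES}
    ref = ref.upper()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":                 # read start; the next char is its mapping quality
            i += 2
            continue
        if c in "+-":                # indel: +<len><seq> or -<len><seq>, skipped here
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
            continue
        if c in ".,":                # match to the reference on either strand
            c = ref
        c = c.upper()
        if c in counts:              # '$', '*' and other symbols are ignored
            counts[c] += 1
        i += 1
    return counts

def genotype_log_likelihoods(counts: dict, err: float = 0.01) -> dict:
    """Log-likelihood of each unordered diploid genotype (AA, AC, ..., TT)."""
    lls = {}
    for a, b in combinations_with_replacement(BASES, 2):
        ll = 0.0
        for x, n in counts.items():
            if n == 0:
                continue
            # each read comes from either allele with probability 1/2;
            # a wrong base is any of the other three with probability err/3
            p_a = 1 - err if x == a else err / 3
            p_b = 1 - err if x == b else err / 3
            ll += n * math.log(0.5 * p_a + 0.5 * p_b)
        lls[a + b] = ll
    return lls

def call_site(counts: dict, err: float = 0.01, min_lr: float = math.log(100)) -> str:
    """Return both bases, one base plus 'N', or 'NN' for one individual at one site."""
    if sum(counts.values()) == 0:
        return "NN"
    lls = genotype_log_likelihoods(counts, err)
    best = max(lls, key=lls.get)
    # genotypes whose likelihood is within a factor exp(min_lr) of the best one
    plausible = [g for g, ll in lls.items() if lls[best] - ll < min_lr]
    if len(plausible) == 1:
        return best
    # distinct unordered genotypes share at most one base
    shared = set(plausible[0]).intersection(*(set(g) for g in plausible[1:]))
    return shared.pop() + "N" if shared else "NN"
```

For example, ten reads all showing A yield `AA`, two reads showing A yield `AN` (the second allele cannot be resolved), and one A plus one C yield `NN`. The reference base is used only to decode the `.` and `,` symbols of the pileup format, so the likelihoods themselves do not favor the reference allele.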

Approaching Long Genomic Regions and Large Recombination Rates with msParSm as an Alternative to MaCS.

Montemuiño, Carlos; Espinosa, Antonio; Moure, Juan C; Vera, Gonzalo; Hernández, Porfidio; Ramos-Onsins, Sebastián.

Evol Bioinform Online ; 12: 223-228, 2016.

Article in English | MEDLINE | ID: mdl-27721650

ABSTRACT

The msParSm application, an evolution of msPar (the parallel version of the coalescent simulation program ms), removes the limitation on simulating long stretches of DNA sequence with large recombination rates without compromising the accuracy of the standard coalescent. This work introduces msParSm, describes its significant performance improvements over msPar and the details of its shared-memory parallelization, and shows how it can achieve execution times similar to, if not better than, those of MaCS. Two case studies with different mutation rates were analyzed, one approximating the human average and the other approximating the Drosophila melanogaster average. Source code is available at https://github.com/cmontemuino/msparsm.
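ms, msPar and msParSm all simulate genealogies under the standard coalescent. To illustrate what a single replicate involves in the simplest case, the Python sketch below simulates the standard neutral coalescent without recombination (the case that does not need msParSm's extensions) and drops mutations on the resulting tree. It is not code from ms or msParSm; the function names are hypothetical, time is measured in units of 2N generations, and theta = 4Nμ is the population mutation rate.

```python
import math
import random

def harmonic(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i, so that E[S] = theta * a_n under this model."""
    return sum(1.0 / i for i in range(1, n))

def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_segregating_sites(n: int, theta: float, rng: random.Random) -> int:
    """One replicate: number of segregating sites in a sample of n sequences."""
    total_length = 0.0
    k = n
    while k > 1:
        rate = k * (k - 1) / 2.0      # number of lineage pairs that can coalesce
        t_k = rng.expovariate(rate)   # waiting time until the next coalescence
        total_length += k * t_k       # k branches each grow by t_k
        k -= 1
    # mutations fall on the tree as a Poisson process, rate theta/2 per unit branch length
    return poisson(rng, theta / 2.0 * total_length)

rng = random.Random(1)
reps = [simulate_segregating_sites(10, 5.0, rng) for _ in range(2000)]
print(sum(reps) / len(reps), 5.0 * harmonic(10))  # simulated vs. expected mean S
```

Adding recombination turns each replicate from a single tree into an ancestral recombination graph whose size grows with sequence length and recombination rate, which is where the cost of simulating long genomic regions comes from.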

Efficient mapping of genomic sequences to optimize multiple pairwise alignment in hybrid cluster platforms.

Montañola, Alberto; Roig, Concepció; Hernández, Porfidio.

J Integr Bioinform ; 11(3): 251, 2014 Oct 23.

Article in English | MEDLINE | ID: mdl-25339085

ABSTRACT

Multiple sequence alignment (MSA), used in biocomputing to study similarities between genomic sequences, is known to require substantial memory and computational resources. Researchers now routinely align thousands of sequences, which creates new challenges in solving the problem efficiently with the available resources. Determining the appropriate amount of resources to allocate is important to avoid wasting them, thereby reducing economic costs such as those of running a specific cloud instance. Pairwise alignment is the initial key step of MSA, in which all the required pairwise alignments are computed. We present a method to determine the optimal amount of memory and computational resources to allocate to the pairwise alignment step, and we validate it through a set of experimental results for different possible inputs. These results allow us to determine the best parameters for configuring the applications so that they use the available resources of a given system effectively.
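The abstract does not spell out the allocation method itself. As a back-of-the-envelope illustration of why the pairwise step dominates resource planning, the Python sketch below counts the n(n-1)/2 alignments and estimates memory per alignment. It assumes (our assumption, not necessarily the authors' model) a full dynamic-programming matrix of (len_a+1)(len_b+1) cells with a fixed, hypothetical cell size, and a node that runs at most one alignment per core. All function names are hypothetical.

```python
from itertools import combinations

def num_pairwise_alignments(n_seqs: int) -> int:
    """All unordered pairs of sequences: n(n-1)/2."""
    return n_seqs * (n_seqs - 1) // 2

def dp_matrix_bytes(len_a: int, len_b: int, bytes_per_cell: int = 4) -> int:
    """Memory for a full (len_a+1) x (len_b+1) dynamic-programming matrix."""
    return (len_a + 1) * (len_b + 1) * bytes_per_cell

def total_cell_updates(lengths: list) -> int:
    """Rough computational work: one cell update per residue pair, summed over all pairs."""
    return sum(a * b for a, b in combinations(lengths, 2))

def max_concurrent_alignments(lengths: list, node_mem_bytes: int, cores: int,
                              bytes_per_cell: int = 4) -> int:
    """Alignments one node can run at once if the worst-case pair must fit in memory."""
    a, b = sorted(lengths, reverse=True)[:2]   # the two longest sequences
    per_alignment = dp_matrix_bytes(a, b, bytes_per_cell)
    return min(cores, node_mem_bytes // per_alignment)
```

For 1,000 sequences this gives 499,500 alignments; with two 1,000-residue sequences and 4-byte cells each matrix needs about 4 MB, so a node with only 10 MB free could run two alignments at a time even with eight cores. Linear-space alignment algorithms such as Hirschberg's would change the memory side of this estimate considerably.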

Subjects

Algorithms; Genome; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Amino Acid Sequence; Cluster Analysis; Computer Simulation; Molecular Sequence Data; Time Factors
