HapCol: accurate and memory-efficient haplotype assembly from long reads.

Pirola, Yuri; Zaccaria, Simone; Dondi, Riccardo; Klau, Gunnar W; Pisanti, Nadia; Bonizzoni, Paola

Affiliation

Pirola Y; Dipartimento di Informatica Sistemistica e Comunicazione (DISCo), Univ. degli Studi di Milano-Bicocca, Milan, Italy.
Zaccaria S; Dipartimento di Informatica Sistemistica e Comunicazione (DISCo), Univ. degli Studi di Milano-Bicocca, Milan, Italy.
Dondi R; Dipartimento di Scienze Umane e Sociali, Univ. degli Studi di Bergamo, Bergamo, Italy.
Klau GW; Life Sciences group, Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands and ERABLE Team, INRIA, Lyon, France.
Pisanti N; ERABLE Team, INRIA, Lyon, France and Dipartimento di Informatica, Univ. degli Studi di Pisa, Pisa, Italy.
Bonizzoni P; Dipartimento di Informatica Sistemistica e Comunicazione (DISCo), Univ. degli Studi di Milano-Bicocca, Milan, Italy.

Bioinformatics; 32(11): 1610-7, 2016 Jun 01.

Article in English | MEDLINE | ID: mdl-26315913

ABSTRACT

MOTIVATION: Haplotype assembly is the computational problem of reconstructing haplotypes in diploid organisms and is of fundamental importance for characterizing the effects of single-nucleotide polymorphisms on the expression of phenotypic traits. Haplotype assembly benefits greatly from the advent of 'future-generation' sequencing technologies and their capability to produce long reads at increasing coverage. Existing methods cannot handle such data in a fully satisfactory way, either because their accuracy or performance degrades as read length and sequencing coverage increase or because they are based on restrictive assumptions.

RESULTS: By exploiting a feature of future-generation technologies (the uniform distribution of sequencing errors), we designed an exact algorithm, called HapCol, that is exponential in the maximum number of corrections for each single-nucleotide polymorphism position and that minimizes the overall error-correction score. We performed an experimental analysis, comparing HapCol with the current state-of-the-art combinatorial methods on both real and simulated data. On a standard benchmark of real data, we show that HapCol is competitive with state-of-the-art methods, improving the accuracy and the number of phased positions. Furthermore, experiments on realistically simulated datasets revealed that HapCol requires significantly fewer computing resources, especially memory. Thanks to its computational efficiency, HapCol can overcome the limits of previous approaches, making it possible to phase datasets with higher coverage and without the traditional all-heterozygous assumption.

AVAILABILITY AND IMPLEMENTATION: Our source code is available under the terms of the GNU General Public License at http://hapcol.algolab.eu/

CONTACT: bonizzoni@disco.unimib.it

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
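To make the "overall error-correction score" mentioned in the abstract concrete, the following is a minimal Python sketch of that objective on a toy input. It is not HapCol's algorithm: it enumerates every bipartition of the reads, so it is exponential in the number of reads and only usable on tiny examples. The function names, the string encoding of reads ('0', '1', and '-' for an uncovered position), and the optional per-column cap are all assumptions made for illustration; the cap only loosely mirrors the "maximum number of corrections for each single-nucleotide polymorphism position" in the abstract.

```python
from itertools import product


def column_cost(column, sides):
    """Minimum corrections needed in one SNP column for a given read bipartition.

    column: one character per read, '0', '1' or '-' (read does not cover the SNP).
    sides:  0 or 1 per read, the haplotype each read is assigned to.
    Each side keeps its majority allele independently, so a column may end up
    homozygous (no all-heterozygous assumption in this sketch).
    """
    cost = 0
    for side in (0, 1):
        alleles = [a for a, s in zip(column, sides) if s == side and a != '-']
        ones = alleles.count('1')
        # Correct whichever allele is in the minority on this side.
        cost += min(ones, len(alleles) - ones)
    return cost


def brute_force_mec(reads, max_corrections_per_column=None):
    """Return (total_corrections, sides) minimizing the total, or None.

    Hypothetical helper: tries all bipartitions of the reads. If
    max_corrections_per_column is given, bipartitions that need more
    corrections than that in any single column are skipped.
    """
    n, m = len(reads), len(reads[0])
    columns = [[r[j] for r in reads] for j in range(m)]
    best = None
    # Read 0 is fixed on side 0 to avoid counting mirrored bipartitions twice.
    for rest in product((0, 1), repeat=n - 1):
        sides = (0,) + rest
        costs = [column_cost(col, sides) for col in columns]
        if max_corrections_per_column is not None and max(costs) > max_corrections_per_column:
            continue
        total = sum(costs)
        if best is None or total < best[0]:
            best = (total, sides)
    return best


if __name__ == "__main__":
    # Five toy reads over four SNP positions; the last read carries one error.
    reads = ["0011", "0011", "1100", "1100", "0111"]
    print(brute_force_mec(reads))
```

On this toy input the best bipartition groups the last read with the first two and needs a single correction. Setting the hypothetical cap to 0 leaves no feasible bipartition, so the function returns None.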

Subjects

Haplotypes; Algorithms; Diploidy; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Software

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Haplotypes Language: En Publication year: 2016 Document type: Article
