Comparison of phasing strategies for whole human genomes.
Choi, Yongwook; Chan, Agnes P; Kirkness, Ewen; Telenti, Amalio; Schork, Nicholas J.
Affiliation
  • Choi Y; J. Craig Venter Institute, Rockville, Maryland, United States of America.
  • Chan AP; J. Craig Venter Institute, Rockville, Maryland, United States of America.
  • Kirkness E; Human Longevity, Inc., San Diego, California, United States of America.
  • Telenti A; J. Craig Venter Institute, La Jolla, California, United States of America.
  • Schork NJ; J. Craig Venter Institute, La Jolla, California, United States of America.
PLoS Genet; 14(4): e1007308, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29621242
ABSTRACT
Humans are a diploid species: each individual inherits one set of chromosomes paternally and one homologous set maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes that individuals possess (i.e., they do not 'phase' the genome), often because of the costs and complexities of doing so. We compared 11 widely used approaches to phasing human genomes, using the publicly available 'Genome-In-A-Bottle' (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing, as well as purely computational approaches that seek to reconstruct phase information either from general sequencing reads and constructs or from population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, and phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages (1) population-based phasing using the SHAPEIT software suite, (2) either genome-wide sequencing read data or parental genotypes, and (3) a large reference panel of variant and haplotype frequencies provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced by the addition of genome-wide read data, e.g., whole-genome shotgun and/or RNA sequencing reads. Further, we found that including parental genotype data within a population-based phasing strategy can reduce phasing errors by as much as ten-fold.
We also considered a majority voting scheme for the construction of a consensus haplotype combining multiple predictions for enhanced performance and site coverage. Finally, we also identified DNA sequence signatures associated with the genomic regions harboring phasing switch errors, which included regions of low polymorphism or SNV density.
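Two ideas from the abstract, the switch error rate and a majority-voting consensus haplotype, can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the paper. First, it uses a hypothetical encoding in which each heterozygous SNV's call is the allele (0 = REF, 1 = ALT) placed on haplotype 1, or None if the site is unphased. Second, it applies one common definition of the switch error rate: disagreements in relative phase between consecutive sites in the same predicted block, divided by the number of such pairs. Third, it assumes that all call sets being voted on already share one haplotype orientation, for example after anchoring to parental genotypes. The paper's exact metric definitions and voting procedure may differ.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Hypothetical encoding (an assumption, not from the paper): for each
# heterozygous SNV in genomic order, the allele (0 = REF, 1 = ALT) placed on
# haplotype 1, or None when the site is left unphased.
Call = Optional[int]


def switch_error_rate(pred: Sequence[Call], truth: Sequence[Call],
                      blocks: Sequence[int]) -> float:
    """Fraction of adjacent phased site pairs whose relative phase disagrees with truth.

    blocks[i] is the predicted phase-block ID of site i. Relative phase is only
    compared between consecutive sites that are phased in both call sets and
    lie in the same predicted block.
    """
    switches = 0
    pairs = 0
    prev = None  # index of the previous site phased in both call sets
    for i, (p, t) in enumerate(zip(pred, truth)):
        if p is None or t is None:
            continue
        if prev is not None and blocks[prev] == blocks[i]:
            pairs += 1
            # XOR of neighbouring alleles encodes relative phase (same vs. opposite haplotype).
            if (p ^ pred[prev]) != (t ^ truth[prev]):
                switches += 1
        prev = i  # a block change simply starts a new run of comparisons
    return switches / pairs if pairs else 0.0


def majority_consensus(predictions: Sequence[Sequence[Call]]) -> List[Call]:
    """Per-site majority vote over call sets assumed to share one haplotype orientation.

    Sites with a tied vote, or that no method phased, are left unphased (None).
    """
    consensus: List[Call] = []
    for site_calls in zip(*predictions):
        ranked = Counter(c for c in site_calls if c is not None).most_common()
        if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
            consensus.append(None)
        else:
            consensus.append(ranked[0][0])
    return consensus


if __name__ == "__main__":
    truth = [0, 1, 1, 0, 1, 0]
    pred = [0, 1, 0, 1, 0, 1]  # flips to the other haplotype after site 2
    print(switch_error_rate(pred, truth, [1] * 6))
    print(majority_consensus([[0, 1, 1, None], [0, 1, 0, 1], [1, 1, 0, None]]))
```

In this example the single-block prediction matches truth for two sites and then follows the opposite haplotype, so `switch_error_rate` counts one switch over five adjacent pairs (0.2). Leaving ties unphased keeps the consensus conservative at some cost in site coverage. A scheme that favours coverage could instead break ties by a preferred method.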
Subject(s)

Database: MEDLINE | Main subject: Human Genome | Limits: Female / Humans / Male | Language: English | Journal: PLoS Genet | Journal subject: Genetics | Year: 2018 | Document type: Article | Country of affiliation: United States