Personalized Pangenome References.
Sirén, Jouni; Eskandar, Parsa; Ungaro, Matteo Tommaso; Hickey, Glenn; Eizenga, Jordan M; Novak, Adam M; Chang, Xian; Chang, Pi-Chuan; Kolmogorov, Mikhail; Carroll, Andrew; Monlong, Jean; Paten, Benedict.
Affiliation
  • Sirén J; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA.
  • Eskandar P; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA.
  • Ungaro MT; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA.
  • Hickey G; University of Ferrara, Ferrara, via Fossato di Mortara 27, Ferrara, FE 44121, Italy.
  • Eizenga JM; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA.
  • Novak AM; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA.
  • Chang X; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA.
  • Chang PC; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA.
  • Kolmogorov M; Google LLC, 1600 Amphitheater Pkwy, Mountain View, CA 94043, USA.
  • Carroll A; Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
  • Monlong J; Google LLC, 1600 Amphitheater Pkwy, Mountain View, CA 94043, USA.
  • Paten B; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA.
bioRxiv ; 2023 Dec 15.
Article in En | MEDLINE | ID: mdl-38168361
ABSTRACT
Pangenomes, by including genetic diversity, should reduce reference bias because they better represent the new samples that are compared against them. Yet when comparing a new sample to a pangenome, variants in the pangenome that are not part of the sample can be misleading, for example by causing false read mappings. These irrelevant variants generally have lower allele frequencies, and have previously been dealt with using allele frequency filters. However, such a filter is a blunt heuristic that both fails to remove some irrelevant variants and removes many relevant variants. We propose a new approach, inspired by local ancestry inference methods, that imputes a personalized pangenome subgraph by sampling local haplotypes according to k-mer counts in the reads. Our approach is tailored for the Giraffe short-read aligner, as the indexes it needs for read mapping can be built quickly. We compare the accuracy of our approach to state-of-the-art methods using graphs from the Human Pangenome Reference Consortium. The resulting personalized pangenome pipelines provide faster pangenome read mapping than comparable pipelines that use a linear reference, reduce small variant genotyping errors by 4x relative to the Genome Analysis Toolkit (GATK) best-practice pipeline, and for the first time make short-read structural variant genotyping competitive with long-read discovery methods.
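
The idea of sampling local haplotypes according to k-mer counts in the reads can be illustrated with a toy sketch. This is not the authors' implementation or the Giraffe/vg code: the function names, the scoring rule (+1 for each haplotype k-mer seen in the reads, -1 for each one absent), and the default parameters are all assumptions chosen for illustration, and haplotypes are treated as plain strings within a single local block rather than as paths in a graph.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_read_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def score_haplotype(hap_seq, read_counts, k):
    """Hypothetical score: +1 per distinct haplotype k-mer present in the
    reads, -1 per distinct haplotype k-mer absent from the reads."""
    score = 0
    for kmer in set(kmers(hap_seq, k)):
        score += 1 if read_counts[kmer] > 0 else -1
    return score


def sample_haplotypes(block, reads, k=5, n=2):
    """Pick the n best-scoring haplotypes in one local block.

    block maps a haplotype name to its sequence in that block.
    Ties are broken by name so the result is deterministic.
    """
    read_counts = count_read_kmers(reads, k)
    ranked = sorted(
        block,
        key=lambda name: (-score_haplotype(block[name], read_counts, k), name),
    )
    return ranked[:n]


if __name__ == "__main__":
    # Made-up example data: three haplotypes differing at one or two bases.
    block = {
        "hapA": "ACGTACGTTAGC",
        "hapB": "ACGTACCTTAGC",
        "hapC": "ACGTAGGTTAGC",
    }
    reads = ["ACGTACGTTA", "CGTACGTTAGC"]
    print(sample_haplotypes(block, reads, k=5, n=1))
```

In this sketch, a haplotype that carries variants absent from the sample loses points for every k-mer spanning those variants, so it is less likely to be kept. That is the intuition behind dropping irrelevant variants based on the reads rather than on allele frequency alone.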

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Type of study: Guideline | Language: En | Journal: BioRxiv | Year: 2023 | Document type: Article | Affiliation country: United States | Country of publication: United States
