Genotype error due to low-coverage sequencing induces uncertainty in polygenic scoring.
Petter, Ella; Ding, Yi; Hou, Kangcheng; Bhattacharya, Arjun; Gusev, Alexander; Zaitlen, Noah; Pasaniuc, Bogdan.
Affiliation
  • Petter E; Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA. Electronic address: ellapetter@ucla.edu.
  • Ding Y; Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA.
  • Hou K; Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA.
  • Bhattacharya A; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA.
  • Gusev A; Dana-Farber Cancer Institute, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA.
  • Zaitlen N; Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Neurology, UCLA, Los Angeles, CA, USA; Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA.
  • Pasaniuc B; Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen
Am J Hum Genet; 110(8): 1319-1329, 2023 08 03.
Article in En | MEDLINE | ID: mdl-37490908
ABSTRACT
Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced from genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low-coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 10⁻⁷). We develop a probabilistic approach that incorporates genotype error in PGS estimation to produce well-calibrated PGS credible intervals and show that the probabilistic approach increases classification accuracy by up to 6% as compared to traditional PGSs that ignore genotyping error. Finally, we use simulations to explore the combined effect of genotyping and effect size errors and their implication on PGS-based risk-stratification. Our results illustrate the importance of considering genotyping error as a source of PGS error, especially for cohorts with varying genotyping technologies and/or low-coverage sequencing.
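To make the idea of propagating genotype uncertainty into a PGS concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes each SNP comes with posterior genotype probabilities P(g = 0, 1, 2), treats SNPs as independent, holds effect sizes fixed, and uses simple Monte Carlo sampling. The function name `pgs_credible_interval` and all of its parameters are hypothetical.

```python
import numpy as np

def pgs_credible_interval(geno_probs, betas, n_samples=2000, level=0.90, seed=0):
    """Illustrative sketch (hypothetical helper, not the paper's method).

    geno_probs: (n_snps, 3) array, each row P(genotype = 0, 1, 2).
    betas:      (n_snps,) array of fixed per-allele effect sizes.
    Returns (expected PGS, lower bound, upper bound) of the interval.
    """
    geno_probs = np.asarray(geno_probs, dtype=float)
    betas = np.asarray(betas, dtype=float)
    rng = np.random.default_rng(seed)

    # Expected-dosage PGS: sum over SNPs of E[genotype] * beta.
    dosage = geno_probs @ np.array([0.0, 1.0, 2.0])
    pgs_mean = float(dosage @ betas)

    # Sample genotypes per SNP by inverse-CDF sampling, assuming
    # independence across SNPs (an assumption of this sketch).
    cdf = np.cumsum(geno_probs, axis=1)
    u = rng.random((n_samples, geno_probs.shape[0]))
    genotypes = (u > cdf[:, 0]).astype(int) + (u > cdf[:, 1]).astype(int)

    # One PGS per sampled genotype vector; take empirical quantiles.
    samples = genotypes @ betas
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(samples, [tail, 1.0 - tail])
    return pgs_mean, float(lo), float(hi)
```

With confidently called genotypes (one-hot probabilities) the interval collapses onto the point estimate. Flatter genotype posteriors, as one would expect at very low sequencing depth, widen it.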

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Polymorphism, Single Nucleotide / Genomics Type of study: Prognostic_studies Language: En Journal: Am J Hum Genet Year: 2023 Document type: Article