Fast and robust ancestry prediction using principal component analysis.
Zhang, Daiwei; Dey, Rounak; Lee, Seunggeun.
Affiliations
  • Zhang D; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
  • Dey R; Department of Biostatistics, Harvard University, Boston, MA 02115, USA.
  • Lee S; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
Bioinformatics ; 36(11): 3439-3446, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32196066
ABSTRACT
MOTIVATION:

Population stratification (PS) is a major confounder in genome-wide association studies (GWAS) and can lead to false-positive associations. To adjust for PS, principal component analysis (PCA)-based ancestry prediction has been widely used. Simple projection (SP) based on principal component loadings and the recently developed data augmentation, decomposition and Procrustes (ADP) transformation, such as LASER and TRACE, are popular methods for predicting PC scores. However, the predicted PC scores from SP can be biased toward NULL. On the other hand, ADP has a high computation cost because it requires running PCA separately for each study sample on the augmented dataset.
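To make the SP step concrete, the following is a minimal NumPy sketch of simple projection: PCA is fitted on reference genotypes, and study samples are standardized with the reference statistics and multiplied by the PC loadings. It is an illustration under stated assumptions, not the FRAPOSA implementation. The function names (`fit_reference_pca`, `simple_projection`, `simulate`), the two-population simulation, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def fit_reference_pca(G_ref, n_pcs):
    """Fit PCA on reference genotypes (samples x markers) via SVD."""
    mean = G_ref.mean(axis=0)
    std = G_ref.std(axis=0)
    std[std == 0] = 1.0                      # guard against monomorphic markers
    Z = (G_ref - mean) / std                 # standardized reference matrix
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_pcs].T                  # markers x n_pcs
    ref_scores = U[:, :n_pcs] * s[:n_pcs]    # reference PC scores
    return mean, std, loadings, ref_scores

def simple_projection(G_study, mean, std, loadings):
    """Predict PC scores by projecting standardized study genotypes on the loadings."""
    return ((G_study - mean) / std) @ loadings

def simulate(rng, n_per_pop, freq_a, freq_b):
    """Draw 0/1/2 genotypes for two populations (hypothetical toy model)."""
    a = rng.binomial(2, freq_a, size=(n_per_pop, freq_a.size))
    b = rng.binomial(2, freq_b, size=(n_per_pop, freq_b.size))
    return np.vstack([a, b]).astype(float)

# Assumed toy setting: many more markers than reference samples.
rng = np.random.default_rng(0)
n_markers = 2000
base = rng.uniform(0.1, 0.9, n_markers)
shift = rng.choice([-0.05, 0.05], n_markers)
freq_a, freq_b = base + shift, base - shift

G_ref = simulate(rng, 50, freq_a, freq_b)
G_study = simulate(rng, 50, freq_a, freq_b)   # same populations, new individuals

mean, std, loadings, ref_scores = fit_reference_pca(G_ref, n_pcs=2)
study_scores = simple_projection(G_study, mean, std, loadings)

# Average distance from the origin along PC1 for each sample set.
ref_spread = np.abs(ref_scores[:, 0]).mean()
study_spread = np.abs(study_scores[:, 0]).mean()
print(f"PC1 spread, reference: {ref_spread:.2f}  study (SP): {study_spread:.2f}")
```

In this toy setting the projected study samples sit closer to the origin on PC1 than the reference samples drawn from the same populations, which is the kind of bias the abstract attributes to SP.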

RESULTS:

We develop and propose two alternative approaches: bias-adjusted projection (AP) and online ADP (OADP). Using random matrix theory, AP asymptotically estimates and adjusts for the bias of SP. OADP uses a computationally efficient online singular value decomposition algorithm, which can greatly reduce the computation cost of ADP. We carried out extensive simulation studies to show that these alternative approaches are unbiased and that their computation speed can be 16 to 16 000 times faster than ADP. We applied our approaches to the UK Biobank data of 488 366 study samples with 2492 samples from the 1000 Genomes data as the reference. AP and OADP required 0.82 and 21 CPU hours, respectively, while the projected computation time of ADP was 1628 CPU hours. Furthermore, when inferring sub-European ancestry, SP clearly showed bias, unlike the proposed approaches.

AVAILABILITY AND IMPLEMENTATION:

The OADP and AP methods, as well as SP and ADP, have been implemented in the open-source Python software FRAPOSA, available at github.com/daviddaiweizhang/fraposa.

CONTACT:

leeshawn@umich.edu.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
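For readers unfamiliar with ADP, the baseline that OADP accelerates, here is a simplified NumPy sketch of the augment, decompose and Procrustes idea for a single study sample. It is an assumption-laden illustration, not the LASER, TRACE or FRAPOSA code: it uses the same number of PCs before and after augmentation, assumes the inputs are already standardized, and the names `procrustes_map` and `adp_predict` are hypothetical.

```python
import numpy as np

def procrustes_map(A, B):
    """Scale c, orthogonal R and shift t minimizing ||c * A @ R + t - B||_F."""
    a_mean, b_mean = A.mean(axis=0), B.mean(axis=0)
    A0, B0 = A - a_mean, B - b_mean
    U, s, Vt = np.linalg.svd(A0.T @ B0)
    R = U @ Vt                           # optimal orthogonal transform
    c = s.sum() / (A0 ** 2).sum()        # optimal scaling
    t = b_mean - c * a_mean @ R          # optimal translation
    return c, R, t

def adp_predict(Z_ref, z_new, n_pcs):
    """Predict PC scores of one standardized study sample by ADP (simplified)."""
    # PCA of the reference panel alone defines the target PC space.
    U, s, _ = np.linalg.svd(Z_ref, full_matrices=False)
    ref_scores = U[:, :n_pcs] * s[:n_pcs]

    # Augment: append the study sample, then decompose again.
    Z_aug = np.vstack([Z_ref, z_new])
    Ua, sa, _ = np.linalg.svd(Z_aug, full_matrices=False)
    aug_scores = Ua[:, :n_pcs] * sa[:n_pcs]

    # Procrustes: align the reference rows of the augmented scores with the
    # original reference scores, then apply the same map to the study row.
    c, R, t = procrustes_map(aug_scores[:-1], ref_scores)
    return c * aug_scores[-1] @ R + t
```

Each call to `adp_predict` performs a full decomposition of the augmented matrix, so the cost grows with one decomposition per study sample. That per-sample cost is what the abstract describes OADP as reducing with an online singular value decomposition.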
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Software / Genome-Wide Association Study Type of study: Prognostic_studies / Risk_factors_studies Language: En Publication year: 2020 Document type: Article
