Adjusting for Principal Components of Molecular Phenotypes Induces Replicating False Positives.
Dahl, Andy; Guillemot, Vincent; Mefford, Joel; Aschard, Hugues; Zaitlen, Noah.
Affiliation
  • Dahl A; Department of Medicine, University of California San Francisco, 94158 California; andywdahl@gmail.com, noah.zaitlen@ucsf.edu.
  • Guillemot V; Centre de Bioinformatique, Biostatistique et Biologie Intégrative, Institut Pasteur, Paris, 75015 France.
  • Mefford J; Department of Medicine, University of California San Francisco, 94158 California.
  • Aschard H; Centre de Bioinformatique, Biostatistique et Biologie Intégrative, Institut Pasteur, Paris, 75015 France.
  • Zaitlen N; Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, 02115 Massachusetts.
Genetics; 211(4): 1179-1189, April 2019.
Article in English | MEDLINE | ID: mdl-30692194
ABSTRACT
High-throughput measurements of molecular phenotypes provide an unprecedented opportunity to model cellular processes and their impact on disease. These highly structured datasets are usually strongly confounded, creating false positives and reducing power. This has motivated many approaches based on principal components analysis (PCA) to estimate and correct for confounders, which have become indispensable elements of association tests between molecular phenotypes and both genetic and nongenetic factors. Here, we show that these correction approaches induce a bias, and that it persists for large sample sizes and replicates out-of-sample. We prove this theoretically for PCA by deriving an analytic, deterministic, and intuitive bias approximation. We assess other methods with realistic simulations, which show that perturbing any of several basic parameters can cause false positive rate (FPR) inflation. Our experiments show the bias depends on covariate and confounder sparsity, effect sizes, and their correlation. Surprisingly, when the covariate and confounder have [Formula see text], standard two-step methods all have [Formula see text]-fold FPR inflation. Our analysis informs best practices for confounder correction in genomic studies, and suggests many false discoveries have been made and replicated in some differential expression analyses.
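As an illustration of the kind of two-step procedure the abstract refers to, the following is a minimal sketch in Python with NumPy. It is not the authors' code. The function name `two_step_pca_test`, the simulation design, and all parameter values (sample size, number of phenotypes, number of PCs, confounder strength) are assumptions chosen only for illustration. Step 1 estimates principal components from the phenotype matrix; step 2 tests the covariate against each phenotype while adjusting for those components.

```python
import numpy as np

def two_step_pca_test(Y, x, k):
    """Hypothetical helper: two-step PCA adjustment.

    Y : (n, p) matrix of molecular phenotypes
    x : (n,) covariate of interest
    k : number of principal components to adjust for
    Returns (beta, t): per-phenotype effect estimates and t-statistics.
    """
    n, _ = Y.shape
    Yc = Y - Y.mean(axis=0)          # centering absorbs the intercept
    xc = x - x.mean()

    # Step 1: top-k principal components (left singular vectors) of Y.
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    pcs = U[:, :k]                   # orthonormal columns

    # Step 2: regress each phenotype on x, adjusting for the PCs.
    # Residualizing x and Y on the PCs gives the same coefficient on x
    # as the full regression (Frisch-Waugh-Lovell theorem).
    x_res = xc - pcs @ (pcs.T @ xc)
    Y_res = Yc - pcs @ (pcs.T @ Yc)
    sxx = x_res @ x_res
    beta = (x_res @ Y_res) / sxx

    resid = Y_res - np.outer(x_res, beta)
    df = n - k - 2                   # intercept + x + k PCs
    sigma2 = (resid ** 2).sum(axis=0) / df
    t = beta / np.sqrt(sigma2 / sxx)
    return beta, t

# Assumed toy simulation: one confounder z drives all phenotypes and is
# correlated with x; x has no true effect on any phenotype.
rng = np.random.default_rng(0)
n, p, k = 200, 50, 1
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)
loadings = rng.normal(size=p)
Y = np.outer(z, loadings) + rng.normal(size=(n, p))

beta, t_stats = two_step_pca_test(Y, x, k)
```

Because `x` has no true effect in this toy setup, the resulting `t_stats` can be compared against their null distribution to examine false positive behavior under different assumed parameter choices.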
Full text: 1 Database: MEDLINE Main subject: Phenotype / Principal Component Analysis / Genome-Wide Association Study Language: En Publication year: 2019 Document type: Article