Pitfalls and opportunities for applying latent variables in single-cell eQTL analyses.
Xue, Angli; Yazar, Seyhan; Neavin, Drew; Powell, Joseph E.
Affiliation
  • Xue A; Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Sydney, NSW, 2010, Australia. a.xue@garvan.org.au.
  • Yazar S; School of Biomedical Sciences, University of New South Wales, Sydney, NSW, 2052, Australia. a.xue@garvan.org.au.
  • Neavin D; Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Sydney, NSW, 2010, Australia.
  • Powell JE; Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Sydney, NSW, 2010, Australia.
Genome Biol; 24(1): 33, 2023 02 23.
Article in En | MEDLINE | ID: mdl-36823676
ABSTRACT
Using latent variables in gene expression data can help correct for unobserved confounders and increase statistical power for expression quantitative trait loci (eQTL) detection. Probabilistic estimation of expression residuals (PEER) and principal component analysis (PCA) are widely used methods that can remove unwanted variation and improve eQTL discovery power in bulk RNA-seq analysis. However, their performance has not been evaluated extensively in single-cell eQTL analysis, especially across different cell types. Potential challenges arise from the structure of single-cell RNA-seq data, including sparsity, skewness, and the mean-variance relationship. Here, we show through a series of analyses that PEER and PCA require additional quality control and data transformation steps on the pseudo-bulk matrix to obtain valid latent variables; without these steps, the resulting factors can be highly correlated with one another (Pearson's correlation r = 0.63 to 0.99). Incorporating valid PEER factors (PFs) or principal components (PCs) in the eQTL association model identifies 1.7 to 13.3% more eGenes. Sensitivity analysis showed that the relationship between the number of eGenes detected and the number of fitted PFs/PCs varied significantly across cell types. In addition, using highly variable genes to generate latent variables achieved eGene discovery power similar to that of using all genes while saving considerable computational resources (approximately 6.2-fold faster).
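The workflow the abstract describes (quality control and transformation of the pseudo-bulk matrix, optional restriction to highly variable genes, then extraction of latent variables) can be sketched for the PCA case as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `pseudobulk_pcs`, the zero-fraction threshold `max_zero_frac`, the log(x + 1) transform, per-gene standardization, and ranking genes by variance as a stand-in for highly variable gene selection are all hypothetical choices made for the example, and the input is simulated data.

```python
import numpy as np


def pseudobulk_pcs(expr, n_pcs=5, max_zero_frac=0.9, n_hvg=None):
    """Return principal components from an individuals x genes pseudo-bulk matrix.

    All thresholds and transforms here are illustrative assumptions.
    """
    expr = np.asarray(expr, dtype=float)

    # QC (assumed rule): drop genes that are zero in too many individuals.
    zero_frac = (expr == 0).mean(axis=0)
    x = expr[:, zero_frac <= max_zero_frac]

    # Transformation (assumed): log(x + 1) to reduce skewness.
    x = np.log1p(x)

    # Drop constant genes so that standardization is well defined.
    var = x.var(axis=0)
    keep = var > 0
    x, var = x[:, keep], var[keep]

    # Optional: keep only the n_hvg most variable genes
    # (a crude stand-in for a proper highly-variable-gene method).
    if n_hvg is not None and n_hvg < x.shape[1]:
        top = np.argsort(var)[::-1][:n_hvg]
        x = x[:, top]

    # Standardize each gene, then obtain PCs via SVD of the centered matrix.
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


# Simulated sparse, skewed pseudo-bulk data: 50 individuals x 200 genes.
rng = np.random.default_rng(0)
counts = rng.gamma(shape=0.5, scale=2.0, size=(50, 200))
counts *= rng.binomial(1, 0.6, size=(50, 200))

pcs = pseudobulk_pcs(counts, n_pcs=5, n_hvg=100)

# Diagnostic in the spirit of the abstract: largest absolute pairwise
# Pearson correlation between the latent variables.
max_abs_r = np.abs(np.corrcoef(pcs, rowvar=False) - np.eye(pcs.shape[1])).max()
```

Principal components of a centered matrix are uncorrelated by construction, so `max_abs_r` mainly serves here as a sanity check on the preprocessing; the same diagnostic is more informative for PEER factors, which carry no such guarantee.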

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Quantitative Trait Loci / Genome-Wide Association Study Type of study: Prognostic_studies Language: En Year of publication: 2023 Document type: Article
