FastRNA: An efficient solution for PCA of single-cell RNA-sequencing data based on a batch-accounting count model.
Lee, Hanbin; Han, Buhm.
Affiliation
  • Lee H; Department of Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea. Electronic address: hanbin973@snu.ac.kr.
  • Han B; Department of Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea; Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea; Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea; Genealogy Inc., Seoul, Republic of Korea. Electronic address: buhm.han@snu.ac.kr.
Am J Hum Genet; 109(11): 1974-1985, 2022 Nov 03.
Article in English | MEDLINE | ID: mdl-36206757
The analysis of single-cell RNA-sequencing (scRNA-seq) data almost always begins with generating a low-dimensional embedding of the data by principal-component analysis (PCA). Because scRNA-seq data are count data, log transformation is routinely applied to correct skewness prior to PCA, although it is often argued that this transformation adds bias to the data. Alternatively, studies have proposed methods that directly assume a count model and use approximately normally distributed count residuals for PCA. Despite their theoretical advantage of directly modeling count data, these methods are extremely slow for large datasets. In fact, as the data size grows, even the standard log normalization becomes inefficient. Here, we present FastRNA, a highly efficient solution for PCA of scRNA-seq data based on a count model accounting for both batches and cell size factors. Although we assume the same general count model as previous methods, our method uses two orders of magnitude less time and memory than the other count-based methods and an order of magnitude less time and memory than the standard log normalization. This achievement results from our unique algebraic optimization that completely avoids the formation of the large dense residual matrix in memory. In addition, our method has the benefit that batch effects are eliminated from the data prior to PCA. Generating a batch-accounted PC of an atlas-scale dataset with 2 million cells takes less than a minute and 1 GB of memory with our method.
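The abstract does not spell out the algebraic optimization, so the following is only a generic sketch of the underlying idea: computing principal components of a transformed sparse count matrix without ever materializing the dense transformed matrix. It is not the FastRNA algorithm. As a stand-in for the count-model residuals, it uses plain per-gene mean centering, and the function name `implicit_centered_pca` is hypothetical. The centered matrix is exposed to SciPy's `svds` solver only through matrix-vector products, so memory use stays close to that of the sparse input.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds


def implicit_centered_pca(counts, n_components=2):
    """Top principal components of a gene-centered sparse matrix.

    Illustrative only: mean centering is an assumed stand-in for the
    count-model residuals described in the abstract.
    """
    counts = sp.csr_matrix(counts, dtype=np.float64)  # cells x genes
    n_cells, n_genes = counts.shape
    gene_means = np.asarray(counts.mean(axis=0)).ravel()

    # (X - 1 m^T) v = X v - 1 (m . v): no dense matrix is formed.
    def matvec(v):
        v = np.ravel(v)
        return counts @ v - np.full(n_cells, gene_means @ v)

    # (X - 1 m^T)^T u = X^T u - m (1 . u)
    def rmatvec(u):
        u = np.ravel(u)
        return counts.T @ u - gene_means * u.sum()

    centered = LinearOperator(
        (n_cells, n_genes), matvec=matvec, rmatvec=rmatvec, dtype=np.float64
    )
    u, s, vt = svds(centered, k=n_components)

    # svds does not guarantee descending order, so sort explicitly.
    order = np.argsort(s)[::-1]
    scores = u[:, order] * s[order]  # cell embeddings
    return scores, s[order], vt[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = sp.csr_matrix(rng.poisson(0.3, size=(200, 50)).astype(float))
    scores, singular_values, loadings = implicit_centered_pca(toy, 3)
    print(scores.shape, loadings.shape)
```

The same pattern extends to any transformation that can be written as a sparse matrix plus a low-rank correction, which is why it avoids the memory cost of densifying the data; whether FastRNA's residuals take exactly this form is not stated in the abstract.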

Full text: 1 Databases: MEDLINE Main subject: RNA / Single-Cell Analysis Type of study: Prognostic_studies Limits: Humans Language: En Journal: Am J Hum Genet Year: 2022 Document type: Article