A joint deep learning model enables simultaneous batch effect correction, denoising, and clustering in single-cell transcriptomics.
Lakkis, Justin; Wang, David; Zhang, Yuanchao; Hu, Gang; Wang, Kui; Pan, Huize; Ungar, Lyle; Reilly, Muredach P; Li, Xiangjie; Li, Mingyao.
Affiliation
  • Lakkis J; Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
  • Wang D; Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
  • Zhang Y; Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
  • Hu G; School of Statistics and Data Science, Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin, Nankai University, Tianjin 300071, China.
  • Wang K; Department of Information Theory and Data Science, School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China.
  • Pan H; Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, New York 10032, USA.
  • Ungar L; Department of Computer and Information Science, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
  • Reilly MP; Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, New York 10032, USA.
  • Li X; School of Statistics and Data Science, Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin, Nankai University, Tianjin 300071, China.
  • Li M; Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
Genome Res; 31(10): 1753-1766, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34035047
ABSTRACT
Recent developments in single-cell RNA-seq (scRNA-seq) technologies have led to enormous biological discoveries. As the scale of scRNA-seq studies increases, a major challenge in analysis is batch effects, which are inevitable in studies involving human tissues. Most existing methods remove batch effects in a low-dimensional embedding space. Although this is useful for clustering, batch effects remain in the gene expression space, leaving downstream gene-level analysis susceptible to them. Recent studies have shown that batch effect correction in the gene expression space is much harder than in the embedding space. Methods such as Seurat 3.0 rely on the mutual nearest neighbor (MNN) approach to remove batch effects in gene expression, but MNN can only analyze two batches at a time, and it becomes computationally infeasible when the number of batches is large. Here, we present CarDEC, a joint deep learning model that simultaneously clusters and denoises scRNA-seq data while correcting batch effects in both the embedding and the gene expression space. Comprehensive evaluations spanning different species and tissues showed that CarDEC outperforms Scanorama, DCA + Combat, scVI, and MNN. With CarDEC denoising, non-highly variable genes offer as much signal for clustering as the highly variable genes (HVGs), suggesting that CarDEC substantially boosts the information content of scRNA-seq data. We also showed that trajectory analysis using CarDEC's denoised and batch-corrected expression as input revealed marker genes and transcription factors that are otherwise obscured in the presence of batch effects. CarDEC is computationally fast, making it a desirable tool for large-scale scRNA-seq studies.
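To make the pairwise nature of the MNN idea mentioned in the abstract concrete, the following is a minimal sketch in Python with NumPy. It is not the Seurat or CarDEC implementation: the function name `mutual_nearest_pairs`, the toy data, and the choice of Euclidean distance with `k=1` are all assumptions made for illustration. It only shows how cells from two batches are matched when each is among the other's nearest neighbors.

```python
import numpy as np

def mutual_nearest_pairs(batch_a, batch_b, k=1):
    """Return sorted (i, j) pairs where cell i of batch_a and cell j of batch_b
    are each among the other's k nearest neighbors (Euclidean distance).

    Hypothetical helper for illustration only; not part of any published tool.
    """
    # Pairwise distance matrix of shape (n_a, n_b)
    diff = batch_a[:, None, :] - batch_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # For each cell in A, the indices of its k nearest cells in B
    nn_ab = np.argsort(dist, axis=1)[:, :k]
    # For each cell in B, the indices of its k nearest cells in A
    nn_ba = np.argsort(dist.T, axis=1)[:, :k]
    pairs = []
    for i in range(batch_a.shape[0]):
        for j in nn_ab[i]:
            # Keep the pair only if the neighbor relation holds in both directions
            if i in nn_ba[j]:
                pairs.append((i, int(j)))
    return sorted(pairs)

# Toy data (assumed): two cell populations; batch B is batch A plus a constant shift
batch_a = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
batch_b = batch_a + np.array([0.5, 0.5])

pairs = mutual_nearest_pairs(batch_a, batch_b, k=1)
print(pairs)
```

The function takes exactly two batches, which mirrors the restriction noted in the abstract: with many batches, a matching step of this kind has to be repeated across batches, and each call builds a full cell-by-cell distance matrix.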
Subject(s)

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Transcriptome / Deep Learning | Study type: Prognostic_studies | Language: En | Journal: Genome Res | Journal subject: MOLECULAR BIOLOGY / GENETICS | Year: 2021 | Document type: Article | Country of affiliation: United States
