Linear-time cluster ensembles of large-scale single-cell RNA-seq and multimodal data.
Do, Van Hoan; Rojas Ringeling, Francisca; Canzar, Stefan.
Affiliation
  • Do VH; Gene Center, Ludwig-Maximilians-Universität München, 81377 Munich, Germany.
  • Rojas Ringeling F; Gene Center, Ludwig-Maximilians-Universität München, 81377 Munich, Germany.
  • Canzar S; Gene Center, Ludwig-Maximilians-Universität München, 81377 Munich, Germany.
Genome Res; 31(4): 677-688, 2021 Apr.
Article in En | MEDLINE | ID: mdl-33627473
A fundamental task in single-cell RNA-seq (scRNA-seq) analysis is the identification of transcriptionally distinct groups of cells. Numerous methods have been proposed for this problem, with a recent focus on methods for the cluster analysis of ultralarge scRNA-seq data sets produced by droplet-based sequencing technologies. Most existing methods rely on a sampling step to bridge the gap between algorithm scalability and the volume of the data. Ignoring large parts of the data, however, often yields inaccurate groupings of cells and risks overlooking rare cell types. We propose Specter, a method that adopts and extends recent algorithmic advances in (fast) spectral clustering. In contrast to methods that cluster a (random) subsample of the data, we adopt the idea of landmarks, which are used to create a sparse representation of the full data from which a spectral embedding can then be computed in linear time. We exploit Specter's speed in a cluster ensemble scheme that achieves a substantial improvement in accuracy over existing methods and identifies rare cell types with high sensitivity. Its linear-time complexity allows Specter to scale to millions of cells and leads to fast computation times in practice. Furthermore, on CITE-seq data, which simultaneously measure gene and protein marker expression, we show that Specter is able to use multimodal omics measurements to resolve subtle transcriptomic differences between subpopulations of cells.
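The landmark idea in the abstract can be illustrated with a minimal sketch of generic landmark-based spectral embedding. This is not Specter's implementation. The function name, the parameters, the random choice of landmarks, the Gaussian kernel and its bandwidth are all assumptions made for illustration. Each cell is described only by its similarity to a few nearby landmarks, so the eigenproblem is solved on a small landmark-by-landmark matrix and the total cost grows linearly with the number of cells.

```python
import numpy as np


def landmark_spectral_embedding(X, n_landmarks=50, n_neighbors=5,
                                n_components=2, seed=0):
    """Hypothetical helper: embed the rows of X via a cell-to-landmark graph."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Assumption: landmarks are drawn uniformly at random from the cells.
    idx = rng.choice(n, size=n_landmarks, replace=False)
    landmarks = X[idx]

    # Squared Euclidean distances between every cell and every landmark.
    d2 = ((X ** 2).sum(axis=1)[:, None]
          - 2.0 * X @ landmarks.T
          + (landmarks ** 2).sum(axis=1)[None, :])
    d2 = np.maximum(d2, 0.0)

    # Keep only the n_neighbors closest landmarks for each cell.
    nearest = np.argpartition(d2, n_neighbors - 1, axis=1)[:, :n_neighbors]
    rows = np.arange(n)[:, None]
    near_d2 = d2[rows, nearest]

    # Gaussian weights (bandwidth choice is an assumption), rows sum to 1.
    sigma2 = near_d2.mean() + 1e-12
    w = np.exp(-near_d2 / (2.0 * sigma2))
    w /= w.sum(axis=1, keepdims=True)

    # Z has n_neighbors nonzeros per row. It is stored densely here for
    # brevity; a sparse matrix would be the natural choice at scale.
    Z = np.zeros((n, n_landmarks))
    Z[rows, nearest] = w

    # Scale columns by the inverse square root of each landmark's degree.
    deg = Z.sum(axis=0)
    Z_hat = Z / np.sqrt(deg + 1e-12)

    # Eigendecompose the small (n_landmarks x n_landmarks) matrix only.
    evals, evecs = np.linalg.eigh(Z_hat.T @ Z_hat)
    order = np.argsort(evals)[::-1][:n_components]
    sing = np.sqrt(np.maximum(evals[order], 1e-12))

    # Map landmark-space eigenvectors back to one coordinate row per cell.
    return Z_hat @ evecs[:, order] / sing
```

The rows of the returned matrix can be passed to any standard clustering routine such as k-means. Repeating the procedure with different landmark sets or parameters would give the multiple clusterings that an ensemble scheme combines.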
Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Cluster Analysis / Gene Expression Profiling / Single-Cell Analysis / RNA-Seq Language: En Journal: Genome Res Journal subject: Molecular Biology / Genetics Year: 2021 Document type: Article Affiliation country: Germany Country of publication: United States
