A critical assessment of clustering algorithms to improve cell clustering and identification in single-cell transcriptome study.

Liang, Xiao; Cao, Lijie; Chen, Hao; Wang, Lidan; Wang, Yangyun; Fu, Lijuan; Tan, Xiaqin; Chen, Enxiang; Ding, Yubin; Tang, Jing

Affiliation

Liang X; Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China.
Cao L; School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China.
Chen H; School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China.
Wang L; School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China.
Wang Y; School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China.
Fu L; School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China.
Tan X; Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China.
Chen E; Department of Pharmacology, Academician Workstation, Changsha Medical University, Changsha 410219, China.
Ding Y; The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China.
Tang J; School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China.

Brief Bioinform; 25(1), 2023 Nov 22.

Article in En | MEDLINE | ID: mdl-38168839

ABSTRACT

Cell clustering is typically the initial step in single-cell RNA sequencing (scRNA-seq) analyses, and its performance considerably affects the validity and reproducibility of cell identification. A variety of clustering algorithms have been developed for scRNA-seq data, each generating a label set that assigns every cell to a cluster. However, different algorithms usually yield different label sets, which can introduce variation in the cell-type identification derived from them. To date, the performance of these algorithms has not been systematically evaluated in single-cell transcriptome studies. Herein, we performed a critical assessment of seven state-of-the-art clustering algorithms: four deep learning-based clustering algorithms and the commonly used methods Seurat, Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL) and Single-cell consensus clustering (SC3). We systematically evaluated their clustering performance using diverse evaluation indices on 10 different scRNA-seq benchmarks. Our results show that CosTaL, Seurat, Deep Embedding for Single-cell Clustering (DESC) and SC3 consistently outperformed Single-Cell Clustering Assessment Framework (SCCAF) and scDeepCluster based on nine effectiveness scores. Notably, CosTaL and DESC demonstrated superior performance in clustering specific cell types. The performance of the single-cell Variational Inference (scVI) tools varied across datasets, suggesting sensitivity to certain dataset characteristics. DESC also exhibited promising results for cell subtype identification and for capturing cellular heterogeneity. In addition, SC3 requires more memory and runs more slowly than the other algorithms on the same dataset. In sum, this study provides useful guidance for selecting appropriate clustering methods in scRNA-seq data analysis.
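
The abstract notes that different algorithms yield different label sets and that performance was scored with evaluation indices, but it does not name those indices. As an illustration only, the sketch below computes the adjusted Rand index (ARI), one widely used measure of agreement between two label sets; whether ARI is among the study's indices is an assumption, and the function name and toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two cluster label sets, corrected for chance.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sets must cover the same cells")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: trivially identical partitions

    # Count pairs of cells that share a cluster in both, in A, and in B.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = in_a * in_b / total_pairs  # agreement expected by chance
    maximum = (in_a + in_b) / 2           # agreement of a perfect match
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every cell in one cluster
    return (both - expected) / (maximum - expected)


# Toy label sets for four cells (made-up values).
reference = [0, 0, 1, 1]
same_partition_renamed = [1, 1, 0, 0]
different_partition = [0, 1, 0, 1]

print(adjusted_rand_index(reference, same_partition_renamed))
print(adjusted_rand_index(reference, different_partition))
```

Because ARI compares partitions rather than label names, renaming clusters does not change the score, which is why it suits comparisons across algorithms that number their clusters arbitrarily.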

Subject(s)

Single-Cell Analysis; Transcriptome; Sequence Analysis, RNA/methods; Reproducibility of Results; Single-Cell Analysis/methods; Algorithms; Cluster Analysis; Gene Expression Profiling/methods

Key words

cell identification; clustering algorithms; deep learning; performance evaluation; single-cell RNA sequencing

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Single-Cell Analysis / Transcriptome | Type of study: Diagnostic_studies / Prognostic_studies | Language: En | Journal: Brief Bioinform | Journal subject: Biology / Medical Informatics | Year: 2023 | Type: Article | Affiliation country: China