CDSKNN<sup>XMBD</sup>: a novel clustering framework for large-scale single-cell data based on a stable graph structure.

Ren, Jun; Lyu, Xuejing; Guo, Jintao; Shi, Xiaodong; Zhou, Ying; Li, Qiyuan

Affiliation

Ren J; School of Informatics, Xiamen University, Xiamen, 361105, China.
Lyu X; Department of Hematology, The First Affiliated Hospital of Xiamen University and Institute of Hematology, School of Medicine, Xiamen University, Xiamen, 361102, China.
Guo J; National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, 361102, China.
Shi X; National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, 361102, China.
Zhou Y; National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, 361102, China.
Li Q; School of Informatics, Xiamen University, Xiamen, 361105, China.

J Transl Med; 22(1): 233, 2024 Mar 03.

Article in En | MEDLINE | ID: mdl-38433205

ABSTRACT

BACKGROUND:

Accurate and efficient cell grouping is essential for analyzing single-cell transcriptome sequencing (scRNA-seq) data. However, existing clustering techniques often struggle to provide timely and accurate cell type groupings for large-scale datasets or datasets with imbalanced cell types. There is therefore a need for improved methods that can handle the increasing size of scRNA-seq datasets while maintaining high accuracy and efficiency.

METHODS:

We propose CDSKNN<sup>XMBD</sup> (Community Detection based on a Stable K-Nearest Neighbor Graph Structure), a novel single-cell clustering framework that integrates a partition clustering algorithm with a community detection algorithm and achieves accurate and fast cell type grouping by finding a stable graph structure.
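The abstract does not specify which partition clustering algorithm, which community detection algorithm, or which stability criterion the framework uses. The following is therefore only a minimal sketch of the general idea, not the authors' implementation. It assumes k-means as the partition step, mutual k-nearest-neighbor edges between partition centroids as a hypothetical stand-in for a "stable" graph, and label propagation as the community detection step. All function names, parameters, and the toy data are illustrative.

```python
import math
from collections import Counter


def toy_cells():
    """Two well-separated, imbalanced groups of 2-D points (made-up data)."""
    big = [(0.1 * x, 0.1 * y) for x in range(5) for y in range(4)]            # 20 cells
    small = [(100 + 0.1 * x, 100 + 0.1 * y) for x in range(3) for y in range(2)]  # 6 cells
    return big + small, [0] * len(big) + [1] * len(small)


def farthest_point_init(points, k):
    """Deterministic seeding: repeatedly pick the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def nearest(points, centers):
    """Index of the closest center for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]


def kmeans(points, k, iters=10):
    """Partition step (assumption: plain Lloyd's k-means, used to over-partition the cells)."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        assign = nearest(points, centers)
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty partition keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, nearest(points, centers)


def mutual_knn_graph(nodes, k):
    """Keep an edge only if each node is among the other's k nearest neighbors.
    This mutual-kNN filter is a hypothetical proxy for a 'stable' graph."""
    nbrs = []
    for i, p in enumerate(nodes):
        order = sorted((j for j in range(len(nodes)) if j != i),
                       key=lambda j: math.dist(p, nodes[j]))
        nbrs.append(set(order[:k]))
    return {i: {j for j in nbrs[i] if i in nbrs[j]} for i in range(len(nodes))}


def label_propagation(graph, max_iters=50):
    """Community detection step (assumption: deterministic asynchronous label propagation)."""
    labels = {node: node for node in graph}
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[n] for n in graph[node])
            top = max(counts.values())
            candidates = [lab for lab, cnt in counts.items() if cnt == top]
            if labels[node] not in candidates:
                labels[node] = min(candidates)  # deterministic tie-break
                changed = True
        if not changed:
            break
    return labels


def cluster_cells(points, n_parts=8, k_nn=3):
    """Partition the cells, detect communities among partition centroids, map labels back."""
    centers, part_of = kmeans(points, n_parts)
    community = label_propagation(mutual_knn_graph(centers, k_nn))
    return [community[p] for p in part_of]


cells, truth = toy_cells()
labels = cluster_cells(cells)
```

In this sketch the graph is built over a handful of partition centroids rather than over every cell, which is one plausible way a combined partition and community detection design could stay fast on very large datasets. Whether CDSKNN works this way is not stated in the abstract.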

RESULTS:

We evaluated the effectiveness of our approach by analyzing 15 tissues from the human fetal atlas. Compared with existing methods, CDSKNN counteracts the high imbalance in single-cell data and clusters it effectively. Furthermore, we conducted comparisons across multiple single-cell datasets from different studies and sequencing techniques. CDSKNN shows high applicability and robustness and can handle the complexities of diverse types of data. Most importantly, CDSKNN exhibits higher operational efficiency on datasets at the million-cell scale, requiring an average of only 6.33 min to cluster 1.46 million single cells, which saves 33.3% to 99% of the running time of existing methods.
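To put the reported saving in perspective, the arithmetic can be sketched as follows. This assumes that "saving" means the fraction of a competing method's running time that is avoided, and that the percentages refer to the same 1.46-million-cell run as the 6.33 min average. The abstract states neither, so the implied baseline times are illustrative, not reported results.

```python
def implied_baseline_minutes(new_minutes, fraction_saved):
    """If saved = 1 - new / baseline, then baseline = new / (1 - saved)."""
    return new_minutes / (1.0 - fraction_saved)


cdsknn_minutes = 6.33  # average reported in the abstract

# Implied running times of the compared methods under the stated assumption.
fastest_baseline = implied_baseline_minutes(cdsknn_minutes, 0.333)
slowest_baseline = implied_baseline_minutes(cdsknn_minutes, 0.99)

print(f"{fastest_baseline:.1f} min to {slowest_baseline:.0f} min")
```

Under that reading, the compared methods would take roughly 9.5 min at the fast end and about 633 min (over 10 hours) at the slow end.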

CONCLUSIONS:

CDSKNN is a flexible, resilient, and promising clustering tool that is particularly suitable for clustering imbalanced data and demonstrates high efficiency on large-scale scRNA-seq datasets.

Subjects

Algorithms; Humans; Cluster Analysis

Keywords

Clustering; Imbalance ratio; Large-scale; scRNA-seq

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms Limit: Humans Language: En Journal: J Transl Med Publication year: 2024 Document type: Article Affiliation country: China
