D3K: The Dissimilarity-Density-Dynamic Radius K-means Clustering Algorithm for scRNA-Seq Data.

Liu, Guoyun; Li, Manzhi; Wang, Hongtao; Lin, Shijun; Xu, Junlin; Li, Ruixi; Tang, Min; Li, Chun

Affiliation

Liu G; School of Mathematics and Statistics, Hainan Normal University, Haikou, China.
Li M; School of Mathematics and Statistics, Hainan Normal University, Haikou, China.
Wang H; Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, China.
Lin S; School of Mathematics and Statistics, Hainan Normal University, Haikou, China.
Xu J; School of Mathematics and Statistics, Hainan Normal University, Haikou, China.
Li R; College of Information Science and Engineering, Hunan University, Changsha, China.
Tang M; Geneis Beijing Co., Ltd., Beijing, China.
Li C; School of Life Sciences, Jiangsu University, Zhenjiang, China.

Front Genet; 13: 912711, 2022.

Article in English | MEDLINE | ID: mdl-35846121

ABSTRACT

Single-cell sequencing data sets have long been difficult to cluster because they are high-dimensional and contain many noise points, and the traditional K-means algorithm is poorly suited to such data. This study therefore proposes the Dissimilarity-Density-Dynamic Radius K-means (D3K) clustering algorithm. The algorithm introduces a dynamic radius parameter that adjusts the active radius flexibly according to the characteristics of the data, which can eliminate the influence of noise points and improve the clustering results. The algorithm also computes a weight from the dissimilarity density of the data set, the average contrast of the candidate clusters, and the dissimilarity of the candidate clusters. From these weights it obtains a set of high-quality initial center points, removing the randomness with which the standard K-means algorithm selects its centers. Compared with similar algorithms, D3K gives better clustering results on single-cell data: it scores higher than the other single-cell clustering algorithms on every clustering index, overcoming the shortcomings of the traditional K-means algorithm.
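The abstract does not give the paper's formulas, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' D3K implementation. It assumes Euclidean distance as the dissimilarity, a hypothetical data-driven radius equal to `alpha` times the mean pairwise dissimilarity, and a "dissimilarity density" defined as the number of neighbours within that radius. For center selection it uses a simple stand-in weight (density multiplied by distance to the nearest already-chosen center) in place of the paper's combined weight of dissimilarity density, average contrast, and candidate-cluster dissimilarity. All function names and parameters (`alpha`, `min_density`, `cluster`, and so on) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dissimilarity_matrix(data: Sequence[Point]) -> List[List[float]]:
    # Assumption: Euclidean distance stands in for the paper's dissimilarity measure.
    return [[math.dist(a, b) for b in data] for a in data]


def dynamic_radius(dis: List[List[float]], alpha: float = 0.5) -> float:
    # Hypothetical rule: the radius scales with the data as alpha * mean pairwise dissimilarity.
    n = len(dis)
    if n < 2:
        return 0.0
    total = sum(dis[i][j] for i in range(n) for j in range(i + 1, n))
    return alpha * total / (n * (n - 1) / 2)


def dissimilarity_density(dis: List[List[float]], radius: float) -> List[int]:
    # Density of point i = number of other points lying within `radius` of it.
    n = len(dis)
    return [sum(1 for j in range(n) if j != i and dis[i][j] <= radius) for i in range(n)]


def select_centers(dis: List[List[float]], density: List[int],
                   candidates: List[int], k: int) -> List[int]:
    # Deterministic seeding: start from the densest candidate, then repeatedly take the
    # candidate maximising density * distance to its nearest chosen center (stand-in weight).
    if len(candidates) < k:
        raise ValueError("fewer non-noise points than k")
    chosen = [max(candidates, key=lambda i: density[i])]
    while len(chosen) < k:
        remaining = [i for i in candidates if i not in chosen]
        chosen.append(max(remaining, key=lambda i: density[i] * min(dis[i][c] for c in chosen)))
    return chosen


def kmeans(points: List[Point], centers: List[Point],
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    centers = [tuple(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def cluster(data: List[Point], k: int, alpha: float = 0.5,
            min_density: int = 1, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    dis = dissimilarity_matrix(data)
    density = dissimilarity_density(dis, dynamic_radius(dis, alpha))
    # Points with too few neighbours inside the radius are treated as noise.
    keep = [i for i in range(len(data)) if density[i] >= min_density]
    seeds = [data[i] for i in select_centers(dis, density, keep, k)]
    kept_labels, centers = kmeans([data[i] for i in keep], seeds, max_iter)
    labels = [-1] * len(data)  # -1 marks points flagged as noise
    for i, lab in zip(keep, kept_labels):
        labels[i] = lab
    return labels, centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (20.0, 20.0)]
    labels, centers = cluster(data, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In this sketch, points whose density falls below `min_density` are labelled `-1` and excluded from both seeding and the K-means updates. This is one plausible way to keep isolated points such as `(20.0, 20.0)` from dragging a center away from its cluster, and the deterministic seeding replaces random initialization. The paper's actual radius adjustment and weighting scheme may differ.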

Keywords

Dissimilarity matrix; K-means; scRNA-seq; density; dynamic radius

Collections: 01-internacional | Database: MEDLINE | Study type: Prognostic_studies | Language: English | Journal: Front Genet | Publication year: 2022 | Document type: Article | Country of affiliation: China
