A parameter-free deep embedded clustering method for single-cell RNA-seq data.

Zeng, Yuansong; Wei, Zhuoyi; Zhong, Fengqi; Pan, Zixiang; Lu, Yutong; Yang, Yuedong

Affiliation

Zeng Y; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China.
Wei Z; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China.
Zhong F; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China.
Pan Z; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China.
Lu Y; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China.
Yang Y; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China.

Brief Bioinform; 23(5), 2022 Sep 20.

Article in English | MEDLINE | ID: mdl-35524494

ABSTRACT

Clustering analysis is widely used on single-cell ribonucleic acid (RNA)-sequencing (scRNA-seq) data to discover cell heterogeneity and cell states. While many clustering methods have been developed for scRNA-seq analysis, most of them require the number of clusters to be specified in advance. However, the exact number of cell types is rarely known beforehand, and estimates based on experience are not always reliable. Here, we developed ADClust, an automatic deep embedding clustering method for scRNA-seq data that can accurately cluster cells without requiring a predefined number of clusters. Specifically, ADClust first obtains low-dimensional representations through a pre-trained autoencoder and uses them to group cells into initial micro-clusters. The micro-clusters are then compared pairwise by a statistical test, and similar micro-clusters are merged into larger clusters. Based on the current clustering, cell representations are updated so that each cell is pulled toward the centers of its assigned cluster and of similar clusters, while cells of different clusters are kept apart to preserve inter-cluster distances. This is accomplished by jointly optimizing carefully designed clustering and autoencoder loss functions. The merging process continues until convergence. ADClust was tested on 11 real scRNA-seq datasets and outperformed existing methods in both clustering performance and the accuracy of the estimated number of clusters. More importantly, our model offers high speed and scalability for large datasets.
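The micro-cluster merging step described above can be illustrated with a small, hypothetical sketch. It is not the authors' implementation, and it rests on several stated assumptions: micro-clusters are given as lists of points in some low-dimensional embedding (no autoencoder is trained here); each candidate pair is projected onto the line joining the two centroids; and, although the keywords point to the dip test, this sketch swaps in Sarle's bimodality coefficient (threshold 5/9) as a simpler stand-in for deciding whether the pooled projection looks unimodal. All function names are illustrative, and the representation-update step is omitted.

```python
import math
import random
from typing import List, Sequence

# Hypothetical sketch: Sarle's bimodality coefficient stands in for the dip test.
Point = List[float]


def centroid(points: Sequence[Point]) -> Point:
    """Mean vector of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def project(points: Sequence[Point], a: Point, b: Point) -> List[float]:
    """Scalar projection of each point onto the line through centroids a and b."""
    direction = [bi - ai for ai, bi in zip(a, b)]
    norm = math.sqrt(sum(x * x for x in direction)) or 1.0  # avoid division by zero
    unit = [x / norm for x in direction]
    return [sum((pi - ai) * ui for pi, ai, ui in zip(p, a, unit)) for p in points]


def bimodality_coefficient(xs: Sequence[float]) -> float:
    """Sarle's bimodality coefficient using bias-adjusted skewness and excess kurtosis.

    Values above 5/9 (the value for a uniform distribution) suggest bimodality.
    Requires len(xs) >= 4.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    if m2 == 0.0:
        return 0.0  # all projections identical: treat as unimodal
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5          # plug-in skewness
    g2 = m4 / m2 ** 2 - 3.0      # plug-in excess kurtosis
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))
    return (skew ** 2 + 1.0) / (kurt + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def merge_micro_clusters(clusters: List[List[Point]], threshold: float = 5 / 9) -> List[List[Point]]:
    """Repeatedly merge the most unimodal-looking pair until no pair passes the test."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None  # (coefficient, i, j) of the most mergeable pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pooled = clusters[i] + clusters[j]
                if len(pooled) < 4:
                    continue  # too few points for the coefficient
                proj = project(pooled, centroid(clusters[i]), centroid(clusters[j]))
                bc = bimodality_coefficient(proj)
                if bc < threshold and (best is None or bc < best[0]):
                    best = (bc, i, j)
        if best is None:
            break  # every remaining pair looks bimodal: stop merging
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy example: two Gaussian blobs, each over-split into two micro-clusters.
random.seed(0)
blob_a = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(200)]
blob_b = [[random.gauss(10.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(200)]
micro = [
    [p for p in blob_a if p[0] < 0.0], [p for p in blob_a if p[0] >= 0.0],
    [p for p in blob_b if p[0] < 10.0], [p for p in blob_b if p[0] >= 10.0],
]
final = merge_micro_clusters(micro)
print(len(final), sorted(len(c) for c in final))
```

Projecting a candidate pair onto the centroid-to-centroid line reduces the merge decision to a one-dimensional unimodality question, which is what lets a univariate test such as the dip test (or, here, the bimodality coefficient) be applied. Merging the most unimodal pair first and then re-testing loosely mirrors the iterative merging the abstract describes; in the toy data the four micro-clusters should collapse back into the two underlying blobs.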

Subjects

RNA; Single-Cell Analysis; Algorithms; Cluster Analysis; Gene Expression Profiling/methods; RNA/genetics; RNA-Seq; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods

Keywords

deep embedded clustering; dip-test; estimating the number of cell clusters; single-cell RNA sequencing; single-cell clustering

Collections: 01-internacional | Database: MEDLINE | Main subject: RNA / Single-Cell Analysis | Language: English | Journal: Brief Bioinform | Journal subject: Biology / Medical Informatics | Publication year: 2022 | Document type: Article | Country of affiliation: China