An interpretable single-cell RNA sequencing data clustering method based on latent Dirichlet allocation.
Yang, Qi; Xu, Zhaochun; Zhou, Wenyang; Wang, Pingping; Jiang, Qinghua; Juan, Liran.
Affiliation
  • Yang Q; School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
  • Xu Z; School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
  • Zhou W; School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
  • Wang P; School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
  • Jiang Q; School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
  • Juan L; School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
Brief Bioinform; 24(4), 2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37225419
Single-cell RNA sequencing (scRNA-seq) detects whole transcriptome signals for large numbers of individual cells and is powerful for determining cell-to-cell differences and investigating the functional characteristics of various cell types. scRNA-seq datasets are usually sparse and highly noisy. Many steps in the scRNA-seq analysis workflow, including reasonable gene selection, cell clustering and annotation, as well as discovering the underlying biological mechanisms from such datasets, are difficult. In this study, we proposed an scRNA-seq analysis method based on the latent Dirichlet allocation (LDA) model. The LDA model estimates a series of latent variables, i.e. putative functions (PFs), from the input raw cell-gene data. Thus, we incorporated the 'cell-function-gene' three-layer framework into scRNA-seq analysis, as this framework is capable of discovering latent and complex gene expression patterns via a built-in model approach and obtaining biologically meaningful results through a data-driven functional interpretation process. We compared our method with four classic methods on seven benchmark scRNA-seq datasets. The LDA-based method performed best in the cell clustering test in terms of both accuracy and purity. By analysing three complex public datasets, we demonstrated that our method could distinguish cell types with multiple levels of functional specialization, and precisely reconstruct cell development trajectories. Moreover, the LDA-based method accurately identified the representative PFs and the representative genes for the cell types/cell stages, enabling data-driven cell cluster annotation and functional interpretation. According to the literature, most of the previously reported marker/functionally relevant genes were recognized.
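The 'cell-function-gene' framework described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state which inference algorithm, hyperparameters or clustering step they used. The example below assumes a generic collapsed Gibbs sampler for LDA, treats cells as documents and genes as words, and uses a made-up toy count matrix. The names `fit_lda_gibbs`, `theta`, `phi`, `alpha` and `beta` are hypothetical, and assigning each cell to its dominant PF is a simplification chosen only for illustration.

```python
import random

def fit_lda_gibbs(counts, n_pfs, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA to a cell-by-gene count matrix with collapsed Gibbs sampling.

    counts: list of rows (one per cell) of non-negative integer gene counts.
    Returns (theta, phi): cell-by-PF and PF-by-gene probability matrices.
    """
    rng = random.Random(seed)
    n_cells, n_genes = len(counts), len(counts[0])
    # Expand every count into individual (cell, gene) tokens.
    tokens = [(c, g) for c in range(n_cells) for g in range(n_genes)
              for _ in range(counts[c][g])]
    # Random initial PF assignment for each token.
    z = [rng.randrange(n_pfs) for _ in tokens]
    cell_pf = [[0] * n_pfs for _ in range(n_cells)]
    pf_gene = [[0] * n_genes for _ in range(n_pfs)]
    pf_total = [0] * n_pfs
    for (c, g), k in zip(tokens, z):
        cell_pf[c][k] += 1
        pf_gene[k][g] += 1
        pf_total[k] += 1
    for _ in range(n_iter):
        for i, (c, g) in enumerate(tokens):
            # Remove the token's current assignment from the counts.
            k = z[i]
            cell_pf[c][k] -= 1
            pf_gene[k][g] -= 1
            pf_total[k] -= 1
            # Conditional probability of each PF for this token.
            weights = [
                (cell_pf[c][t] + alpha) * (pf_gene[t][g] + beta)
                / (pf_total[t] + n_genes * beta)
                for t in range(n_pfs)
            ]
            k = rng.choices(range(n_pfs), weights=weights)[0]
            z[i] = k
            cell_pf[c][k] += 1
            pf_gene[k][g] += 1
            pf_total[k] += 1
    # Smoothed estimates of the two layers of the framework.
    theta = [[(n + alpha) / (sum(row) + n_pfs * alpha) for n in row]
             for row in cell_pf]
    phi = [[(n + beta) / (pf_total[k] + n_genes * beta) for n in pf_gene[k]]
           for k in range(n_pfs)]
    return theta, phi

# Toy data (invented): cells 0-2 express genes 0-2, cells 3-5 express genes 3-5.
toy_counts = [
    [10, 10, 10, 0, 0, 0],
    [10, 10, 10, 0, 0, 0],
    [10, 10, 10, 0, 0, 0],
    [0, 0, 0, 10, 10, 10],
    [0, 0, 0, 10, 10, 10],
    [0, 0, 0, 10, 10, 10],
]
theta, phi = fit_lda_gibbs(toy_counts, n_pfs=2)

# Simplified clustering: label each cell by its highest-weight PF.
labels = [max(range(len(row)), key=row.__getitem__) for row in theta]

# Representative genes per PF: the three genes with the highest probability.
top_genes = [sorted(sorted(range(len(row)), key=row.__getitem__)[-3:])
             for row in phi]
print(labels)
print(top_genes)
```

Here `theta` plays the role of the cell-to-PF layer and `phi` the PF-to-gene layer. PF indices are arbitrary, so only the grouping of cells matters, not which PF number a group receives. On real scRNA-seq data a library implementation of LDA and a proper clustering algorithm on the cell-to-PF matrix would be more practical than this pure-Python sampler.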
Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Gene Expression Profiling / Single-Cell Analysis | Study type: Prognostic_studies | Language: En | Journal: Brief Bioinform | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Year of publication: 2023 | Document type: Article | Country of affiliation: China