A robust semi-supervised NMF model for single cell RNA-seq data.
Wu, Peng; An, Mo; Zou, Hai-Ren; Zhong, Cai-Ying; Wang, Wei; Wu, Chang-Peng.
  • Wu P; Department of Neurosurgery, The People's Hospital of Longhua District, Shenzhen, Guangdong Province, China.
  • An M; Department of Neurosurgery, The People's Hospital of Longhua District, Shenzhen, Guangdong Province, China.
  • Zou HR; Department of Neurosurgery, The People's Hospital of Longhua District, Shenzhen, Guangdong Province, China.
  • Zhong CY; Department of Neurosurgery, The People's Hospital of Longhua District, Shenzhen, Guangdong Province, China.
  • Wang W; Department of Neurosurgery, The People's Hospital of Longhua District, Shenzhen, Guangdong Province, China.
  • Wu CP; Department of Neurosurgery, The People's Hospital of Longhua District, Shenzhen, Guangdong Province, China.
PeerJ; 8: e10091, 2020.
Article in English | MEDLINE | ID: mdl-33088619
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) is a powerful tool for studying organisms at single-cell resolution and exploring the heterogeneity between cells. Clustering is a fundamental step in scRNA-seq data analysis: it is key to understanding cell function and forms the basis of other advanced analyses. Nonnegative Matrix Factorization (NMF) has been widely used to cluster transcriptome data and has performed well. However, existing NMF models are unsupervised and ignore known gene functions during clustering. Knowledge of cell marker genes (genes expressed only in specific cell types) in humans and model organisms has accumulated substantially, for example in the Molecular Signatures Database (MSigDB), and can serve as prior information when clustering scRNA-seq data. Because cells of the same type are likely to share biological functions and specific gene expression patterns, their marker genes can be used as prior knowledge in the clustering analysis.

METHODS: We propose a robust semi-supervised NMF (rssNMF) model, which introduces a new variable to absorb noise in the data and incorporates marker genes as prior information through a graph regularization term. We apply rssNMF to the clustering of scRNA-seq data.

RESULTS: Twelve scRNA-seq datasets with ground-truth labels were used to test model performance, and the results show that our model outperforms the original NMF and other common methods such as KMeans and hierarchical clustering. Biological significance analysis shows that rssNMF can identify key subclasses and latent biological processes. To our knowledge, this is the first method that incorporates prior knowledge into the clustering analysis of scRNA-seq data.
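The abstract does not give the rssNMF objective or its update rules, so the following is only a rough illustration of the general idea of clustering with NMF plus a marker-gene graph. It swaps in standard graph-regularized NMF (GNMF) with multiplicative updates, minimizing ||X - WH||_F^2 + lam * tr(H L H^T) with L = D - A, where X is a nonnegative genes x cells matrix and A is a cell-cell affinity graph. The rule used to build A from marker genes (`marker_affinity`, `marker_sets`) and the parameter `lam` are hypothetical, and the sketch omits rssNMF's noise-absorbing variable entirely, so it is not the authors' model.

```python
import numpy as np


def marker_affinity(X, marker_sets):
    """Hypothetical prior graph: link cells whose top-scoring marker set agrees.

    X is a nonnegative genes x cells matrix; marker_sets is a list of
    gene-index lists (an assumption standing in for curated markers such as MSigDB).
    """
    # Mean expression of each marker set in each cell: shape (n_sets, n_cells)
    scores = np.vstack([X[idx, :].mean(axis=0) for idx in marker_sets])
    labels = scores.argmax(axis=0)
    labels[scores.max(axis=0) <= 0] = -1  # cells with no marker signal stay unlinked
    A = (labels[:, None] == labels[None, :]) & (labels[:, None] >= 0)
    np.fill_diagonal(A, False)  # no self-loops
    return A.astype(float)


def gnmf(X, k, A, lam=1.0, n_iter=300, eps=1e-10, seed=0):
    """Graph-regularized NMF via multiplicative updates (GNMF, not rssNMF).

    Minimizes ||X - W H||_F^2 + lam * tr(H L H^T) with L = D - A,
    where W is genes x k and H is k x cells.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    D = np.diag(A.sum(axis=1))  # degree matrix of the cell graph
    for _ in range(n_iter):
        # Standard NMF update for the gene loadings
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Cell factors: the graph term pulls linked cells toward similar columns of H
        H *= (W.T @ X + lam * H @ A) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H


def cluster_cells(X, k, marker_sets, lam=1.0):
    """Assign each cell to its dominant NMF factor."""
    A = marker_affinity(X, marker_sets)
    _, H = gnmf(X, k, A, lam=lam)
    return H.argmax(axis=0)
```

Reading clusters off as the dominant factor per cell (the argmax over each column of H) is a common NMF convention; `lam` trades reconstruction fit against agreement with the marker-derived graph, and setting it to 0 recovers plain multiplicative-update NMF.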
Full text: 1 Database: MEDLINE Study type: Prognostic_studies Language: En Year: 2020 Document type: Article