A comprehensive comparison of supervised and unsupervised methods for cell type identification in single-cell RNA-seq.
Brief Bioinform
; 23(2)2022 03 10.
Article
em En
| MEDLINE
| ID: mdl-35021202
The cell type identification is among the most important tasks in single-cell RNA-sequencing (scRNA-seq) analysis. Many in silico methods have been developed and can be roughly categorized as either supervised or unsupervised. In this study, we investigated the performances of 8 supervised and 10 unsupervised cell type identification methods using 14 public scRNA-seq datasets of different tissues, sequencing protocols and species. We investigated the impacts of a number of factors, including total amount of cells, number of cell types, sequencing depth, batch effects, reference bias, cell population imbalance, unknown/novel cell type, and computational efficiency and scalability. Instead of merely comparing individual methods, we focused on factors' impacts on the general category of supervised and unsupervised methods. We found that in most scenarios, the supervised methods outperformed the unsupervised methods, except for the identification of unknown cell types. This is particularly true when the supervised methods use a reference dataset with high informational sufficiency, low complexity and high similarity to the query dataset. However, such outperformance could be undermined by some undesired dataset properties investigated in this study, which lead to uninformative and biased reference datasets. In these scenarios, unsupervised methods could be comparable to supervised methods. Our study not only explained the cell typing methods' behaviors under different experimental settings but also provided a general guideline for the choice of method according to the scientific goal and dataset properties. Finally, our evaluation workflow is implemented as a modularized R pipeline that allows future evaluation of new methods. Availability: All the source codes are available at https://github.com/xsun28/scRNAIdent.
Palavras-chave
Texto completo:
1
Coleções:
01-internacional
Base de dados:
MEDLINE
Assunto principal:
Perfilação da Expressão Gênica
/
Análise de Célula Única
Tipo de estudo:
Diagnostic_studies
/
Prognostic_studies
Idioma:
En
Ano de publicação:
2022
Tipo de documento:
Article