Evaluation of machine learning approaches for cell-type identification from single-cell transcriptomics data.
Brief Bioinform; 22(5), 2021 Sep 02.
Article in English | MEDLINE | ID: mdl-33611343
ABSTRACT
Single-cell transcriptomics technologies have vast potential to advance our understanding of cellular heterogeneity in complex tissues. Although methods for interpreting single-cell transcriptomics data are developing rapidly, challenges remain in most analysis pipelines. The major limitation is the reliance on manual annotation for cell-type identification, which is time-consuming and irreproducible, and for which canonical markers are sometimes lacking for certain cell types. There is a growing realization that machine learning models, used as supervised classifiers, can significantly aid decision-making in cell-type identification. In this work, we performed a comprehensive and impartial evaluation of 10 machine learning models that automatically assign cell phenotypes. The performance of the classification methods was estimated using 20 publicly accessible single-cell RNA sequencing datasets that differ in size, technology, species and level of complexity. Each model was evaluated in within-dataset (intra-dataset) and across-dataset (inter-dataset) experiments in terms of both classification accuracy and computation time. In addition, sensitivity to the number of input features, to different annotation levels and to dataset complexity was assessed. The results showed that most classifiers perform well on a variety of datasets, with decreased accuracy on complex datasets, while the Linear Support Vector Machine (linear-SVM) and Logistic Regression classifiers have the best overall performance with remarkably fast computation time. Our work provides a guideline for researchers to select and apply suitable machine learning-based classification models in their analysis workflows, and sheds some light on potential directions for improving automated cell phenotype classification tools based on single-cell sequencing data.
Database: MEDLINE
Main subject: Single-Cell Analysis / Transcriptome / Support Vector Machine
Study type: Diagnostic_studies / Prognostic_studies / Risk_factors_studies
Limits: Animals / Humans
Language: English
Journal: Brief Bioinform
Journal subject: Biology / Medical Informatics
Year: 2021
Document type: Article
Affiliation country: United States