Evaluation of machine learning approaches for cell-type identification from single-cell transcriptomics data.

Huang, Yixuan; Zhang, Peng

Affiliation

Huang Y; George Washington University School of Business, Washington, DC, USA.
Zhang P; Division of Immunotherapy and the Director of Bioinformatics Core at the Institute of Human Virology, University of Maryland School of Medicine, MD, USA.

Brief Bioinform; 22(5), 2021 Sep 02.

Article in English | MEDLINE | ID: mdl-33611343

ABSTRACT

Single-cell transcriptomics technologies have vast potential to advance our understanding of cellular heterogeneity in complex tissues. While methods for interpreting single-cell transcriptomics data are developing rapidly, challenges remain in most analysis pipelines. The major limitation is a reliance on manual annotation for cell-type identification, which is time-consuming and irreproducible, and for which canonical markers are sometimes lacking for certain cell types. There is a growing realization that machine learning models, used as supervised classifiers, can significantly aid cell-type identification. In this work, we performed a comprehensive and impartial evaluation of 10 machine learning models that automatically assign cell phenotypes. The performance of the classification methods was estimated using 20 publicly accessible single-cell RNA sequencing datasets of different sizes, technologies, species and levels of complexity. Each model was evaluated in within-dataset (intra-dataset) and across-dataset (inter-dataset) experiments on both classification accuracy and computation time. In addition, sensitivity to the number of input features, to different annotation levels and to dataset complexity was estimated. The results showed that most classifiers perform well on a variety of datasets, with decreased accuracy for complex datasets, while the Linear Support Vector Machine (linear-SVM) and Logistic Regression classifiers have the best overall performance with remarkably fast computation time. Our work provides a guideline for researchers to select and apply suitable machine learning-based classification models in their analysis workflows, and sheds some light on potential directions for future improvement of automated cell phenotype classification tools based on single-cell sequencing data.
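
As a rough illustration of the intra-dataset setting described in the abstract, the sketch below cross-validates a linear SVM and a logistic regression classifier and records accuracy and wall-clock time. It is not the authors' pipeline: the use of scikit-learn, the synthetic matrix standing in for a cells-by-genes expression table, the 5-fold split, and all hyperparameters are assumptions made only for this example.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

# Synthetic stand-in for a cells-by-genes matrix with cell-type labels.
# This is NOT real scRNA-seq data; sizes and class count are arbitrary.
X, y = make_classification(
    n_samples=300,
    n_features=50,
    n_informative=10,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# Two of the model families named in the abstract; settings are assumptions.
classifiers = {
    "linear-SVM": LinearSVC(max_iter=5000),
    "logistic-regression": LogisticRegression(max_iter=2000),
}

# Stratified folds keep the cell-type proportions similar in each split.
# The choice of 5 folds is an assumption for this sketch.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, clf in classifiers.items():
    start = time.perf_counter()
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    elapsed = time.perf_counter() - start
    results[name] = {"mean_accuracy": scores.mean(), "seconds": elapsed}
    print(f"{name}: accuracy={scores.mean():.3f}, time={elapsed:.2f}s")
```

An inter-dataset experiment would instead fit each classifier on one dataset and score it on a different one, which requires both datasets to share the same gene features and label vocabulary.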

Subject(s)

Single-Cell Analysis/methods; Support Vector Machine/classification; Transcriptome; Animals; Benchmarking; Brain/metabolism; Brain/pathology; Cells, Cultured; Datasets as Topic; Humans; Leukocytes, Mononuclear/cytology; Leukocytes, Mononuclear/metabolism; Logistic Models; Lymphocytes, Tumor-Infiltrating/metabolism; Lymphocytes, Tumor-Infiltrating/pathology; Mice; Pancreas/cytology; Pancreas/metabolism; Phenotype

Keywords

benchmarking; cell identity; classification; machine learning; single-cell RNA sequencing

Full text: 1
Databases: MEDLINE
Main subject: Single-Cell Analysis / Transcriptome / Support Vector Machine
Type of study: Diagnostic_studies / Prognostic_studies / Risk_factors_studies
Limits: Animals / Humans
Language: En
Journal: Brief Bioinform
Journal subject: BIOLOGY / MEDICAL INFORMATICS
Year: 2021
Document type: Article
Affiliation country: United States