PreCanCell: An ensemble learning algorithm for predicting cancer and non-cancer cells from single-cell transcriptomes.

Yang, Tao; Yan, Qiyu; Long, Rongzhuo; Liu, Zhixian; Wang, Xiaosheng

Affiliations

Yang T; Biomedical Informatics Research Lab, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing 211198, China.
Yan Q; Cancer Genomics Research Center, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing 211198, China.
Long R; Big Data Research Institute, China Pharmaceutical University, Nanjing 211198, China.
Liu Z; Biomedical Informatics Research Lab, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing 211198, China.
Wang X; Cancer Genomics Research Center, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing 211198, China.

Comput Struct Biotechnol J; 21: 3604-3614, 2023.

Article in English | MEDLINE | ID: mdl-37501705

ABSTRACT

We propose PreCanCell, a novel algorithm for predicting malignant and non-malignant cells from single-cell transcriptomes. PreCanCell first identifies the differentially expressed genes (DEGs) between malignant and non-malignant cells that are shared across single-cell transcriptome datasets for five common cancer types: renal cell carcinoma (RCC), head and neck squamous cell carcinoma (HNSCC), melanoma, lung adenocarcinoma (LUAD), and breast cancer (BC). Using each of the five datasets in turn as the training set and the DEGs as the features, a k-nearest neighbors (k-NN) classifier with k = 5 labels a single cell as malignant or non-malignant. The final label for the cell is the majority vote of the five k-NN classification results. We tested the predictive performance of PreCanCell on 19 single-cell datasets and reported classification accuracy, sensitivity, specificity, balanced accuracy (the average of sensitivity and specificity), and the area under the receiver operating characteristic curve (AUROC). On all of these datasets, PreCanCell achieved accuracy, sensitivity, specificity, balanced accuracy, and AUROC above 0.8. Finally, we compared the predictive performance of PreCanCell with that of seven other algorithms: CHETAH, SciBet, SCINA, scmap-cell, scmap-cluster, SingleR, and ikarus. Compared with these algorithms, PreCanCell offers higher accuracy and simpler implementation. We have developed an R package for the PreCanCell algorithm, which is available at https://github.com/WangX-Lab/PreCanCell.
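
The ensemble step described in the abstract (one k-NN classifier per training dataset, then a majority vote) and the balanced accuracy metric can be illustrated with a minimal Python sketch. This is not the PreCanCell R package or its API. The function names, the toy two-feature data, and the use of Euclidean distance are all assumptions made for illustration; the abstract does not state the distance metric or any preprocessing.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, x, k=5):
    """Label x by majority vote among its k nearest training cells.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    nearest = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def ensemble_predict(training_sets, x, k=5):
    """Run one k-NN classifier per training set and take the majority vote."""
    votes = [knn_predict(X, y, x, k) for X, y in training_sets]
    return Counter(votes).most_common(1)[0][0]


def balanced_accuracy(y_true, y_pred, positive="malignant"):
    """Average of sensitivity and specificity, as defined in the abstract."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    n_pos = sum(t == positive for t in y_true)
    n_neg = len(y_true) - n_pos
    return (tp / n_pos + tn / n_neg) / 2


def toy_training_set(shift):
    """Hypothetical stand-in for one training dataset.

    Each cell is a vector of two made-up DEG expression values.
    """
    malignant = [(5.0 + shift + 0.1 * i, 1.0) for i in range(5)]
    normal = [(1.0, 5.0 + shift + 0.1 * i) for i in range(5)]
    return malignant + normal, ["malignant"] * 5 + ["non-malignant"] * 5


# Five toy training sets play the role of the five cancer-type datasets.
training_sets = [toy_training_set(0.2 * s) for s in range(5)]

queries = [(5.2, 0.8), (0.9, 5.5), (4.8, 1.3), (1.2, 4.6)]
truth = ["malignant", "non-malignant", "malignant", "non-malignant"]

predictions = [ensemble_predict(training_sets, q) for q in queries]
print(predictions)
print("balanced accuracy:", balanced_accuracy(truth, predictions))
```

With five binary voters the majority vote cannot tie, and with k = 5 neither can each individual k-NN vote, which may be one reason odd values suit this design.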

Keywords

Cancer and non-cancer cells; Ensemble learning algorithm; Non-tumor marker genes; Single-cell transcriptomes; Tumor marker genes

Full text: 1 Collection: 01-internacional Database: MEDLINE Type of study: Prognostic_studies / Risk_factors_studies Language: En Journal: Comput Struct Biotechnol J Year: 2023 Document type: Article Affiliation country: China
