scHiCStackL: a stacking ensemble learning-based method for single-cell Hi-C classification using cell embedding.
Wu, Hao; Wu, Yingfu; Jiang, Yuhong; Zhou, Bing; Zhou, Haoru; Chen, Zhongli; Xiong, Yi; Liu, Quanzhong; Zhang, Hongming.
Affiliation
  • Wu H; College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China.
  • Wu Y; School of Software, Shandong University, Jinan, 250101, Shandong, China.
  • Jiang Y; College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China.
  • Zhou B; College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China.
  • Zhou H; College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China.
  • Chen Z; College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China.
  • Xiong Y; College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China.
  • Liu Q; State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200240, Shanghai, China.
  • Zhang H; College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China.
Brief Bioinform; 23(1), 2022-01-17.
Article in English | MEDLINE | ID: mdl-34553746
ABSTRACT
Single-cell Hi-C data are a common data source for studying differences in the three-dimensional structure of chromosomes in cells. The development of single-cell Hi-C technology makes it possible to obtain single-cell Hi-C data in batches, and how to discriminate cell types quickly and effectively has become an active research topic. However, existing computational methods for predicting cell types from Hi-C data have low accuracy. We therefore propose scHiCStackL, a high-accuracy cell classification algorithm based on single-cell Hi-C data. We first improve the existing data preprocessing method for single-cell Hi-C data so that the generated cell embedding represents cells better. We then construct a two-layer stacking ensemble model for classifying cells. Experimental results show that, on the task of classifying human cells in the ML1 and ML3 datasets, the cell embedding generated by our data preprocessing method improves on the cell embedding generated by the previously published method scHiCluster by 0.23, 1.22, 1.46 and 1.61% in terms of the Acc, MCC, F1 and Precision confidence intervals, respectively. When the two-layer stacking ensemble framework is used with this cell embedding, scHiCStackL improves on scHiCluster by 13.33, 19, 19.27 and 14.5 in terms of the Acc, ARI, NMI and F1 confidence intervals, respectively. In summary, scHiCStackL achieves superior performance in predicting cell types from single-cell Hi-C data. The webserver and source code of scHiCStackL are freely available at http://hww.sdu.edu.cn:8002/scHiCStackL/ and https://github.com/HaoWuLab-Bioinformatics/scHiCStackL, respectively.
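The two-layer stacking idea in the abstract can be illustrated with a minimal sketch. The abstract names neither the base learners nor the meta-learner, so everything below is an assumption chosen for brevity: a nearest-centroid learner and a k-nearest-neighbour learner stand in for the first layer, a lookup table over their out-of-fold predictions stands in for the second layer, and the "cell embeddings" are synthetic 2-D points. All function names are hypothetical and do not come from the scHiCStackL code base.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def centroid_fit(X, y):
    """Base learner 1 (assumed): store the mean embedding of each class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_predict(model, row):
    return min(model, key=lambda c: sq_dist(model[c], row))


def knn_fit(X, y):
    """Base learner 2 (assumed): memorise the training cells."""
    return list(zip(X, y))


def knn_predict(model, row, k=3):
    nearest = sorted(model, key=lambda pair: sq_dist(pair[0], row))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


BASE_LEARNERS = [(centroid_fit, centroid_predict), (knn_fit, knn_predict)]


def base_predictions(models, row):
    """Tuple of first-layer predictions for one cell."""
    return tuple(pred(m, row) for m, (_, pred) in zip(models, BASE_LEARNERS))


def out_of_fold_predictions(X, y, n_folds=5):
    """Layer 1: predict each cell with base models that never saw it."""
    meta = [None] * len(X)
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        held_out = [i for i in range(len(X)) if i % n_folds == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        models = [fit(X_tr, y_tr) for fit, _ in BASE_LEARNERS]
        for i in held_out:
            meta[i] = base_predictions(models, X[i])
    return meta


def fit_stack(X, y, n_folds=5):
    """Layer 2: learn which label each combination of base votes implies."""
    votes = {}
    for key, label in zip(out_of_fold_predictions(X, y, n_folds), y):
        votes.setdefault(key, Counter())[label] += 1
    table = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    # Refit the base learners on all cells for use at prediction time.
    models = [fit(X, y) for fit, _ in BASE_LEARNERS]
    return models, table


def predict_stack(stack, row):
    models, table = stack
    key = base_predictions(models, row)
    # Unseen vote combination: fall back to the first base learner.
    return table.get(key, key[0])


def make_demo_cells(seed=0, per_class=20):
    """Synthetic stand-in for cell embeddings: two well-separated clusters."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in (("A", 0.0), ("B", 5.0)):
        for _ in range(per_class):
            X.append([rng.gauss(center, 0.5), rng.gauss(center, 0.5)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_cells()
    stack = fit_stack(X, y)
    print(predict_stack(stack, [0.2, -0.1]), predict_stack(stack, [4.8, 5.3]))
```

The point of the out-of-fold step is that the second layer is trained only on first-layer predictions for cells the base models did not see, which keeps the meta-learner from simply trusting overfitted base outputs. In practice the cell embedding produced by the preprocessing step would replace `make_demo_cells`.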

Full text: 1 | Databases: MEDLINE | Main subject: Algorithms / Software | Type of study: Prognostic_studies | Limit: Humans | Language: En | Journal: Brief Bioinform | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Year: 2022 | Document type: Article | Country of affiliation: China
