Pesquisa | Biblioteca Virtual em Saúde Fiocruz

scHiCStackL: a stacking ensemble learning-based method for single-cell Hi-C classification using cell embedding.

Wu, Hao; Wu, Yingfu; Jiang, Yuhong; Zhou, Bing; Zhou, Haoru; Chen, Zhongli; Xiong, Yi; Liu, Quanzhong; Zhang, Hongming.

Brief Bioinform ; 23(1)2022 01 17.

Artigo em Inglês | MEDLINE | ID: mdl-34553746

RESUMO

Single-cell Hi-C data are a common data source for studying the differences in the three-dimensional structure of cell chromosomes. The development of single-cell Hi-C technology makes it possible to obtain batches of single-cell Hi-C data. How to quickly and effectively discriminate cell types has become one hot research field. However, the existing computational methods to predict cell types based on Hi-C data are found to be low in accuracy. Therefore, we propose a high accuracy cell classification algorithm, called scHiCStackL, based on single-cell Hi-C data. In our work, we first improve the existing data preprocessing method for single-cell Hi-C data, which allows the generated cell embedding better to represent cells. Then, we construct a two-layer stacking ensemble model for classifying cells. Experimental results show that the cell embedding generated by our data preprocessing method increases by 0.23, 1.22, 1.46 and 1.61$\%$ comparing with the cell embedding generated by the previously published method scHiCluster, in terms of the Acc, MCC, F1 and Precision confidence intervals, respectively, on the task of classifying human cells in the ML1 and ML3 datasets. When using the two-layer stacking ensemble framework with the cell embedding, scHiCStackL improves by 13.33, 19, 19.27 and 14.5 over the scHiCluster, in terms of the Acc, ARI, NMI and F1 confidence intervals, respectively. In summary, scHiCStackL achieves superior performance in predicting cell types using the single-cell Hi-C data. The webserver and source code of scHiCStackL are freely available at http://hww.sdu.edu.cn:8002/scHiCStackL/ and https://github.com/HaoWuLab-Bioinformatics/scHiCStackL, respectively.

Assuntos

Algoritmos , Software , Humanos , Aprendizado de Máquina

CLNN-loop: a deep learning model to predict CTCF-mediated chromatin loops in the different cell lines and CTCF-binding sites (CBS) pair types.

Zhang, Pengyu; Wu, Yingfu; Zhou, Haoru; Zhou, Bing; Zhang, Hongming; Wu, Hao.

Bioinformatics ; 38(19): 4497-4504, 2022 09 30.

Artigo em Inglês | MEDLINE | ID: mdl-35997565

RESUMO

MOTIVATION: Three-dimensional (3D) genome organization is of vital importance in gene regulation and disease mechanisms. Previous studies have shown that CTCF-mediated chromatin loops are crucial to studying the 3D structure of cells. Although various experimental techniques have been developed to detect chromatin loops, they have been found to be time-consuming and costly. Nowadays, various sequence-based computational methods can capture significant features of 3D genome organization and help predict chromatin loops. However, these methods have low performance and poor generalization ability in predicting chromatin loops. RESULTS: Here, we propose a novel deep learning model, called CLNN-loop, to predict chromatin loops in different cell lines and CTCF-binding sites (CBS) pair types by fusing multiple sequence-based features. The analysis of a series of examinations based on the datasets in the previous study shows that CLNN-loop has satisfactory performance and is superior to the existing methods in terms of predicting chromatin loops. In addition, we apply the SHAP framework to interpret the predictions of different models, and find that CTCF motif and sequence conservation are important signs of chromatin loops in different cell lines and CBS pair types. AVAILABILITY AND IMPLEMENTATION: The source code of CLNN-loop is freely available at https://github.com/HaoWuLab-Bioinformatics/CLNN-loop and the webserver of CLNN-loop is freely available at http://hwclnn.sdu.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Cromatina , Aprendizado Profundo , Fator de Ligação a CCCTC/metabolismo , Sítios de Ligação , Linhagem Celular

SCMcluster: a high-precision cell clustering algorithm integrating marker gene set with single-cell RNA sequencing data.

Wu, Hao; Zhou, Haoru; Zhou, Bing; Wang, Meili.

Brief Funct Genomics ; 22(4): 329-340, 2023 07 17.

Artigo em Inglês | MEDLINE | ID: mdl-36848584

RESUMO

Single-cell clustering is the most significant part of single-cell RNA sequencing (scRNA-seq) data analysis. One main issue facing the scRNA-seq data is noise and sparsity, which poses a great challenge for the advance of high-precision clustering algorithms. This study adopts cellular markers to identify differences between cells, which contributes to feature extraction of single cells. In this work, we propose a high-precision single-cell clustering algorithm-SCMcluster (single-cell cluster using marker genes). This algorithm integrates two cell marker databases(CellMarker database and PanglaoDB database) with scRNA-seq data for feature extraction and constructs an ensemble clustering model based on the consensus matrix. We test the efficiency of this algorithm and compare it with other eight popular clustering algorithms on two scRNA-seq datasets derived from human and mouse tissues, respectively. The experimental results show that SCMcluster outperforms the existing methods in both feature extraction and clustering performance. The source code of SCMcluster is available for free at https://github.com/HaoWuLab-Bioinformatics/SCMcluster.

Assuntos

Perfilação da Expressão Gênica , Análise de Célula Única , Animais , Humanos , Camundongos , Perfilação da Expressão Gênica/métodos , Análise de Sequência de RNA/métodos , Análise de Célula Única/métodos , Algoritmos , Análise por Conglomerados

Be-1DCNN: a neural network model for chromatin loop prediction based on bagging ensemble learning.

Wu, Hao; Zhou, Bing; Zhou, Haoru; Zhang, Pengyu; Wang, Meili.

Brief Funct Genomics ; 22(5): 475-484, 2023 11 10.

Artigo em Inglês | MEDLINE | ID: mdl-37133976

RESUMO

The chromatin loops in the three-dimensional (3D) structure of chromosomes are essential for the regulation of gene expression. Despite the fact that high-throughput chromatin capture techniques can identify the 3D structure of chromosomes, chromatin loop detection utilizing biological experiments is arduous and time-consuming. Therefore, a computational method is required to detect chromatin loops. Deep neural networks can form complex representations of Hi-C data and provide the possibility of processing biological datasets. Therefore, we propose a bagging ensemble one-dimensional convolutional neural network (Be-1DCNN) to detect chromatin loops from genome-wide Hi-C maps. First, to obtain accurate and reliable chromatin loops in genome-wide contact maps, the bagging ensemble learning method is utilized to synthesize the prediction results of multiple 1DCNN models. Second, each 1DCNN model consists of three 1D convolutional layers for extracting high-dimensional features from input samples and one dense layer for producing the prediction results. Finally, the prediction results of Be-1DCNN are compared to those of the existing models. The experimental results indicate that Be-1DCNN predicts high-quality chromatin loops and outperforms the state-of-the-art methods using the same evaluation metrics. The source code of Be-1DCNN is available for free at https://github.com/HaoWuLab-Bioinformatics/Be1DCNN.

Assuntos

Cromatina , Cromossomos , Mapeamento Cromossômico , Redes Neurais de Computação , Aprendizado de Máquina

Identifying driver modules based on multi-omics biological networks in prostate cancer.

Chen, Zhongli; Liang, Biting; Wu, Yingfu; Zhou, Haoru; Wang, Yuchen; Wu, Hao.

IET Syst Biol ; 16(6): 187-200, 2022 12.

Artigo em Inglês | MEDLINE | ID: mdl-36039671

RESUMO

The development of sequencing technology has promoted the expansion of cancer genome data. It is necessary to identify the pathogenesis of cancer at the molecular level and explore reliable treatment methods and precise drug targets in cancer by identifying carcinogenic functional modules in massive multi-omics data. However, there are still limitations to identifying carcinogenic driver modules by utilising genetic characteristics simply. Therefore, this study proposes a computational method, NetAP, to identify driver modules in prostate cancer. Firstly, high mutual exclusivity, high coverage, and high topological similarity between genes are integrated to construct a weight function, which calculates the weight of gene pairs in a biological network. Secondly, the random walk method is utilised to reevaluate the strength of interaction among genes. Finally, the optimal driver modules are identified by utilising the affinity propagation algorithm. According to the results, the authors' method identifies more validated driver genes and driver modules compared with the other previous methods. Thus, the proposed NetAP method can identify carcinogenic driver modules effectively and reliably, and the experimental results provide a powerful basis for cancer diagnosis, treatment and drug targets.

Assuntos

Neoplasias da Próstata , Masculino , Humanos , Neoplasias da Próstata/diagnóstico , Neoplasias da Próstata/genética

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA