Pesquisa | BVS Doenças Infecciosas e Parasitárias

FEED: a feature selection method based on gene expression decomposition for single cell clustering.

Zhang, Chao; Duan, Zhi-Wei; Xu, Yun-Pei; Liu, Jin; Li, Hong-Dong.

Brief Bioinform ; 24(6)2023 09 22.

Artigo em Inglês | MEDLINE | ID: mdl-37935617

RESUMO

Single-cell clustering is a critical step in biological downstream analysis. The clustering performance could be effectively improved by extracting cell-type-specific genes. The state-of-the-art feature selection methods usually calculate the importance of a single gene without considering the information contained in the gene expression distribution. Moreover, these methods ignore the intrinsic expression patterns of genes and heterogeneity within groups of different mean expression levels. In this work, we present a Feature sElection method based on gene Expression Decomposition (FEED) of scRNA-seq data, which selects informative genes to enhance clustering performance. First, the expression levels of genes are decomposed into multiple Gaussian components. Then, a novel gene correlation calculation method is proposed to measure the relationship between genes from the perspective of distribution. Finally, a permutation-based approach is proposed to determine the threshold of gene importance to obtain marker gene subsets. Compared with state-of-the-art feature selection methods, applying FEED on various scRNA-seq datasets including large datasets followed by different common clustering algorithms results in significant improvements in the accuracy of cell-type identification. The source codes for FEED are freely available at https://github.com/genemine/FEED.

Assuntos

Perfilação da Expressão Gênica , Análise de Célula Única , Perfilação da Expressão Gênica/métodos , Análise de Sequência de RNA/métodos , Análise de Célula Única/métodos , Algoritmos , Análise por Conglomerados , Expressão Gênica

REBET: a method to determine the number of cell clusters based on batch effect removal.

Fang, Zhao-Yu; Lin, Cui-Xiang; Xu, Yun-Pei; Li, Hong-Dong; Xu, Qing-Song.

Brief Bioinform ; 22(6)2021 11 05.

Artigo em Inglês | MEDLINE | ID: mdl-34131702

RESUMO

In single-cell RNA-seq (scRNA-seq) data analysis, a fundamental problem is to determine the number of cell clusters based on the gene expression profiles. However, the performance of current methods is still far from satisfactory, presumably due to their limitations in capturing the expression variability among cell clusters. Batch effects represent the undesired variability between data measured in different batches. When data are obtained from different labs or protocols batch effects occur. Motivated by the practice of batch effect removal, we considered cell clusters as batches. We hypothesized that the number of cell clusters (i.e. batches) could be correctly determined if the variances among clusters (i.e. batch effects) were removed. We developed a new method, namely, removal of batch effect and testing (REBET), for determining the number of cell clusters. In this method, cells are first partitioned into k clusters. Second, the batch effects among these k clusters are then removed. Third, the quality of batch effect removal is evaluated with the average range of normalized mutual information (ARNMI), which measures how uniformly the cells with batch-effects-removal are mixed. By testing a range of k values, the k value that corresponds to the lowest ARNMI is determined to be the optimal number of clusters. We compared REBET with state-of-the-art methods on 32 simulated datasets and 14 published scRNA-seq datasets. The results show that REBET can accurately and robustly estimate the number of cell clusters and outperform existing methods. Contact: H.D.L. (hongdong@csu.edu.cn) or Q.S.X. (qsxu@csu.edu.cn).

Assuntos

Análise por Conglomerados , RNA-Seq/métodos , Análise de Célula Única/métodos , Algoritmos , Bases de Dados Genéticas , Reprodutibilidade dos Testes

RFCell: A Gene Selection Approach for scRNA-seq Clustering Based on Permutation and Random Forest.

Zhao, Yuan; Fang, Zhao-Yu; Lin, Cui-Xiang; Deng, Chao; Xu, Yun-Pei; Li, Hong-Dong.

Front Genet ; 12: 665843, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-34386033

RESUMO

In recent years, the application of single cell RNA-seq (scRNA-seq) has become more and more popular in fields such as biology and medical research. Analyzing scRNA-seq data can discover complex cell populations and infer single-cell trajectories in cell development. Clustering is one of the most important methods to analyze scRNA-seq data. In this paper, we focus on improving scRNA-seq clustering through gene selection, which also reduces the dimensionality of scRNA-seq data. Studies have shown that gene selection for scRNA-seq data can improve clustering accuracy. Therefore, it is important to select genes with cell type specificity. Gene selection not only helps to reduce the dimensionality of scRNA-seq data, but also can improve cell type identification in combination with clustering methods. Here, we proposed RFCell, a supervised gene selection method, which is based on permutation and random forest classification. We first use RFCell and three existing gene selection methods to select gene sets on 10 scRNA-seq data sets. Then, three classical clustering algorithms are used to cluster the cells obtained by these gene selection methods. We found that the gene selection performance of RFCell was better than other gene selection methods.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA