Search | BVS CLAP/SMR-OPS/OMS

Clustering single-cell RNA-seq data by rank constrained similarity learning.

Mei, Qinglin; Li, Guojun; Su, Zhengchang.

Bioinformatics ; 37(19): 3235-3242, 2021 Oct 11.

Article in English | MEDLINE | ID: mdl-33961003

ABSTRACT

MOTIVATION: Recent breakthroughs in single-cell RNA sequencing (scRNA-seq) technologies offer an exciting opportunity to identify heterogeneous cell types in complex tissues. However, the unavoidable biological noise and technical artifacts in scRNA-seq data, as well as the high dimensionality of expression vectors, make the problem highly challenging. Consequently, although numerous tools have been developed, their accuracy remains to be improved. RESULTS: Here, we introduce a novel clustering algorithm and tool, RCSL (Rank Constrained Similarity Learning), to accurately identify various cell types using scRNA-seq data from a complex tissue. RCSL considers both local similarity and global similarity among the cells to discern the subtle differences among cells of the same type as well as the larger differences among cells of different types. RCSL uses Spearman's rank correlations of a cell's expression vector with those of other cells to measure its global similarity, and adaptively learns a neighbor representation of a cell as its local similarity. The overall similarity of a cell to other cells is a linear combination of its global similarity and local similarity. RCSL automatically estimates the number of cell types defined in the similarity matrix, and identifies them by constructing a block-diagonal matrix such that its distance to the similarity matrix is minimized. Each block-diagonal submatrix is a cell cluster/type, corresponding to a connected component in the cognate similarity graph. When tested on 16 benchmark scRNA-seq datasets in which the cell types are well annotated, RCSL substantially outperformed six state-of-the-art methods in accuracy and robustness as measured by three metrics. AVAILABILITY AND IMPLEMENTATION: The RCSL algorithm is implemented in R and can be freely downloaded at https://cran.r-project.org/web/packages/RCSL/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
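The similarity-then-components idea summarized in the RCSL abstract can be illustrated with a minimal Python sketch. It is not the RCSL package (which is written in R) and does not reproduce its optimization. Spearman correlation between cells serves as the global similarity, as described; the local similarity, however, is assumed here to be a simple k-nearest-neighbor indicator instead of RCSL's adaptively learned neighbor representation, and clusters are taken as connected components of a thresholded similarity graph instead of RCSL's rank-constrained block-diagonal fit. The names `cluster_cells`, `alpha`, `k` and `threshold`, and the toy matrix `X`, are hypothetical.

```python
import numpy as np
from collections import deque


def average_ranks(row):
    """Rank a 1-D array (1 = smallest); tied values share their average rank."""
    order = np.argsort(row, kind="mergesort")
    sorted_vals = row[order]
    ranks = np.empty(len(row), dtype=float)
    i = 0
    while i < len(row):
        j = i
        while j + 1 < len(row) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def global_similarity(X):
    """Spearman correlation between every pair of cells (rows of X, cells x genes).
    Spearman = Pearson correlation of ranks. Constant rows would yield NaN (not handled)."""
    R = np.vstack([average_ranks(row) for row in X])
    return np.corrcoef(R)


def knn_similarity(G, k):
    """ASSUMED stand-in for local similarity: 1 for each cell's k most similar other cells."""
    n = G.shape[0]
    L = np.zeros((n, n))
    for i in range(n):
        sims = G[i].copy()
        sims[i] = -np.inf  # exclude the cell itself
        L[i, np.argsort(sims)[-k:]] = 1.0
    return np.maximum(L, L.T)  # symmetrize


def connected_components(A):
    """Label connected components of a boolean adjacency matrix by breadth-first search."""
    n = A.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in np.nonzero(A[u])[0]:
                if labels[v] < 0:
                    labels[v] = current
                    queue.append(v)
        current += 1
    return labels, current


def cluster_cells(X, alpha=0.5, k=2, threshold=0.5):
    """Hypothetical pipeline: overall similarity = alpha * global + (1 - alpha) * local,
    then clusters = connected components of the graph with edges where similarity > threshold."""
    G = global_similarity(X)
    L = knn_similarity(G, k)
    S = alpha * G + (1 - alpha) * L
    A = S > threshold
    np.fill_diagonal(A, False)
    return connected_components(A)


# Toy data (hypothetical): 6 cells x 10 genes, two groups with opposite expression trends.
X = np.array([
    [9, 8, 7, 6, 5, 0, 1, 2, 3, 4],
    [8, 9, 7, 6, 5, 0, 1, 2, 4, 3],
    [9, 8, 6, 7, 5, 1, 0, 2, 3, 4],
    [0, 1, 2, 3, 4, 9, 8, 7, 6, 5],
    [1, 0, 2, 3, 4, 8, 9, 7, 6, 5],
    [0, 1, 3, 2, 4, 9, 8, 6, 7, 5],
], dtype=float)
labels, n_clusters = cluster_cells(X)
print(n_clusters, labels.tolist())
```

On this toy matrix the first three cells and the last three cells form two separate components, so the number of clusters falls out of the graph structure rather than being fixed in advance, which is the property the abstract attributes to RCSL's block-diagonal construction.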

scQA: A dual-perspective cell type identification model for single cell transcriptome data.

Li, Di; Mei, Qinglin; Li, Guojun.

Comput Struct Biotechnol J ; 23: 520-536, 2024 Dec.

Article in English | MEDLINE | ID: mdl-38235363

ABSTRACT

Single-cell RNA sequencing technologies have been pivotal in advancing the development of algorithms for clustering heterogeneous cell populations. Existing methods that use scRNA-seq data to identify cell types tend to neglect the beneficial impact of dropout events and cluster cells solely from a quantitative perspective. Here, we introduce a novel method named scQA, notable for its ability to concurrently identify cell types and cell type-specific key genes from both qualitative and quantitative perspectives. In contrast to other methods, scQA not only identifies cell types but also extracts the key genes associated with them, enabling bidirectional clustering of scRNA-seq data. Through an iterative process, our approach aims to reduce the number of landmarks to approximately a dozen while maximizing the inclusion of quasi-trend-preserved genes with dropouts, both qualitatively and quantitatively. It then clusters cells with a label propagation strategy that does not require a predetermined number of cell types. Validated on 20 publicly available scRNA-seq datasets, scQA consistently outperforms other leading tools. Furthermore, we confirm the effectiveness and potential biological significance of the identified key genes through both external and internal validation. In conclusion, scQA emerges as a valuable tool for investigating cell heterogeneity owing to its distinctive fusion of qualitative and quantitative facets and its bidirectional clustering capabilities. It can also be seamlessly integrated into broader scRNA-seq analyses. The source code is publicly available at https://github.com/LD-Lyndee/scQA.

DriverMP enables improved identification of cancer driver genes.

Liu, Yangyang; Han, Jiyun; Kong, Tongxin; Xiao, Nannan; Mei, Qinglin; Liu, Juntao.

Gigascience ; 12, 2022 Dec 28.

Article in English | MEDLINE | ID: mdl-38091511

ABSTRACT

BACKGROUND: Cancer is widely regarded as a complex disease driven primarily by genetic mutations. A critical challenge is discerning driver genes amid an extensive array of passenger genes. FINDINGS: We present a new method, DriverMP, for effectively prioritizing altered genes at the cancer-type level by considering mutated gene pairs. It first uses nonsilent somatic mutation data, protein-protein interaction network data, and differential gene expression data to prioritize mutated gene pairs, and then prioritizes individual mutated genes based on the prioritized gene pairs. Applied to 10 cancer datasets from The Cancer Genome Atlas, the method showed substantial improvements over all compared state-of-the-art methods in identifying known driver genes. A comprehensive analysis further demonstrated the reliability of the novel driver genes, which are strongly supported by clinical experiments, disease enrichment, or biological pathway analysis. CONCLUSIONS: The new method, DriverMP, which identifies driver genes by effectively integrating the advantages of multiple kinds of cancer data, is available at https://github.com/LiuYangyangSDU/DriverMP. In addition, we have developed a novel driver gene database for 10 cancer types and an online service that users can access freely without registration. The DriverMP method, the database of novel drivers, and the user-friendly online server are expected to contribute to new diagnostic and therapeutic opportunities for cancers.
