Results 1 - 7 of 7
1.
Article in English | MEDLINE | ID: mdl-36912759

ABSTRACT

The development and widespread utilization of high-throughput sequencing technologies in biology have fueled the rapid growth of single-cell RNA sequencing (scRNA-seq) data over the past decade. The development of scRNA-seq technology has significantly expanded researchers' understanding of cellular heterogeneity. Accurate cell type identification is the prerequisite for any research on heterogeneous cell populations. However, due to the high noise and high dimensionality of scRNA-seq data, improving the effectiveness of cell type identification remains a challenge. As an effective dimensionality reduction method, Principal Component Analysis (PCA) is an essential tool for visualizing high-dimensional scRNA-seq data and identifying cell subpopulations. However, traditional PCA has shortcomings when mining the nonlinear manifold structure of the data and usually suffers from over-dense principal components (PCs). Therefore, we present a novel method in this paper called joint L2,p-norm and random walk graph constrained PCA (RWPPCA). RWPPCA aims to retain the data's local information when mapping high-dimensional data to a low-dimensional space, to obtain sparse principal components more accurately, and then to identify cell types more precisely. Specifically, RWPPCA combines the random walk (RW) algorithm with graph regularization to more accurately determine the local geometric relationships between data points. Moreover, to mitigate the adverse effects of dense PCs, the L2,p-norm is introduced to make the PCs sparser, thus increasing their interpretability. We then evaluate the effectiveness of RWPPCA on simulated data and scRNA-seq data. The results show that RWPPCA performs well in cell type identification and outperforms other comparison methods.
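
Editorial illustration (not taken from the paper): the abstract does not define the L2,p-norm, so the sketch below assumes the commonly used definition, (sum over rows of the row's Euclidean norm raised to p) raised to 1/p. The function name `l2p_norm` and the toy matrices are hypothetical. It shows why a small p favors row-sparse loading matrices: two matrices with the same Frobenius norm receive very different L2,p values.

```python
import numpy as np

def l2p_norm(A, p=0.5):
    """Assumed standard L2,p-norm: (sum_i ||row_i||_2 ** p) ** (1 / p)."""
    row_norms = np.linalg.norm(A, axis=1)
    return float(np.sum(row_norms ** p) ** (1.0 / p))

# Two toy loading matrices with the same Frobenius norm (sqrt(2)).
dense = np.full((4, 2), 0.5)      # energy spread over every row
row_sparse = np.zeros((4, 2))
row_sparse[0] = [1.0, 1.0]        # energy concentrated in one row

print("dense     :", l2p_norm(dense, p=0.5))
print("row-sparse:", l2p_norm(row_sparse, p=0.5))
```

With p = 2 the function reduces to the Frobenius norm, which does not distinguish the two matrices; with p below 1 the row-sparse matrix is penalized far less.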


Subject(s)
Single-Cell Analysis , Single-Cell Gene Expression Analysis , Principal Component Analysis , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
2.
J Comput Biol ; 30(8): 848-860, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37471220

ABSTRACT

The development of single-cell transcriptome sequencing technologies has opened new ways to study biological phenomena at the cellular level. A key application of such technologies is the use of single-cell RNA sequencing (scRNA-seq) data to identify distinct cell types through clustering, which in turn provides evidence for revealing heterogeneity. Despite the promise of this approach, the inherent characteristics of scRNA-seq data, such as higher noise levels and lower coverage, pose major challenges to existing clustering methods and compromise their accuracy. In this study, we propose a method called Adjusted Random walk Graph regularization Sparse Low-Rank Representation (ARGLRR), a practical sparse subspace clustering method, to identify cell types. The fundamental low-rank representation (LRR) model is concerned with the global structure of data. To address the limited ability of the LRR method to capture local structure, we introduce adjusted random walk graph regularization into its framework. ARGLRR thus captures both local and global structures in scRNA-seq data. Additionally, imposing similarity constraints on the LRR framework further improves the ability of the proposed model to estimate cell-to-cell similarity and capture global structural relationships between cells. ARGLRR surpasses other advanced comparison approaches on nine known scRNA-seq data sets. In clustering experiments on these data sets, ARGLRR outperforms the best-performing comparison method by 6.99% in normalized mutual information and by 5.85% in Adjusted Rand Index. In addition, we visualize the results using Uniform Manifold Approximation and Projection. The visualizations show that ARGLRR enhances the separation of different cell types within the similarity matrix.
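
Editorial illustration (not taken from the paper): the abstract reports gains in the Adjusted Rand Index (ARI). As a reminder of what that metric measures, the sketch below computes ARI from the standard pair-counting formula using only the Python standard library. The function name and the toy label vectors are hypothetical; the authors' evaluation code is not described in the abstract.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI: 1.0 for identical partitions, about 0 for random ones."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # a single item can only be clustered one way
    # Count pairs of items that share a cell, a row, or a column of the
    # contingency table between the two labelings.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all items together
    return (sum_cells - expected) / (max_index - expected)

# Cluster names do not matter, only the grouping does.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because ARI is corrected for chance, it can be negative when two labelings agree less than random assignments would.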


Subject(s)
Algorithms , RNA , Cluster Analysis , Single-Cell Analysis/methods , RNA Sequence Analysis , Gene Expression Profiling
3.
IEEE J Biomed Health Inform ; 27(10): 5199-5209, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37506010

ABSTRACT

The development of single-cell RNA sequencing (scRNA-seq) technology has opened up a new perspective for studying disease mechanisms at the single-cell level. Cell clustering reveals the natural grouping of cells and is a vital step in scRNA-seq data analysis. However, the high noise and dropout of single-cell data pose numerous challenges to cell clustering. In this study, we propose a novel matrix factorization method named NLRRC for single-cell type identification. NLRRC joins non-negative low-rank representation (LRR) and random walk graph regularized NMF (RWNMFC) to accurately reveal the natural grouping of cells. Specifically, we find the lowest-rank representation of single-cell samples by non-negative LRR to reduce the difficulty of analyzing high-dimensional samples and to capture the global information of the samples. Meanwhile, by using random walk graph regularization (RWGR) and NMF, RWNMFC captures manifold structure and cluster information before generating a cluster assignment matrix. The cluster assignment matrix contains cluster labels, which can be used directly to obtain the clustering results. The performance of NLRRC is validated on simulated and real single-cell datasets. The experimental results illustrate that NLRRC has a significant advantage in single-cell type identification.
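
Editorial illustration (not taken from the paper): the abstract does not specify how the random walk graph is built, so the sketch below shows one generic construction under stated assumptions. It builds a symmetric k-nearest-neighbor graph, row-normalizes it into a transition matrix, and takes a few random-walk steps so that affinities spread along the local structure. The function name, the kNN construction, and the parameters `k` and `steps` are all hypothetical choices.

```python
import numpy as np

def random_walk_affinity(X, k=2, steps=2):
    """X is cells x features. Returns the `steps`-step transition matrix
    of a random walk on a symmetrized kNN graph (an assumed construction)."""
    n = X.shape[0]
    # Squared Euclidean distance between every pair of cells.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(sq_dist[i])
        neighbors = [j for j in order if j != i][:k]
        W[i, neighbors] = 1.0
    W = np.maximum(W, W.T)                 # make the graph undirected
    P = W / W.sum(axis=1, keepdims=True)   # one-step transition probabilities
    return np.linalg.matrix_power(P, steps)

# Two well-separated toy groups of three cells each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
P2 = random_walk_affinity(X, k=2, steps=2)
print(np.round(P2, 2))
```

In this toy case the walk never leaves a group, so the multi-step matrix is block-diagonal; on noisy data the extra steps smooth the affinities between cells that share many neighbors.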


Subject(s)
Algorithms , Single-Cell Analysis , Humans , Cluster Analysis , Gene Expression Profiling/methods
4.
Comput Biol Chem ; 104: 107862, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37031647

ABSTRACT

Single-cell RNA sequencing technology provides a tremendous opportunity for studying disease mechanisms at the single-cell level. Cell type identification is a key step in the research of disease mechanisms. Many clustering algorithms have been proposed to identify cell types. Most clustering algorithms perform similarity calculation before cell clustering. Because clustering and similarity calculation are independent, a low-rank matrix obtained only by similarity calculation may be unable to fully reveal the patterns in single-cell data. In this study, to capture accurate single-cell clustering information, we propose a novel method based on a low-rank representation model, called KGLRR, that combines the low-rank representation approach with K-means clustering. The cluster centroid is updated as the cell dimension decreases to better form new clusters and improve the quality of clustering information. In addition, the low-rank representation model ignores local geometric information, so a graph regularization constraint is introduced. KGLRR is tested on both simulated and real single-cell datasets to validate the effectiveness of the new method. The experimental results show that KGLRR is more robust and accurate in cell type identification than other advanced algorithms.


Subject(s)
Algorithms , Cluster Analysis
5.
BMC Bioinformatics ; 20(Suppl 22): 716, 2019 Dec 30.
Article in English | MEDLINE | ID: mdl-31888433

ABSTRACT

BACKGROUND: In recent years, identification of differentially expressed genes and sample clustering have become hot topics in bioinformatics. Principal Component Analysis (PCA) is a widely used method for gene expression data. However, it has two limitations. First, the geometric structure hidden in the data, e.g., the pair-wise distance between data points, has not been explored, although this information can facilitate sample clustering. Second, the Principal Components (PCs) determined by PCA are dense, making them hard to interpret, whereas only a few genes are related to the cancer. Identifying a handful of differentially expressed genes and finding new cancer biomarkers is of great significance for the early diagnosis and treatment of cancer. RESULTS: In this study, a new method, gLSPCA, is proposed to integrate both a graph Laplacian and a sparse constraint into PCA. On the one hand, gLSPCA improves clustering accuracy by exploring the internal geometric structure of the data; on the other hand, it identifies differentially expressed genes by imposing a sparsity constraint on the PCs. CONCLUSIONS: Experiments with gLSPCA and comparisons with existing methods, including Z-SPCA, GPower, PathSPCA, SPCArt, and gLPCA, are performed on real datasets of both pancreatic cancer (PAAD) and head & neck squamous carcinoma (HNSC). The results demonstrate that gLSPCA is effective in identifying differentially expressed genes and in sample clustering. In addition, the applications of gLSPCA to these datasets provide several new clues for the exploration of causative factors of PAAD and HNSC.
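
Editorial illustration (not taken from the paper): the abstract does not give the gLSPCA objective, so the sketch below only shows the generic graph Laplacian building block. For a symmetric affinity matrix W with degree matrix D, the Laplacian is L = D - W, and the trace term tr(Q^T L Q) equals half the affinity-weighted sum of squared distances between embedded points. The function names and the toy graph are hypothetical.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric affinity matrix."""
    return np.diag(W.sum(axis=1)) - W

def smoothness_penalty(Q, W):
    """tr(Q^T L Q): small when strongly connected samples have similar embeddings."""
    return float(np.trace(Q.T @ graph_laplacian(W) @ Q))

def pairwise_form(Q, W):
    """Equivalent form: 0.5 * sum_ij W_ij * ||q_i - q_j||^2."""
    diff = Q[:, None, :] - Q[None, :, :]
    return float(0.5 * np.sum(W * np.sum(diff ** 2, axis=2)))

# Toy chain graph on three samples with a one-dimensional embedding.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 2.0, 0.0]])
Q = np.array([[0.0], [1.0], [3.0]])
print(smoothness_penalty(Q, W), pairwise_form(Q, W))
```

Adding such a term to a PCA-style objective is one common way to make low-dimensional embeddings respect pair-wise distances in the original data.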


Subject(s)
Algorithms , Genetic Databases , Gene Expression Profiling , Neoplastic Gene Expression Regulation , Principal Component Analysis , Cluster Analysis , Gene Expression , Humans , Neoplasms/genetics , Protein Interaction Maps
6.
BMC Syst Biol ; 11(Suppl 6): 119, 2017 Dec 14.
Article in English | MEDLINE | ID: mdl-29297378

ABSTRACT

BACKGROUND: Traditional drug identification methods follow the "one drug-one target" paradigm, but those methods ignore the natural characteristics of human diseases. To overcome this limitation, many methods for identifying drug-pathway association pairs have been developed, such as the integrative penalized matrix decomposition (iPaD) method. The iPaD method imposes the L1-norm penalty on the regularization term. However, lasso-type penalties have an obvious disadvantage: the sparsity they produce is too dispersive. RESULTS: Therefore, to improve the performance of the iPaD method, we propose a novel method named L2,1-iPaD to identify paired drug-pathway associations. In the L2,1-iPaD model, we use the L2,1-norm penalty to replace the L1-norm penalty, since the L2,1-norm penalty can produce row sparsity. CONCLUSIONS: By applying the L2,1-iPaD method to the CCLE and NCI-60 datasets, we demonstrate that the performance of the L2,1-iPaD method is superior to that of existing methods. In permutation tests, the proposed method also achieves better enrichment than the iPaD method in discovering validated drug-pathway association pairs. The results on the two real datasets show that our method is effective.
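
Editorial illustration (not taken from the paper, and not the iPaD solver): the contrast between "dispersive" L1 sparsity and L2,1 row sparsity can be seen from the well-known proximal operators of the two penalties. The L1 operator shrinks each entry independently, while the L2,1 operator shrinks whole rows by their Euclidean norm. The function names, the threshold `lam`, and the toy matrix are hypothetical.

```python
import numpy as np

def prox_l1(A, lam):
    """Entry-wise soft thresholding: zeros can appear anywhere in the matrix."""
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

def prox_l21(A, lam):
    """Row-wise shrinkage: each row is either scaled down or zeroed as a whole."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return A * scale

A = np.array([[3.0, 0.5],
              [0.5, 0.9],
              [0.2, 0.1]])
print("L1  :\n", prox_l1(A, 0.8))
print("L2,1:\n", prox_l21(A, 0.8))
```

Under the L1 operator the zeros are scattered across rows, whereas the L2,1 operator removes the weak third row entirely and keeps the other rows fully populated, which is the row-sparsity pattern the abstract refers to.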


Subject(s)
Drug Discovery/methods , Algorithms , Computational Biology , Datasets as Topic , Humans , Theoretical Models
7.
Comput Biol Chem ; 65: 185-192, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27693191

ABSTRACT

With the rapid development of DNA microarray technology and next-generation technology, a large amount of genomic data has been generated. How to extract more differentially expressed genes from genomic data has therefore become a matter of urgency. Because Low-Rank Representation (LRR) performs well in studying low-dimensional subspace structures, it has attracted considerable attention in recent years. However, it does not take into consideration the intrinsic geometric structures in data. In this paper, a new method named Laplacian regularized Low-Rank Representation (LLRR), which introduces graph regularization into LRR, is proposed and applied to genomic data. By taking full advantage of the graph regularization, the LLRR method can capture the intrinsic non-linear geometric information among the data. The LLRR method decomposes the observation matrix of genomic data into a low-rank matrix and a sparse matrix by solving an optimization problem. Because the significant genes can be considered as sparse signals, the differentially expressed genes are viewed as the sparse perturbation signals. Therefore, the differentially expressed genes can be selected according to the sparse matrix. Finally, we use the GO tool to analyze the selected genes and compare the P-values with those of other methods. The results on the simulation data and two real genomic data sets illustrate that this method outperforms some other methods in differentially expressed gene selection.
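
Editorial illustration (not taken from the paper): the abstract does not describe the LLRR solver or the exact gene-scoring rule. The sketch below shows two generic pieces that low-rank plus sparse decompositions typically rely on: singular value thresholding, which is the proximal operator of the nuclear norm and yields a low-rank matrix, and a simple ranking of genes by the magnitude of their rows in the sparse matrix. Treating genes as rows and scoring them by absolute row sums are assumptions, as are the function names.

```python
import numpy as np

def singular_value_threshold(M, tau):
    """Shrink every singular value of M by tau, dropping those below tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

def top_genes(E, n_top):
    """Indices of the n_top rows of the sparse matrix E with the largest
    absolute row sums (assumed scoring rule; genes assumed to be rows)."""
    scores = np.abs(E).sum(axis=1)
    return np.argsort(-scores)[:n_top]

# Toy sparse matrix: gene 1 carries the strongest perturbation signal.
E = np.array([[0.0, 0.0],
              [5.0, -1.0],
              [0.1, 0.0]])
print(top_genes(E, 1))
print(singular_value_threshold(np.diag([3.0, 1.0]), 2.0))
```

In an iterative solver, a thresholding step of this kind would alternate with updates of the sparse matrix; the selected genes would then be passed to downstream tools such as GO analysis.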


Subject(s)
Gene Expression Regulation , Theoretical Models