Results 1 - 5 of 5
1.
BMC Bioinformatics ; 20(Suppl 22): 716, 2019 Dec 30.
Article in English | MEDLINE | ID: mdl-31888433

ABSTRACT

BACKGROUND: In recent years, identifying differentially expressed genes and clustering samples have become hot topics in bioinformatics. Principal Component Analysis (PCA) is widely used on gene expression data, but it has two limitations. First, it does not exploit the geometric structure hidden in the data, e.g., the pairwise distances between data points, even though this information can facilitate sample clustering. Second, the Principal Components (PCs) determined by PCA are dense, which makes them hard to interpret, whereas only a few genes are related to the cancer. Identifying a handful of differentially expressed genes and finding new cancer biomarkers is of great significance for the early diagnosis and treatment of cancer. RESULTS: In this study, a new method, gLSPCA, is proposed that integrates both a graph Laplacian and a sparsity constraint into PCA. On the one hand, gLSPCA improves clustering accuracy by exploring the internal geometric structure of the data; on the other, it identifies differentially expressed genes by imposing a sparsity constraint on the PCs. CONCLUSIONS: gLSPCA is evaluated and compared with existing methods, including Z-SPCA, GPower, PathSPCA, SPCArt, and gLPCA, on real datasets of pancreatic cancer (PAAD) and head and neck squamous cell carcinoma (HNSC). The results demonstrate that gLSPCA is effective both in identifying differentially expressed genes and in clustering samples. In addition, applying gLSPCA to these datasets provides several new clues for exploring the causative factors of PAAD and HNSC.


Subjects
Algorithms; Databases, Genetic; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Principal Component Analysis; Cluster Analysis; Gene Expression; Humans; Neoplasms/genetics; Protein Interaction Maps
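gLSPCA, per the abstract, adds a graph Laplacian term and a sparsity constraint to PCA. The minimal sketch below illustrates only those two generic ingredients and is not the authors' implementation: it builds a k-nearest-neighbour sample graph, forms the unnormalised Laplacian L = D - W, and checks the standard identity tr(QᵀLQ) = ½ Σ W_ij ‖q_i - q_j‖², which is why penalising tr(QᵀLQ) keeps neighbouring samples close in the embedding. It also shows soft-thresholding, the proximal step for an L1 penalty. The 0/1 k-NN weights, the choice of an L1 (rather than some other sparse) norm, and every function name here are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_affinity(samples: Matrix, k: int) -> Matrix:
    """Symmetric 0/1 affinity matrix linking each sample to its k nearest neighbours (assumed weighting)."""
    n = len(samples)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sq_dist(samples[i], samples[j]))
        for j in others[:k]:
            W[i][j] = W[j][i] = 1.0  # symmetrise so that L is symmetric
    return W


def laplacian(W: Matrix) -> Matrix:
    """Unnormalised graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def trace_quadratic(Q: Matrix, L: Matrix) -> float:
    """tr(Q^T L Q) for an n x d embedding Q (one row per sample)."""
    n, d = len(Q), len(Q[0])
    return sum(Q[i][c] * L[i][j] * Q[j][c]
               for c in range(d) for i in range(n) for j in range(n))


def pairwise_smoothness(Q: Matrix, W: Matrix) -> float:
    """0.5 * sum_ij W_ij * ||q_i - q_j||^2, which equals tr(Q^T L Q)."""
    n = len(Q)
    return 0.5 * sum(W[i][j] * sq_dist(Q[i], Q[j])
                     for i in range(n) for j in range(n))


def soft_threshold(v: List[float], lam: float) -> List[float]:
    """Proximal operator of lam * ||v||_1: shrink towards zero, zeroing small entries."""
    return [0.0 if abs(x) <= lam else (x - lam if x > 0 else x + lam) for x in v]


# Toy example: two well-separated pairs of samples, one embedding dimension.
samples = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
W = knn_affinity(samples, k=1)
L = laplacian(W)
Q = [[1.0], [1.2], [-1.0], [-0.7]]
print(trace_quadratic(Q, L), pairwise_smoothness(Q, W))  # both ≈ 0.13
print(soft_threshold([0.8, -0.3, 0.05, -1.5], 0.2))
```

Entries of the loading vector whose magnitude is at most the threshold become exactly zero, which is how an L1-type penalty yields the "handful of genes" the abstract is after.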
2.
Int J Mol Sci ; 20(4), 2019 Feb 18.
Article in English | MEDLINE | ID: mdl-30781701

ABSTRACT

Feature selection and sample clustering play an important role in bioinformatics. Traditional feature selection methods treat sparse regression and embedding learning as separate steps. Later, Joint Embedding Learning and Sparse Regression (JELSR) was proposed to identify the significant features of genomic data more effectively. However, because genomic data contain much redundancy and noise, the sparsity this method achieves is far from sufficient. In this paper, we propose a strengthened version of JELSR, called LJELSR, which adds an L1-norm constraint to the regularization term of the previous model to further improve sparsity. We then provide a new iterative algorithm to obtain a converged solution. The experimental results show that, compared with previous methods, our method achieves state-of-the-art performance both in identifying differentially expressed genes and in sample clustering on different genomic datasets. Additionally, the selected differentially expressed genes may be of great value in medical research.


Subjects
Algorithms; Cluster Analysis; Colonic Neoplasms/genetics; Databases as Topic; Esophageal Neoplasms/genetics; Gene Expression Profiling; Humans; Regression Analysis
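In JELSR-style feature selection, a regression matrix W (one row per feature, one column per embedding dimension) is penalised with the L2,1-norm, and features are ranked by the L2-norm of their rows; per the abstract, LJELSR additionally imposes an L1-norm. The sketch below only evaluates these penalties and the row-norm ranking on a toy W. It is not the paper's iterative solver, and the weighted sum `alpha * ||W||_{2,1} + beta * ||W||_1`, the parameter names `alpha`/`beta`, and all helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def row_norms(W: Matrix) -> List[float]:
    """L2-norm of each row of W; row i holds the regression weights of feature (gene) i."""
    return [math.sqrt(sum(x * x for x in row)) for row in W]


def l21_norm(W: Matrix) -> float:
    """||W||_{2,1}: sum of row L2-norms; drives whole rows (features) to zero."""
    return sum(row_norms(W))


def l1_norm(W: Matrix) -> float:
    """||W||_1: sum of absolute entries; drives individual weights to zero."""
    return sum(abs(x) for row in W for x in row)


def regulariser(W: Matrix, alpha: float, beta: float) -> float:
    """Hypothetical combined penalty alpha*||W||_{2,1} + beta*||W||_1 (weighting assumed)."""
    return alpha * l21_norm(W) + beta * l1_norm(W)


def rank_features(W: Matrix, top: int) -> List[int]:
    """Indices of the `top` features with the largest row norms."""
    scores = row_norms(W)
    order = sorted(range(len(W)), key=lambda i: scores[i], reverse=True)
    return order[:top]


# Toy 4-feature x 2-dimension regression matrix.
W = [[3.0, 4.0], [0.0, 0.0], [0.6, 0.8], [0.0, 2.0]]
print(rank_features(W, top=3))  # [0, 3, 2]; feature 1 has an all-zero row
print(l21_norm(W), l1_norm(W), regulariser(W, alpha=1.0, beta=0.5))
```

The two norms act at different granularities: the L2,1 term removes whole features, while the extra L1 term also zeroes individual weights inside the rows that survive.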
3.
BMC Med Genomics ; 12(Suppl 7): 155, 2019 Dec 30.
Article in English | MEDLINE | ID: mdl-31888692

ABSTRACT

BACKGROUND: Gene co-expression networks are a useful means of revealing the nature of disease. As cancer research has advanced, building gene co-expression networks from cancer data has become a hot spot. However, only a limited number of node measurement methods and node mining strategies currently exist for constructing multi-cancer networks. METHODS: In this paper, we introduce PMN, a new method for mining information from co-expression networks built on integrated multi-cancer data. We construct the network by combining different types of relevance measures (linear and nonlinear rules) for different nodes, based on integrated gene expression data of multiple cancers from The Cancer Genome Atlas (TCGA). To mine genes, we combine different properties of the nodes (local and global characteristics). RESULTS: We uncover additional suspicious abnormally expressed genes and pathways shared by different cancers. We also recover some genes and pathways whose roles have already been proven, while some suspicious factors and molecules still require clinical validation. CONCLUSIONS: The results demonstrate that our method is very effective in mining co-expressed genes across multiple cancers.


Subjects
Data Mining; Databases, Genetic; Gene Regulatory Networks; Neoplasms/genetics; Genes, Neoplasm; Humans
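The PMN abstract combines linear and nonlinear relevance measures to build the network and combines local and global node properties to mine genes, without naming the specific measures. The sketch below is a hypothetical instance of that idea, not PMN itself: Pearson correlation stands in for the linear rule, Spearman rank correlation for the nonlinear (monotone) rule, an edge is drawn when the larger absolute value reaches a threshold, and nodes are scored by normalised degree (local) plus closeness centrality (global). The measures, the `max` combination, the threshold, the scoring formula, and the toy genes A to D are all assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Linear (Pearson) correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v: List[float]) -> List[float]:
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x: List[float], y: List[float]) -> float:
    """Rank (Spearman) correlation: captures monotone, possibly nonlinear, trends."""
    return pearson(ranks(x), ranks(y))


def build_network(expr: Dict[str, List[float]], threshold: float) -> Dict[str, Set[str]]:
    """Link two genes if either |Pearson| or |Spearman| reaches the threshold."""
    adj: Dict[str, Set[str]] = {g: set() for g in expr}
    for a, b in combinations(expr, 2):
        score = max(abs(pearson(expr[a], expr[b])), abs(spearman(expr[a], expr[b])))
        if score >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def closeness(adj: Dict[str, Set[str]], src: str) -> float:
    """Closeness over the nodes reachable from src (BFS on the unweighted graph)."""
    dist = {src: 0}
    frontier = [src]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def node_scores(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Hypothetical score: normalised degree (local) + closeness (global)."""
    n = len(adj)
    return {g: len(adj[g]) / (n - 1) + closeness(adj, g) for g in adj}


# Toy expression profiles (genes x samples); values are made up for illustration.
expr = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],   # linear in A
    "C": [1, 4, 9, 16, 25],  # monotone but nonlinear in A
    "D": [5, 1, 4, 2, 3],    # unrelated
}
adj = build_network(expr, threshold=0.9)
print(sorted(adj["A"]), sorted(adj["D"]))  # ['B', 'C'] []
print(node_scores(adj))
```

Here gene C is linked to A mainly through the rank-based measure (its Pearson correlation with A is below 0.99, its Spearman correlation is 1), which illustrates why mixing linear and nonlinear rules can add edges a single measure would weaken.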
4.
Comput Biol Chem ; 78: 468-473, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30563751

ABSTRACT

Noise in cancer sequencing data is a problem that cannot be ignored, and finding an effective way to reduce it is an important issue in gene co-expression network analysis. In this paper, we apply Robust Principal Component Analysis (RPCA), a sparse and low-rank method, to solve the noise problem for integrated multi-cancer data from The Cancer Genome Atlas (TCGA). We then build a gene co-expression network from the denoised integrated data and, finally, mine nodes and pathways in the denoised network. Experiments show that after RPCA denoising, the gene expression data become more orderly than before, and the constructed networks contain more pathway enrichment information than networks built from unprocessed data. Moreover, using the betweenness centrality of the nodes in the denoised network, we find abnormally expressed genes and pathways that have been proven to be associated with many cancers. The experimental results indicate that our method is reasonable and effective, and we also identify candidate genes that may be linked to multiple cancers.


Subjects
Data Mining; Gene Regulatory Networks/genetics; Neoplasms/genetics; Databases, Genetic; Gene Expression Profiling; Humans; Principal Component Analysis
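The RPCA paper ranks genes in the denoised network by betweenness centrality. (The denoising itself is the standard RPCA split of the data into a low-rank part plus a sparse part, X = L + S, typically obtained by minimising ‖L‖_* + λ‖S‖_1; that step needs an SVD solver and is not shown.) The sketch below computes exact betweenness on an unweighted, undirected network with Brandes' algorithm; whether the authors used this algorithm, normalisation, or edge weights is not stated, so those choices and the toy gene names G1 to G5 are assumptions.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List


def betweenness(adj: Dict[Hashable, Iterable[Hashable]]) -> Dict[Hashable, float]:
    """Brandes' algorithm for unweighted, undirected graphs (unnormalised scores)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        stack: List[Hashable] = []
        preds: Dict[Hashable, List[Hashable]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first visit to w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of non-increasing distance.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair (s, t) was counted once from each endpoint.
    return {v: c / 2.0 for v, c in bc.items()}


# Toy network: G2 and G4 bridge the rest of the graph (names are illustrative).
network = {
    "G1": ["G2"],
    "G2": ["G1", "G3", "G4"],
    "G3": ["G2", "G4"],
    "G4": ["G2", "G3", "G5"],
    "G5": ["G4"],
}
scores = betweenness(network)
print(sorted(scores, key=scores.get, reverse=True))
```

Brandes' method runs in O(VE) time for unweighted graphs, which is much cheaper than enumerating shortest paths pair by pair on a genome-scale network.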
5.
IEEE Trans Nanobioscience ; 16(5): 341-348, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28541216

ABSTRACT

High dimensionality has become a typical feature of biomolecular data. In this paper, a novel dimension reduction method named p-norm singular value decomposition (PSVD) is proposed to seek a low-rank approximation matrix of the biomolecular data. To enhance robustness to outliers, the Lp-norm is taken as the error function and the Schatten p-norm as the regularization function in the optimization model. To evaluate the performance of PSVD, the K-means clustering method is then applied to the low-rank approximation matrix for tumor clustering. Extensive experiments are carried out on five gene expression datasets, including two benchmark datasets and three higher-dimensional datasets from The Cancer Genome Atlas (TCGA). The experimental results demonstrate that the PSVD-based method outperforms many existing methods. In particular, the experiments show that the proposed method is more efficient at processing higher-dimensional data, with good robustness, stability, and superior time performance.


Subjects
Algorithms; Cluster Analysis; Computational Biology/methods; Neoplasms; Databases, Genetic; Gene Expression Profiling; Humans; Neoplasms/genetics; Neoplasms/metabolism
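The PSVD abstract names the ingredients of its model (an Lp-norm error and a Schatten p-norm regulariser) and its evaluation step (K-means on the low-rank approximation) but not the solver. One plausible reading of the objective, assumed here rather than quoted from the paper, is min_Z ‖X - Z‖_p^p + λ‖Z‖_{S_p}^p, with ‖Z‖_{S_p} = (Σ_i σ_i(Z)^p)^{1/p}. The sketch below only evaluates those two terms (singular values are supplied by hand) and runs plain Lloyd's K-means on toy rows standing in for a low-rank sample representation; it does not compute the PSVD approximation, and the function names and naive first-k seeding are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def lp_error(X: Matrix, Z: Matrix, p: float) -> float:
    """Entrywise sum of |X - Z|^p; for p < 2, large residuals weigh less than in least squares."""
    return sum(abs(x - z) ** p for rx, rz in zip(X, Z) for x, z in zip(rx, rz))


def schatten_p(singular_values: List[float], p: float) -> float:
    """Schatten p-(quasi)norm (sum_i sigma_i^p)^(1/p); p = 1 gives the nuclear norm."""
    return sum(s ** p for s in singular_values) ** (1.0 / p)


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Matrix, k: int, iters: int = 100):
    """Lloyd's K-means; naively seeds the centroids with the first k points."""
    centroids = [list(pt) for pt in points[:k]]
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(pt, centroids[c]))
                      for pt in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy rows of a (hypothetical) low-rank sample representation: two tumour groups.
samples = [[0.0, 0.0], [5.0, 5.0], [0.2, 0.1], [5.1, 4.9], [0.1, 0.3], [4.8, 5.2]]
labels, _ = kmeans(samples, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
print(lp_error([[1, 2], [3, 4]], [[1, 1], [3, 2]], p=1))  # 3
print(schatten_p([3.0, 4.0], p=1))  # 7.0
```

Taking p < 1 in the Schatten term penalises small singular values relatively more than the nuclear norm does, which pushes the approximation towards lower rank; a practical pipeline would also use a better K-means seeding (for example, k-means++) than the first-k rule used here.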