Search | BVS Bolivia

LJELSR: A Strengthened Version of JELSR for Feature Selection and Clustering.

Wu, Sha-Sha; Hou, Mi-Xiao; Feng, Chun-Mei; Liu, Jin-Xing.

Int J Mol Sci ; 20(4)2019 Feb 18.

Article in English | MEDLINE | ID: mdl-30781701

ABSTRACT

Feature selection and sample clustering play an important role in bioinformatics. Traditional feature selection methods treat sparse regression and embedding learning as separate steps. Joint Embedding Learning and Sparse Regression (JELSR) was later proposed to identify the significant features of genomic data more effectively. However, because genomic data contain many redundant and noisy values, the sparsity that JELSR achieves is insufficient. In this paper, we propose a strengthened version of JELSR, called LJELSR, that adds an L1-norm constraint to the regularization term of the previous model to further improve sparsity. We then provide a new iterative algorithm to obtain a convergent solution. The experimental results show that our method achieves state-of-the-art performance in both identifying differentially expressed genes and sample clustering on different genomic data sets compared with previous methods. Additionally, the selected differentially expressed genes may be of great value in medical research.

Subject(s)

Algorithms , Cluster Analysis , Colonic Neoplasms/genetics , Databases as Topic , Esophageal Neoplasms/genetics , Gene Expression Profiling , Humans , Regression Analysis
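The LJELSR abstract attributes its extra sparsity to an L1-norm term. As a rough illustration of why an L1 penalty drives small coefficients to exactly zero, the sketch below applies soft-thresholding, the standard proximal operator of the L1 norm. It is not the authors' iterative algorithm, which the abstract does not describe; the function name `soft_threshold`, the weights, and the value of `lam` are all assumptions made for this example.

```python
def soft_threshold(values, lam):
    """Element-wise proximal operator of lam * ||w||_1 (soft-thresholding)."""
    out = []
    for v in values:
        shrunk = abs(v) - lam
        if shrunk <= 0:
            # Coefficients no larger than lam in magnitude become exactly zero.
            out.append(0.0)
        else:
            # Larger coefficients keep their sign and shrink toward zero by lam.
            out.append(shrunk if v > 0 else -shrunk)
    return out


# Hypothetical regression weights for five features (for example, genes).
weights = [2.5, -0.3, 0.1, -1.8, 0.05]
sparse_weights = soft_threshold(weights, lam=0.5)
selected = [i for i, w in enumerate(sparse_weights) if w != 0.0]
print(sparse_weights)
print(selected)
```

Only the features whose weights survive the threshold remain selected, which is the sense in which a stronger L1 term yields a sparser feature set.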

PCA via joint graph Laplacian and sparse constraint: Identification of differentially expressed genes and sample clustering on gene expression data.

Feng, Chun-Mei; Xu, Yong; Hou, Mi-Xiao; Dai, Ling-Yun; Shang, Jun-Liang.

BMC Bioinformatics ; 20(Suppl 22): 716, 2019 Dec 30.

Article in English | MEDLINE | ID: mdl-31888433

ABSTRACT

BACKGROUND: In recent years, identification of differentially expressed genes and sample clustering have become hot topics in bioinformatics. Principal Component Analysis (PCA) is widely used on gene expression data. However, it has two limitations. First, the geometric structure hidden in the data, e.g., the pair-wise distance between data points, is not explored, although this information can facilitate sample clustering. Second, the Principal Components (PCs) determined by PCA are dense, which makes them hard to interpret, even though only a few genes are related to a given cancer. Identifying a handful of differentially expressed genes and finding new cancer biomarkers is of great significance for the early diagnosis and treatment of cancer. RESULTS: In this study, a new method, gLSPCA, is proposed to integrate both a graph Laplacian and a sparse constraint into PCA. On the one hand, gLSPCA improves clustering accuracy by exploring the internal geometric structure of the data; on the other hand, it identifies differentially expressed genes by imposing a sparsity constraint on the PCs. CONCLUSIONS: Experiments comparing gLSPCA with existing methods, including Z-SPCA, GPower, PathSPCA, SPCArt, and gLPCA, are performed on real datasets of both pancreatic cancer (PAAD) and head & neck squamous carcinoma (HNSC). The results demonstrate that gLSPCA is effective in identifying differentially expressed genes and in sample clustering. In addition, applying gLSPCA to these datasets provides several new clues for exploring the causative factors of PAAD and HNSC.

Subject(s)

Algorithms , Databases, Genetic , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Principal Component Analysis , Cluster Analysis , Gene Expression , Humans , Neoplasms/genetics , Protein Interaction Maps
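The gLSPCA abstract relies on a graph Laplacian to encode pair-wise distances between samples. The sketch below shows one common way such a Laplacian can be built, L = D - W from a symmetric k-nearest-neighbour graph, and evaluates the quadratic form y^T L y that a Laplacian term penalizes. The graph construction, the 0/1 edge weights, the toy points, and the function names are assumptions for illustration; the abstract does not state how the authors build their graph.

```python
import math


def knn_graph_laplacian(points, k):
    """Return L = D - W for a symmetric k-nearest-neighbour graph with 0/1 weights."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            # Connect i and j if either is among the other's k nearest neighbours.
            W[i][j] = W[j][i] = 1.0
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = sum(W[i])  # node degree on the diagonal
    return L


def laplacian_quadratic(L, y):
    """Compute y^T L y, which equals the sum of (y_i - y_j)^2 over all edges."""
    n = len(y)
    return sum(y[i] * L[i][j] * y[j] for i in range(n) for j in range(n))


# Six hypothetical samples forming two well-separated groups.
samples = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
L = knn_graph_laplacian(samples, k=2)
smooth = laplacian_quadratic(L, [1, 1, 1, -1, -1, -1])  # constant within each group
rough = laplacian_quadratic(L, [1, -1, 1, -1, 1, -1])   # varies inside each group
print(smooth, rough)
```

An embedding that is constant within each neighbourhood group incurs no penalty, while one that varies between neighbours is penalized, which is why a Laplacian term tends to help clustering.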

A new method for mining information of co-expression network based on multi-cancers integrated data.

Hou, Mi-Xiao; Gao, Ying-Lian; Liu, Jin-Xing; Shang, Junliang; Zhu, Rong; Yuan, Sha-Sha.

BMC Med Genomics ; 12(Suppl 7): 155, 2019 Dec 30.

Article in English | MEDLINE | ID: mdl-31888692

ABSTRACT

BACKGROUND: Gene co-expression networks are a useful tool for revealing the nature of disease. As cancer research has advanced, building gene co-expression networks from cancer data has become an active research topic. However, the node measurement methods and node mining strategies currently available for multi-cancer network construction remain limited. METHODS: In this paper, we introduce a new method, named PMN, for mining information from co-expression networks based on integrated multi-cancer data. We construct the network by combining different types of relevance measures (linear and nonlinear rules) for different nodes, based on integrated multi-cancer gene expression data from The Cancer Genome Atlas (TCGA). To mine genes, we combine different properties (local and global characteristics) of the nodes. RESULTS: We uncover additional suspected abnormally expressed genes and pathways shared by different cancers. We also recover some previously proven genes and pathways; other suspected factors and molecules still need clinical validation. CONCLUSIONS: The results demonstrate that our method is very effective in mining co-expressed genes across multiple cancers.

Subject(s)

Data Mining , Databases, Genetic , Gene Regulatory Networks , Neoplasms/genetics , Genes, Neoplasm , Humans
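The PMN abstract builds its network by combining linear and nonlinear relevance measures but does not name them. The sketch below is one possible reading, not the authors' procedure: it uses the Pearson correlation as the linear measure, the Spearman rank correlation as a stand-in for a nonlinear (monotone) measure, and keeps an edge when either exceeds a threshold. The measures, the max-combination rule, the threshold, and the toy expression profiles are all assumptions.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def ranks(values):
    """Rank positions of the values (this sketch assumes there are no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    for rank, index in enumerate(order):
        result[index] = float(rank)
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def build_network(expression, threshold):
    """Keep edge (g, h) when either measure reaches the threshold in magnitude."""
    genes = sorted(expression)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            linear = abs(pearson(expression[g], expression[h]))
            monotone = abs(spearman(expression[g], expression[h]))
            if max(linear, monotone) >= threshold:
                edges.append((g, h))
    return edges


# Hypothetical expression profiles: geneB is a cubic function of geneA.
expression = {
    "geneA": [1, 2, 3, 4, 5, 6],
    "geneB": [1, 8, 27, 64, 125, 216],
    "geneC": [3, 1, 4, 1.5, 5, 2],
}
edges = build_network(expression, threshold=0.95)
print(edges)
```

In this toy data the cubic relationship between geneA and geneB falls below the threshold under Pearson correlation alone but is picked up by the rank-based measure, which illustrates why combining measure types can recover edges that a purely linear rule would miss.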

Network analysis based on low-rank method for mining information on integrated data of multi-cancers.

Hou, Mi-Xiao; Gao, Ying-Lian; Liu, Jin-Xing; Dai, Ling-Yun; Kong, Xiang-Zhen; Shang, Junliang.

Comput Biol Chem ; 78: 468-473, 2019 Feb.

Article in English | MEDLINE | ID: mdl-30563751

ABSTRACT

The noise in cancer sequencing data is a problem that cannot be ignored. Finding a suitable way to reduce this noise is an important issue in the analysis of gene co-expression networks. In this paper, we apply a sparse and low-rank method, Robust Principal Component Analysis (RPCA), to reduce noise in integrated multi-cancer data from The Cancer Genome Atlas (TCGA). We then build the gene co-expression network from the denoised integrated data. Finally, we mine nodes and pathways in the denoised networks. The experiments show that after denoising by RPCA, the gene expression data tend to be more orderly than before, and the constructed networks contain more pathway enrichment information than those built from unprocessed data. Moreover, using the betweenness centrality of the nodes in the denoised network, we find some abnormally expressed genes and pathways that have been proven to be associated with many cancers. The experimental results indicate that our method is reasonable and effective, and we also find some candidate genes that may be linked to multiple cancers.

Subject(s)

Data Mining , Gene Regulatory Networks/genetics , Neoplasms/genetics , Databases, Genetic , Gene Expression Profiling , Humans , Principal Component Analysis
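The abstract above ranks network nodes by betweenness centrality. For readers unfamiliar with the measure, the sketch below computes it with Brandes' algorithm for an unweighted, undirected graph. The adjacency structure and the gene labels `G1` to `G5` are hypothetical; the abstract does not say which implementation or normalization the authors used.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph {node: set(neighbours)}."""
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths (sigma) from the source.
        stack = []
        predecessors = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        sigma[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from the source.
        delta = {v: 0.0 for v in adjacency}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {v: c / 2.0 for v, c in centrality.items()}


# Hypothetical co-expression network: G1 - G2 - G3, with G3 also linked to G4 and G5.
network = {
    "G1": {"G2"},
    "G2": {"G1", "G3"},
    "G3": {"G2", "G4", "G5"},
    "G4": {"G3"},
    "G5": {"G3"},
}
scores = betweenness_centrality(network)
hub = max(scores, key=scores.get)
print(scores, hub)
```

Nodes that sit on many shortest paths between other nodes score highest, so in a co-expression network they are natural candidates for further inspection.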

Robust and Efficient Biomolecular Clustering of Tumor Based on p-Norm Singular Value Decomposition.

Kong, Xiang-Zhen; Liu, Jin-Xing; Zheng, Chun-Hou; Hou, Mi-Xiao; Wang, Juan.

IEEE Trans Nanobioscience ; 16(5): 341-348, 2017 Jul.

Article in English | MEDLINE | ID: mdl-28541216

ABSTRACT

High dimensionality has become a typical feature of biomolecular data. In this paper, a novel dimension reduction method named p-norm singular value decomposition (PSVD) is proposed to seek a low-rank approximation matrix of the biomolecular data. To enhance robustness to outliers, the Lp-norm is taken as the error function and the Schatten p-norm is used as the regularization function in the optimization model. To evaluate the performance of PSVD, the K-means clustering method is then employed for tumor clustering based on the low-rank approximation matrix. Extensive experiments are carried out on five gene expression data sets, including two benchmark data sets and three higher-dimensional data sets from The Cancer Genome Atlas. The experimental results demonstrate that the PSVD-based method outperforms many existing methods. In particular, the experiments show that the proposed method processes higher-dimensional data more efficiently, with good robustness, stability, and superior time performance.

Subject(s)

Algorithms , Cluster Analysis , Computational Biology/methods , Neoplasms , Databases, Genetic , Gene Expression Profiling , Humans , Neoplasms/genetics , Neoplasms/metabolism
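The PSVD abstract uses an Lp-norm error function to gain robustness to outliers. The sketch below illustrates the general effect with the element-wise loss sum |r_i|^p: with a small p, a single large residual contributes a much smaller share of the total error than it does under the squared (p = 2) loss. The residual values, the choice p = 0.5, and the function names are assumptions for illustration; the abstract does not give the authors' value of p or their exact objective.

```python
def lp_error(residuals, p):
    """Element-wise Lp loss: the sum of |r|^p over all residuals."""
    return sum(abs(r) ** p for r in residuals)


def outlier_share(residuals, outlier_index, p):
    """Fraction of the total Lp loss contributed by a single residual."""
    return abs(residuals[outlier_index]) ** p / lp_error(residuals, p)


# Hypothetical reconstruction residuals; the last entry is an outlier.
residuals = [0.1, -0.2, 0.15, 8.0]
share_squared = outlier_share(residuals, 3, p=2)
share_half = outlier_share(residuals, 3, p=0.5)
print(round(share_squared, 4), round(share_half, 4))
```

Under the squared loss the outlier accounts for almost all of the error, so a fit would be pulled toward it; under the p = 0.5 loss its share is clearly smaller, which is the intuition behind choosing an Lp error for robustness.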
