Results 1 - 20 of 31
1.
BMC Bioinformatics; 25(1): 169, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38684942

ABSTRACT

Advances in single-cell RNA sequencing (scRNA-seq) technology have revealed many important biological facts and now make it possible to investigate the connections among individual cells, genes, and diseases. Clustering is frequently used for the analysis of single-cell data. However, biological data usually contain a large amount of noise, and traditional clustering methods are sensitive to it; at the same time, the raw data alone do not readily yield higher-order spatial information. As a result, obtaining trustworthy clustering results is challenging. We propose Cauchy hyper-graph Laplacian non-negative matrix factorization (CHLNMF) to address these issues. In CHLNMF, we replace the Euclidean-distance-based measurement in conventional non-negative matrix factorization (NMF) with the Cauchy loss function (CLF), which lessens the influence of noise. The model also incorporates a hyper-graph constraint that takes into account the high-order links among samples. The optimal solution of the CHLNMF model is then found using a half-quadratic optimization approach. Finally, we compare CHLNMF with nine leading methods on seven scRNA-seq datasets; analysis of the experimental results establishes the validity of our technique.
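
A minimal numpy sketch of the idea, not the authors' implementation: the Cauchy loss is handled by half-quadratic (reweighting) steps inside otherwise standard graph-regularized NMF multiplicative updates, and an ordinary k-NN sample graph stands in for the hyper-graph Laplacian; the function name and all parameters are illustrative.

```python
import numpy as np

def cauchy_graph_nmf(X, A, k=10, c=1.0, lam=0.1, n_iter=200, seed=0):
    """X: (genes x cells) non-negative data; A: (cells x cells) k-NN affinity matrix."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 1e-3          # basis matrix
    H = rng.random((k, n)) + 1e-3          # cell embeddings
    D = np.diag(A.sum(axis=1))             # degree matrix of the sample graph
    eps = 1e-10
    for _ in range(n_iter):
        R = X - W @ H
        Q = 1.0 / (1.0 + (R / c) ** 2)     # half-quadratic weights from the Cauchy loss
        QX, QWH = Q * X, Q * (W @ H)
        W *= (QX @ H.T) / (QWH @ H.T + eps)
        H *= (W.T @ QX + lam * H @ A) / (W.T @ QWH + lam * H @ D + eps)
    return W, H
```

Cluster labels would then come from running, e.g., k-means on the columns of H.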


Subjects
Algorithms; Sequence Analysis, RNA; Single-Cell Analysis; Single-Cell Analysis/methods; Sequence Analysis, RNA/methods; Humans; Cluster Analysis; Computational Biology/methods
2.
Brief Bioinform; 23(1), 2022 Jan 17.
Article in English | MEDLINE | ID: mdl-34607360

ABSTRACT

Learning node representations is a fundamental problem in biological network analysis, as compact representation features reveal complicated network structures and carry useful information for downstream tasks such as link prediction and node classification. Recently, multiple networks that profile objects from different aspects have been accumulating rapidly, providing the opportunity to learn objects from multiple perspectives. However, the complex common and specific information across different networks poses challenges to node representation methods. Moreover, ubiquitous noise in networks calls for more robust representations. To deal with these problems, we present a representation learning method for multiple biological networks. First, we accommodate the noise and spurious edges in networks using denoised diffusion, providing robust connectivity structures for the subsequent representation learning. Then, we introduce a graph regularized integration model to combine the refined networks and compute common representation features. By using the regularized decomposition technique, the proposed model can effectively preserve the common structural property of different networks and simultaneously accommodate their specific information, leading to a consistent representation. A simulation study shows the superiority of the proposed method at different levels of network noise. Three network-based inference tasks, including drug-target interaction prediction, gene function identification and fine-grained species categorization, are conducted using representation features learned by our method. Biological networks at different scales and levels of sparsity are involved. Experimental results on real-world data show that the proposed method has robust performance compared with alternatives. Overall, by eliminating noise and integrating effectively, the proposed method is able to learn useful representations from multiple biological networks.
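
A small numpy sketch of the denoising idea only: one standard form of network diffusion (a random-walk-with-restart kernel) applied to each noisy adjacency matrix. The paper's integration/decomposition step is not reproduced, and the function and parameter names are illustrative.

```python
import numpy as np

def diffuse_network(A, restart=0.5):
    """Smooth a possibly noisy adjacency matrix A with a random-walk-with-restart kernel."""
    n = A.shape[0]
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.maximum(deg, 1e-12)                               # row-stochastic transitions
    K = restart * np.linalg.inv(np.eye(n) - (1 - restart) * P)   # closed-form RWR kernel
    return (K + K.T) / 2                                         # symmetrized diffused network

# denoised = [diffuse_network(A_v) for A_v in networks]  # one refined network per view
```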


Subjects
Learning; Neural Networks, Computer; Computer Simulation; Diffusion
3.
J Transl Med; 20(1): 552, 2022 Dec 3.
Article in English | MEDLINE | ID: mdl-36463215

ABSTRACT

BACKGROUND: Associations between drugs and diseases provide important information for expediting drug development. Because the number of known drug-disease associations is still insufficient, and because inferring associations through traditional in vitro experiments is time-consuming and costly, more accurate and reliable computational methods urgently need to be developed to predict potential associations between drugs and diseases. METHODS: In this study, we present a model called weighted graph regularized collaborative non-negative matrix factorization for drug-disease association prediction (WNMFDDA). Specifically, we first calculated drug similarity and disease similarity based on the chemical structures of drugs and the medical description information of diseases, respectively. Then, to extend the model to new drugs and diseases, weighted K-nearest-neighbor profiles were used as a preprocessing step to reconstruct the interaction score profiles of drugs and diseases. Finally, a graph regularized non-negative matrix factorization model was used to identify potential associations between drugs and diseases. RESULTS: WNMFDDA achieved AUC values of 0.939 and 0.952 on the Fdataset and Cdataset under ten-fold cross-validation, respectively, outperforming the other competing prediction methods. Moreover, case studies for several drugs and diseases were carried out to further verify the predictive performance of WNMFDDA; 13 (Doxorubicin), 13 (Amiodarone), 12 (Obesity) and 12 (Asthma) of the top 15 corresponding candidate diseases or drugs were confirmed by existing databases. CONCLUSIONS: The experimental results demonstrate that WNMFDDA is an effective method for drug-disease association prediction, and we believe it will be helpful for relevant biomedical researchers in follow-up studies.
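
A short numpy sketch of the weighted K-nearest-neighbor preprocessing idea: interaction profiles of drugs with no known associations are filled in from their most similar drugs. This is a generic WKNN step under assumed conventions (decay-weighted neighbor averaging), not the paper's exact formulation; names and defaults are illustrative.

```python
import numpy as np

def wknn_profiles(Y, S, K=5, decay=0.8):
    """Y: (drugs x diseases) binary association matrix; S: (drugs x drugs) similarity."""
    Y_new = Y.astype(float).copy()
    for i in range(Y.shape[0]):
        nbrs = np.argsort(S[i])[::-1]
        nbrs = nbrs[nbrs != i][:K]                          # K most similar drugs
        w = decay ** np.arange(len(nbrs)) * S[i, nbrs]      # decayed, similarity-scaled weights
        est = w @ Y[nbrs] / max(w.sum(), 1e-12)             # weighted average of neighbor profiles
        Y_new[i] = np.maximum(Y[i], est)                    # keep known 1s, fill in soft scores
    return Y_new
```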


Subjects
Algorithms; Asthma; Humans; Cluster Analysis; Databases, Factual; Research Design
4.
Neuroimage; 238: 118200, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34118398

ABSTRACT

We propose a novel optimization framework that integrates imaging and genetics data for simultaneous biomarker identification and disease classification. The generative component of our model uses a dictionary learning framework to project the imaging and genetic data into a shared low dimensional space. We have coupled both the data modalities by tying the linear projection coefficients to the same latent space. The discriminative component of our model uses logistic regression on the projection vectors for disease diagnosis. This prediction task implicitly guides our framework to find interpretable biomarkers that are substantially different between a healthy and disease population. We exploit the interconnectedness of different brain regions by incorporating a graph regularization penalty into the joint objective function. We also use a group sparsity penalty to find a representative set of genetic basis vectors that span a low dimensional space where subjects are easily separable between patients and controls. We have evaluated our model on a population study of schizophrenia that includes two task fMRI paradigms and single nucleotide polymorphism (SNP) data. Using ten-fold cross validation, we compare our generative-discriminative framework with canonical correlation analysis (CCA) of imaging and genetics data, parallel independent component analysis (pICA) of imaging and genetics data, random forest (RF) classification, and a linear support vector machine (SVM). We also quantify the reproducibility of the imaging and genetics biomarkers via subsampling. Our framework achieves higher class prediction accuracy and identifies robust biomarkers. Moreover, the implicated brain regions and genetic variants underlie the well documented deficits in schizophrenia.
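
A schematic of the kind of coupled generative-discriminative objective the abstract describes, written with hypothetical notation (X: imaging data, G: genetic data, D, A: imaging and genetic bases, Z: shared latent codes z_n, y: diagnosis labels, w: classifier weights, L: brain-region graph Laplacian); the exact weighting, constraints, and terms of the published model may differ.

```latex
\min_{D,\,A,\,Z,\,w}\;
\underbrace{\|X - D Z\|_F^2 + \|G - A Z\|_F^2}_{\text{shared low-dimensional projection}}
+ \lambda_1 \underbrace{\textstyle\sum_n \log\!\bigl(1 + e^{-y_n w^\top z_n}\bigr)}_{\text{logistic loss for diagnosis}}
+ \lambda_2 \underbrace{\operatorname{tr}\!\bigl(D^\top L D\bigr)}_{\text{graph penalty over brain regions}}
+ \lambda_3 \underbrace{\|A\|_{2,1}}_{\text{group sparsity on genetic basis}}
```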


Subjects
Brain/diagnostic imaging; Schizophrenia/diagnosis; Adult; Female; Genetic Markers; Humans; Magnetic Resonance Imaging; Male; Reproducibility of Results; Schizophrenia/diagnostic imaging; Schizophrenia/genetics
5.
BMC Bioinformatics; 20(Suppl 19): 657, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31870274

ABSTRACT

BACKGROUND: Synthetic lethality has attracted a great deal of attention in cancer therapeutics due to its utility in identifying new anticancer drug targets. Identifying synthetic lethal (SL) interactions is the key step towards exploring synthetic lethality in cancer treatment. However, biological experiments face many challenges when identifying synthetic lethal interactions. Thus, it is necessary to develop computational methods that can serve as useful complements to biological experiments. RESULTS: In this paper, we propose a novel graph regularized self-representative matrix factorization (GRSMF) algorithm for synthetic lethal interaction prediction. GRSMF first learns self-representations from the known SL interactions and further integrates functional similarities among genes derived from Gene Ontology (GO). It can then effectively predict potential SL interactions by leveraging the information provided by known SL interactions and functional annotations of genes. Extensive experiments on synthetic lethal interaction data downloaded from the SynLethDB database demonstrate the superiority of GRSMF in predicting potential synthetic lethal interactions, compared with other competing methods. Moreover, case studies of novel interactions are conducted to further evaluate the effectiveness of GRSMF. CONCLUSIONS: We demonstrate that by adaptively exploiting the self-representation of the original SL interaction data, and by utilizing functional similarities among genes to enhance the learning of the self-representation matrix, GRSMF can predict potential SL interactions more accurately than other state-of-the-art SL interaction prediction methods.
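
A compact numpy sketch of a generic graph-regularized self-representative factorization (learn A ≈ AZ, with a Gene Ontology similarity graph smoothing Z). This is one plausible multiplicative-update solver under an assumed non-negativity constraint, not the published GRSMF algorithm; the objective in the comment and all parameters are illustrative.

```python
import numpy as np

def grsmf_sketch(A, S, alpha=0.5, beta=0.5, n_iter=300, seed=0):
    """A: (genes x genes) binary SL matrix; S: (genes x genes) GO-based similarity."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Z = rng.random((n, n)) * 0.01 + 1e-3
    D = np.diag(S.sum(axis=1))                # degree matrix of the gene-similarity graph
    AtA = A.T @ A
    eps = 1e-10
    for _ in range(n_iter):
        # multiplicative update for  min ||A - AZ||^2 + alpha*||Z||^2 + beta*tr(Z L Z^T),  Z >= 0
        Z *= (AtA + beta * Z @ S) / (AtA @ Z + alpha * Z + beta * Z @ D + eps)
    return (Z + Z.T) / 2                      # symmetrized scores for candidate SL pairs
```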


Subjects
Neoplasms/genetics; Algorithms; Antineoplastic Agents/therapeutic use; Humans; Neoplasms/drug therapy
6.
BMC Bioinformatics; 20(Suppl 8): 287, 2019 Jun 10.
Article in English | MEDLINE | ID: mdl-31182006

ABSTRACT

BACKGROUND: Predicting drug-target interactions experimentally is time-consuming and expensive, so the accuracy of computational prediction methods matters. Many algorithms predict interactions from a drug-target network (i.e., a bipartite graph of drug-target pairs known to interact). Although these algorithms can predict some drug-target interactions, they are of little help for new drugs or targets with no known interactions. RESULTS: Since such datasets usually lie on or near low-dimensional nonlinear manifolds, we propose an improved GRMF (graph regularized matrix factorization) method that learns these manifold structures in combination with the earlier matrix-factorization approach. In addition, we use a previously proposed preprocessing step to improve prediction accuracy. CONCLUSIONS: Cross-validation is used to evaluate our method, and simulation experiments are used to predict new interactions. In most cases, our method is superior to the other methods, and examples of new drugs and new targets are predicted in the simulation experiments. The improved GRMF method can better predict the remaining drug-target interactions.
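
A minimal numpy sketch of a graph-regularized matrix factorization objective of the kind GRMF uses: a masked squared loss plus drug-side and target-side graph Laplacian penalties, solved here by plain gradient descent. The published method's specific update rules and preprocessing are not reproduced, and all names and parameters are illustrative.

```python
import numpy as np

def grmf_sketch(Y, M, Ld, Lt, k=10, lam=0.1, mu_d=0.1, mu_t=0.1, lr=0.01, n_iter=500, seed=0):
    """Y: (drugs x targets) interactions; M: observation mask; Ld, Lt: graph Laplacians."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((Y.shape[0], k)) * 0.1   # drug latent factors
    B = rng.standard_normal((Y.shape[1], k)) * 0.1   # target latent factors
    for _ in range(n_iter):
        E = M * (Y - A @ B.T)                        # masked residual
        grad_A = -E @ B + lam * A + mu_d * Ld @ A    # gradient of loss + L2 + drug-graph penalty
        grad_B = -E.T @ A + lam * B + mu_t * Lt @ B  # gradient of loss + L2 + target-graph penalty
        A -= lr * grad_A
        B -= lr * grad_B
    return A @ B.T                                   # predicted interaction scores
```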


Subjects
Algorithms; Drug Interactions; Databases as Topic; Humans; Reproducibility of Results
7.
BMC Bioinformatics; 20(Suppl 22): 718, 2019 Dec 30.
Article in English | MEDLINE | ID: mdl-31888442

ABSTRACT

BACKGROUND: Identifying different types of cancer based on gene expression data has become a hotspot in bioinformatics research, and clustering gene expression samples from multiple cancers into their respective classes is one important solution. However, the high dimensionality, small sample sizes, and noise of gene expression data make data mining difficult. Although there are many effective and feasible methods to deal with this problem, the possibility remains that these methods are flawed. RESULTS: In this paper, we propose the graph regularized low-rank representation under symmetric and sparse constraints (sgLRR) method, in which we introduce manifold-learning-based graph regularization and symmetric sparse constraints into traditional low-rank representation (LRR). In sgLRR, the symmetric and sparse constraints alleviate the effect of noise in the raw data on the low-rank representation, while graph regularization preserves the important intrinsic local geometrical structure of the raw data. We apply this method to cluster multi-cancer samples based on gene expression data, which improves the clustering quality. First, the gene expression data are decomposed by the sgLRR method to obtain a lowest-rank representation matrix that is symmetric and sparse. Then, an affinity matrix is constructed from it, and the multi-cancer samples are clustered with a spectral clustering algorithm, normalized cuts (Ncuts). CONCLUSIONS: A series of comparative experiments demonstrates that the sgLRR method, based on low-rank representation, has a clear advantage and remarkable performance in clustering multi-cancer samples.
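
A short sketch of the basic pipeline only: noise-free low-rank representation, whose closed-form solution is the shape-interaction matrix built from the right singular vectors, followed by spectral clustering of the symmetrized affinity. The symmetric/sparse constraints and graph regularizer that distinguish sgLRR are omitted; the library calls are standard scikit-learn, and the rank parameter is illustrative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def lrr_affinity(X, rank):
    """X: (genes x samples). Closed-form noise-free LRR: Z* = V_r V_r^T (shape-interaction matrix)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vr = Vt[:rank].T
    Z = Vr @ Vr.T
    return (np.abs(Z) + np.abs(Z.T)) / 2          # symmetric affinity between samples

def cluster_samples(X, n_clusters, rank=20):
    W = lrr_affinity(X, rank)
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed", random_state=0)
    return model.fit_predict(W)
```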


Subjects
Algorithms; Neoplasms/genetics; Cluster Analysis; Data Mining; Databases, Genetic; Humans; Multifactor Dimensionality Reduction; Oncogenes
8.
BMC Genomics; 20(Suppl 11): 944, 2019 Dec 20.
Article in English | MEDLINE | ID: mdl-31856727

ABSTRACT

BACKGROUND: Comprehensive molecular profiling of various cancers and other diseases has generated vast amounts of multi-omics data. Each type of -omics data corresponds to one feature space, such as gene expression, miRNA expression, DNA methylation, etc. Integrating multi-omics data can link different layers of molecular feature spaces and is crucial to elucidate molecular pathways underlying various diseases. Machine learning approaches to mining multi-omics data hold great promise for uncovering intricate relationships among molecular features. However, due to the "big p, small n" problem (i.e., small sample sizes with high-dimensional features), training a large-scale generalizable deep learning model with multi-omics data alone is very challenging. RESULTS: We developed a method called Multi-view Factorization AutoEncoder (MAE) with network constraints that can seamlessly integrate multi-omics data and domain knowledge such as molecular interaction networks. Our method learns feature and patient embeddings simultaneously with deep representation learning. Both feature representations and patient representations are subject to certain constraints specified as regularization terms in the training objective. By incorporating domain knowledge into the training objective, we implicitly introduced a good inductive bias into the machine learning model, which helps improve model generalizability. We performed extensive experiments on the TCGA datasets and demonstrated the power of integrating multi-omics data and biological interaction networks using our proposed method for predicting target clinical variables. CONCLUSIONS: To alleviate the overfitting problem in deep learning on multi-omics data with the "big p, small n" problem, it is helpful to incorporate biological domain knowledge into the model as inductive biases. It is very promising to design machine learning models that facilitate the seamless integration of large-scale multi-omics data and biomedical domain knowledge for uncovering intricate relationships among molecular features and clinical features.
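
A very small PyTorch sketch of the core idea: learn patient embeddings and feature embeddings jointly, reconstruct the omics matrix from their product, and add a molecular-interaction-network penalty tr(EᵀLE) on the feature embeddings. The architecture, loss weights, and names here are illustrative, not the published model.

```python
import torch
import torch.nn as nn

class FactorizationAE(nn.Module):
    def __init__(self, n_features, k):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, k), nn.ReLU(), nn.Linear(k, k))
        self.feature_emb = nn.Parameter(0.01 * torch.randn(n_features, k))  # one row per molecular feature

    def forward(self, x):                     # x: (patients x features)
        z = self.encoder(x)                   # patient embeddings
        return z, z @ self.feature_emb.T      # reconstruction via factorization

def network_penalty(E, L):
    """tr(E^T L E): interacting features are pushed toward similar embeddings."""
    return torch.trace(E.T @ L @ E)

def training_step(model, x, L, opt, gamma=1e-3):
    opt.zero_grad()
    z, x_hat = model(x)
    loss = nn.functional.mse_loss(x_hat, x) + gamma * network_penalty(model.feature_emb, L)
    loss.backward()
    opt.step()
    return loss.item()
```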


Subjects
Algorithms; Genomics; Models, Biological; Systems Biology/methods; Bias; Data Mining; Databases, Genetic; Humans; Knowledge Bases; Machine Learning; Neoplasms/genetics; Neoplasms/mortality; Neoplasms/pathology
9.
J Biomed Inform; 80: 26-36, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29481877

ABSTRACT

The emergence of network medicine has provided great insight into the identification of disease-related molecules, which could help with the development of personalized medicine. However, the state-of-the-art methods could neither simultaneously consider target information and the known miRNA-disease associations nor effectively explore novel gene-disease associations as a by-product during the process of inferring disease-related miRNAs. Computational methods incorporating multiple sources of information offer more opportunities to infer disease-related molecules, including miRNAs and genes in heterogeneous networks at a system level. In this study, we developed a novel algorithm, named inference of Disease-related MiRNAs based on Heterogeneous Manifold (DMHM), to accurately and efficiently identify miRNA-disease associations by integrating multi-omics data. Graph-based regularization was utilized to obtain a smooth function on the data manifold, which constitutes the main principle of DMHM. The novelty of this framework lies in the relatedness between diseases and miRNAs, which is measured via heterogeneous manifolds on heterogeneous networks integrating target information. To demonstrate the effectiveness of DMHM, we conducted comprehensive experiments based on HMDD datasets and compared DMHM with six state-of-the-art methods. Experimental results indicated that DMHM significantly outperformed the other six methods under fivefold cross validation and de novo prediction tests. Case studies have further confirmed the practical usefulness of DMHM.
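
A toy numpy sketch of the general principle (graph-based score smoothing on a heterogeneous miRNA-disease network), using the standard normalized-Laplacian propagation closed form; the actual DMHM construction of heterogeneous manifolds is not reproduced, and names and parameters are illustrative.

```python
import numpy as np

def smooth_scores(W, y0, alpha=0.8):
    """W: adjacency of the heterogeneous network (miRNA + disease nodes);
    y0: initial association scores for a query disease; returns smoothed scores."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt                  # symmetrically normalized adjacency
    n = W.shape[0]
    # closed form of iterating  f <- alpha * S f + (1 - alpha) * y0
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, y0)
```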


Subjects
Computational Biology/methods; Genetic Association Studies/methods; MicroRNAs/genetics; Neoplasms/genetics; Algorithms; Databases, Genetic; Humans; MicroRNAs/analysis; MicroRNAs/metabolism; Neoplasms/classification; Neoplasms/metabolism; Reproducibility of Results
10.
Neural Netw; 179: 106531, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39029296

ABSTRACT

As an effective strategy for reducing noisy and redundant information in hyperspectral imagery (HSI), hyperspectral band selection aims to select a subset of the original bands, which benefits subsequent tasks. In this paper, we introduce a multi-dimensional high-order structure preserved clustering method for hyperspectral band selection, referred to as MHSPC. By regarding the original hyperspectral image as a tensor cube, we apply tensor CP (CANDECOMP/PARAFAC) decomposition to it to exploit the multi-dimensional structural information and generate a low-dimensional latent feature representation. To capture the local geometrical structure along the spectral dimension, a graph regularizer is imposed on the new feature representation in the lower-dimensional space. In addition, since low rankness is an important global property of HSIs, we apply a nuclear norm constraint to the latent feature representation matrix to capture the global data structure. Unlike most previous clustering-based hyperspectral band selection methods, which vectorize each band without considering the 2-D spatial information, the proposed MHSPC effectively captures both the spatial structure and the spectral correlation of the original hyperspectral cube from local and global perspectives. An efficient alternating updating algorithm with a theoretical convergence guarantee is designed to solve the resulting optimization problem, and extensive experimental results on four benchmark datasets validate the effectiveness of MHSPC over other state-of-the-art methods.
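
A minimal sketch of only the first stage under stated simplifications: tensor CP decomposition of the hyperspectral cube (via tensorly), then clustering of the band-mode factor and selection of one representative band per cluster. The graph regularizer and nuclear-norm term that define MHSPC are omitted, and the function name, rank, and other parameters are illustrative.

```python
import numpy as np
from tensorly.decomposition import parafac
from sklearn.cluster import KMeans

def select_bands(cube, n_bands, rank=10):
    """cube: (rows x cols x bands) hyperspectral image; returns indices of selected bands."""
    _, factors = parafac(cube, rank=rank, init="random", random_state=0)
    band_factor = factors[2]                                   # (bands x rank) latent band features
    km = KMeans(n_clusters=n_bands, n_init=10, random_state=0).fit(band_factor)
    selected = []
    for c in range(n_bands):                                   # pick the band closest to each centroid
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(band_factor[members] - km.cluster_centers_[c], axis=1)
        selected.append(int(members[np.argmin(dists)]))
    return sorted(selected)
```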


Subjects
Algorithms; Cluster Analysis; Hyperspectral Imaging/methods; Image Processing, Computer-Assisted/methods; Neural Networks, Computer
11.
Article in English | MEDLINE | ID: mdl-36912759

ABSTRACT

The development and widespread utilization of high-throughput sequencing technologies in biology has fueled the rapid growth of single-cell RNA sequencing (scRNA-seq) data over the past decade, and scRNA-seq technology has significantly expanded researchers' understanding of cellular heterogeneity. Accurate cell type identification is a prerequisite for any research on heterogeneous cell populations. However, due to the high noise and high dimensionality of scRNA-seq data, improving the effectiveness of cell type identification remains a challenge. As an effective dimensionality reduction method, Principal Component Analysis (PCA) is an essential tool for visualizing high-dimensional scRNA-seq data and identifying cell subpopulations. However, traditional PCA has difficulty mining the nonlinear manifold structure of the data and usually yields dense principal components (PCs). Therefore, we present a novel method in this paper called joint L2,p-norm and random walk graph constrained PCA (RWPPCA). RWPPCA aims to retain the data's local information when mapping high-dimensional data to a low-dimensional space, to obtain sparse principal components more accurately and thereby identify cell types more precisely. Specifically, RWPPCA combines the random walk (RW) algorithm with graph regularization to determine the local geometric relationships between data points more accurately. Moreover, to mitigate the adverse effects of dense PCs, the L2,p-norm is introduced to make the PCs sparser and thus more interpretable. We then evaluate the effectiveness of RWPPCA on simulated data and scRNA-seq data. The results show that RWPPCA performs well in cell type identification and outperforms other comparison methods.


Subjects
Single-Cell Analysis; Single-Cell Gene Expression Analysis; Principal Component Analysis; Single-Cell Analysis/methods; Algorithms; Cluster Analysis
12.
J Appl Stat; 50(6): 1400-1417, 2023.
Article in English | MEDLINE | ID: mdl-37025276

ABSTRACT

Traditional regression methods typically consider only covariate information and assume that the observations are mutually independent samples. However, in many modern applications samples come from individuals connected by a network. We present a risk minimization formulation for learning from both covariates and network structure in the context of graph kernel regularization. The formulation involves a loss function with a penalty term. This penalty can be used not only to encourage similarity between linked nodes but also leads to improvements over traditional regression models. Furthermore, the penalty can be used with many loss-based predictive methods, such as linear regression with squared loss and logistic regression with log-likelihood loss. Simulations evaluating the performance of the model in both low-dimensional and high-dimensional settings show that our proposed approach outperforms all other benchmarks. We verify this for uniform-graph, nonuniform-graph, balanced-sample, and unbalanced-sample datasets. The approach was applied to predicting response values on a 'follow' social network of Tencent Weibo users and on two citation networks (Cora and CiteSeer). Each instance verifies that the proposed method, combining covariate information and link structure with graph kernel regularization, can improve predictive performance.
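
A compact numpy sketch of one concrete instance of this idea: linear regression with squared loss, a ridge term, and a graph-Laplacian penalty on the fitted node values, which admits a closed-form solution. The paper's general loss-based framework and kernel construction are broader than this, and all names and parameters are illustrative.

```python
import numpy as np

def graph_regularized_ridge(X, y, A, lam_ridge=1.0, lam_graph=1.0):
    """X: (nodes x covariates), y: responses, A: network adjacency among the nodes.
    Minimizes ||y - Xb||^2 + lam_ridge*||b||^2 + lam_graph*(Xb)^T L (Xb)."""
    L = np.diag(A.sum(axis=1)) - A       # graph Laplacian: penalizes differences between
    p = X.shape[1]                       # the predictions of linked nodes
    lhs = X.T @ X + lam_ridge * np.eye(p) + lam_graph * X.T @ L @ X
    return np.linalg.solve(lhs, X.T @ y)
```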

13.
J Comput Biol; 30(8): 848-860, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37471220

ABSTRACT

The development of single-cell transcriptome sequencing technologies has opened new ways to study biological phenomena at the cellular level. A key application of such technologies is to use single-cell RNA sequencing (scRNA-seq) data to identify distinct cell types through clustering, which in turn provides evidence of heterogeneity. Despite the promise of this approach, the inherent characteristics of scRNA-seq data, such as higher noise levels and lower coverage, pose major challenges to existing clustering methods and compromise their accuracy. In this study, we propose Adjusted Random walk Graph regularization Sparse Low-Rank Representation (ARGLRR), a practical sparse subspace clustering method, to identify cell types. The fundamental low-rank representation (LRR) model is concerned with the global structure of the data. To address the limited ability of LRR to capture local structure, we introduce adjusted random walk graph regularization into its framework, so that ARGLRR captures both local and global structure in scRNA-seq data. Additionally, imposing similarity constraints within the LRR framework further improves the model's ability to estimate cell-to-cell similarity and capture global structural relationships between cells. The results show that ARGLRR surpasses other advanced approaches on nine known scRNA-seq datasets, outperforming the best comparative method by 6.99% in normalized mutual information and 5.85% in Adjusted Rand Index in the clustering experiments. In addition, we visualize the results using Uniform Manifold Approximation and Projection; the visualizations show that ARGLRR enhances the separation of different cell types within the similarity matrix.


Subjects
Algorithms; RNA; Cluster Analysis; Single-Cell Analysis/methods; Sequence Analysis, RNA; Gene Expression Profiling
14.
Math Biosci Eng; 20(7): 12486-12509, 2023 May 24.
Article in English | MEDLINE | ID: mdl-37501452

ABSTRACT

Non-negative matrix factorization (NMF) has been widely used in machine learning and data mining. As an extension of NMF, non-negative matrix tri-factorization (NMTF) provides more degrees of freedom than NMF. However, the standard NMTF algorithm uses the Frobenius norm to calculate the residual error, which can be dramatically affected by noise and outliers. Moreover, the hidden geometric information in the feature and sample manifolds is rarely learned. Hence, we propose a novel robust capped norm dual hyper-graph regularized non-negative matrix tri-factorization (RCHNMTF). First, a robust capped norm is adopted to handle extreme outliers. Second, dual hyper-graph regularization is used to exploit the intrinsic geometric information in the feature and sample manifolds. Third, orthogonality constraints are added to learn a unique data representation and improve clustering performance. Experiments on seven datasets demonstrate the robustness and superiority of RCHNMTF.
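
A numpy sketch of plain dual-graph-regularized non-negative matrix tri-factorization (Frobenius loss with ordinary graph Laplacians on both the feature and sample sides); the capped norm, hyper-graphs, and orthogonality constraints that define RCHNMTF are not included, and all names and parameters are illustrative.

```python
import numpy as np

def dual_graph_nmtf(X, Af, As, k1=10, k2=10, lam_f=0.1, lam_s=0.1, n_iter=300, seed=0):
    """X: (features x samples) non-negative data; Af, As: feature- and sample-graph affinities."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k1)) + 1e-3
    S = rng.random((k1, k2)) + 1e-3
    G = rng.random((n, k2)) + 1e-3
    Df, Ds = np.diag(Af.sum(axis=1)), np.diag(As.sum(axis=1))
    eps = 1e-10
    for _ in range(n_iter):
        # updates for  min ||X - F S G^T||^2 + lam_f*tr(F^T Lf F) + lam_s*tr(G^T Ls G)
        F *= (X @ G @ S.T + lam_f * Af @ F) / (F @ S @ G.T @ G @ S.T + lam_f * Df @ F + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        G *= (X.T @ F @ S + lam_s * As @ G) / (G @ S.T @ F.T @ F @ S + lam_s * Ds @ G + eps)
    return F, S, G
```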

15.
Neural Netw; 158: 188-196, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36462365

ABSTRACT

In recent years, semi-supervised learning on graphs has gained importance in many fields and applications. The goal is to use both partially labeled data (labeled examples) and a large amount of unlabeled data to build more effective predictive models. Deep Graph Neural Networks (GNNs) are very useful in both unsupervised and semi-supervised learning problems. As a special class of GNNs, Graph Convolutional Networks (GCNs) aim to obtain data representations through graph-based node smoothing and layer-wise neural network transformations. However, GCNs have some weaknesses when applied to semi-supervised graph learning: (1) they ignore the manifold structure implicitly encoded by the graph; (2) they use a fixed neighborhood graph and focus only on graph convolution, paying little attention to graph construction; (3) they rarely consider the problem of topological imbalance. To overcome these shortcomings, we propose a novel semi-supervised learning method called Re-weight Nodes and Graph Learning Convolutional Network with Manifold Regularization (ReNode-GLCNMR). Our proposed method integrates graph learning and graph convolution into a unified network architecture and enforces label smoothing through an unsupervised loss term. At the same time, it addresses the problem of imbalance in graph topology by adaptively reweighting the influence of labeled nodes based on their distances to the class boundaries. Experiments on 8 benchmark datasets show that ReNode-GLCNMR significantly outperforms state-of-the-art semi-supervised GNN methods.


Subjects
Algorithms; Neural Networks, Computer; Supervised Machine Learning
16.
Front Genet; 14: 1179439, 2023.
Article in English | MEDLINE | ID: mdl-37359367

ABSTRACT

Introduction: The development of multimodal single-cell omics methods has enabled the collection of data across different omics modalities from the same set of single cells. Each omics modality provides unique information about cell type and function, so the ability to integrate data from different modalities can provide deeper insights into cellular functions. Often, single-cell omics data can prove challenging to model because of high dimensionality, sparsity, and technical noise. Methods: We propose a novel multimodal data analysis method called joint graph-regularized Single-Cell Kullback-Leibler Sparse Non-negative Matrix Factorization (jrSiCKLSNMF, pronounced "junior sickles NMF") that extracts latent factors shared across omics modalities within the same set of single cells. Results: We compare our clustering algorithm to several existing methods on four sets of data simulated from third party software. We also apply our algorithm to a real set of cell line data. Discussion: We show overwhelmingly better clustering performance than several existing methods on the simulated data. On a real multimodal omics dataset, we also find our method to produce scientifically accurate clustering results.

17.
Comput Biol Chem; 104: 107862, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37031647

ABSTRACT

Single-cell RNA sequencing technology provides a tremendous opportunity for studying disease mechanisms at the single-cell level. Cell type identification is a key step in the research of disease mechanisms. Many clustering algorithms have been proposed to identify cell types, and most of them perform similarity calculation before cell clustering. Because clustering and similarity calculation are independent, a low-rank matrix obtained only by similarity calculation may be unable to fully reveal the patterns in single-cell data. In this study, to capture accurate single-cell clustering information, we propose a novel method based on a low-rank representation model, called KGLRR, that combines the low-rank representation approach with K-means clustering. The cluster centroids are updated as the cell dimension decreases, to better form new clusters and improve the quality of the clustering information. In addition, because the low-rank representation model ignores local geometric information, a graph regularization constraint is introduced. KGLRR is tested on both simulated and real single-cell datasets to validate its effectiveness. The experimental results show that KGLRR is more robust and accurate in cell type identification than other advanced algorithms.


Subjects
Algorithms; Cluster Analysis
18.
J Comput Biol; 29(5): 441-452, 2022 May.
Article in English | MEDLINE | ID: mdl-35394368

ABSTRACT

This study formulates antiviral repositioning as a matrix completion problem wherein the antiviral drugs are along the rows and the viruses are along the columns. The input matrix is partially filled, with ones in positions where the antiviral drug is known to be effective against a virus. The curated metadata for antivirals (chemical structure and pathways) and viruses (genomic structure and symptoms) are encoded into our matrix completion framework as graph Laplacian regularization. We then frame the resulting multiple graph regularized matrix completion (GRMC) problem as deep matrix factorization. This is solved by using a novel optimization method called HyPALM (Hybrid Proximal Alternating Linearized Minimization). Results on our curated RNA drug-virus association dataset show that the proposed approach excels over state-of-the-art GRMC techniques. When applied to in silico prediction of antivirals for COVID-19, our approach returns antivirals that are either already used for treating patients or are in clinical trials for that purpose.


Subjects
COVID-19 Drug Treatment; Algorithms; Antiviral Agents/pharmacology; Antiviral Agents/therapeutic use; Humans
19.
Front Microbiol; 12: 650366, 2021.
Article in English | MEDLINE | ID: mdl-33868209

ABSTRACT

Metabolites are closely related to human disease, and the interaction between metabolites and drugs has drawn increasing attention in the field of pharmacomicrobiomics. However, only a small portion of drug-metabolite interactions has been experimentally observed, because experimental validation is labor-intensive, costly, and time-consuming. Although a few computational approaches have been proposed to predict latent associations for various bipartite networks, such as miRNA-disease and drug-target interaction networks, to the best of our knowledge associations between drugs and metabolites have not been reported on a large scale. In this study, we propose a novel algorithm, inductive logistic matrix factorization (ILMF), to predict latent associations between drugs and metabolites. Specifically, ILMF integrates drug-drug interactions, metabolite-metabolite interactions, and drug-metabolite interactions into one framework to model the probability that a drug will interact with a metabolite. Moreover, we exploit inductive matrix completion to guide the learning of the projection matrices U and V, which depend on the low-dimensional feature representation matrices of drugs and metabolites, Fd and Fm; these two matrices can be obtained by fusing multiple data sources. Thus, FdU and FmV can be viewed as drug-specific and metabolite-specific latent representations, different from those of classical LMF. Furthermore, we utilize the Vicus spectral matrix, which reveals the refined local geometrical structure inherent in the original data, to encode the relationships between drugs and metabolites. Extensive experiments are conducted on a manually curated "DrugMetaboliteAtlas" dataset. The experimental results show that ILMF achieves competitive performance compared with other state-of-the-art approaches, which demonstrates its effectiveness in predicting potential drug-metabolite associations.
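
A small numpy sketch of inductive logistic matrix factorization in its simplest form: the association probability is modeled as a sigmoid of a bilinear score built from side-feature matrices Fd and Fm, fitted by gradient ascent with L2 regularization. The Vicus spectral regularizer and the paper's specific weighting are omitted, and the names follow the abstract's notation only loosely.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inductive_lmf(Y, Fd, Fm, k=20, lam=0.1, lr=0.01, n_iter=500, seed=0):
    """Y: (drugs x metabolites) binary associations; Fd, Fm: drug/metabolite feature matrices.
    Models P(association) = sigmoid(Fd U (Fm V)^T)."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((Fd.shape[1], k)) * 0.1
    V = rng.standard_normal((Fm.shape[1], k)) * 0.1
    for _ in range(n_iter):
        P = sigmoid(Fd @ U @ (Fm @ V).T)
        E = Y - P                                   # gradient of the Bernoulli log-likelihood
        U += lr * (Fd.T @ E @ Fm @ V - lam * U)
        V += lr * (Fm.T @ E.T @ Fd @ U - lam * V)
    return sigmoid(Fd @ U @ (Fm @ V).T)             # predicted association probabilities
```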

20.
Front Genet; 12: 621317, 2021.
Article in English | MEDLINE | ID: mdl-33708239

ABSTRACT

Dimensionality reduction methods accompanied by different norm constraints play an important role in mining useful information from large-scale gene expression data. In this article, a novel method named Lp-norm and L2,1-norm constrained graph Laplacian principal component analysis (PL21GPCA), based on traditional principal component analysis (PCA), is proposed for robust tumor sample clustering and gene network module discovery. Three aspects are highlighted in the PL21GPCA method. First, to reduce the high sensitivity to outliers and noise, the non-convex proximal Lp-norm (0 < p < 1) constraint is applied to the loss function. Second, to enhance the sparsity of gene expression in cancer samples, the L2,1-norm constraint is used on one of the regularization terms. Third, to retain the geometric structure of the data, we introduce a graph Laplacian regularization term into the PL21GPCA optimization model. Extensive experiments on five gene expression datasets, including one benchmark dataset, two single-cancer datasets from The Cancer Genome Atlas (TCGA), and two integrated datasets of multiple cancers from TCGA, are performed to validate the effectiveness of our method. The experimental results demonstrate that PL21GPCA performs better than many other methods in terms of tumor sample clustering. Additionally, the method is used to discover gene network modules for the purpose of finding key genes that may be associated with some cancers.
