1.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36617209

ABSTRACT

Recent studies have shown that circRNA expression affects the drug sensitivity of cells and thus significantly influences the efficacy of drugs. Traditional biomedical experiments to validate such relationships are time-consuming and costly. Therefore, developing effective computational methods to predict potential associations between circRNAs and drug sensitivity is an important and urgent task. In this study, we propose a novel method, called MNGACDA, to predict possible circRNA-drug sensitivity associations for further biomedical screening. First, MNGACDA uses multiple sources of information from circRNAs and drugs to construct multimodal networks. It then employs node-level attention graph auto-encoders to obtain low-dimensional embeddings for circRNAs and drugs from the multimodal networks. Finally, an inner product decoder is applied to predict the association scores between circRNAs and drug sensitivity based on the embedding representations of circRNAs and drugs. Extensive experimental results based on cross-validations show that MNGACDA outperforms six other state-of-the-art methods. Furthermore, excellent performance in case studies demonstrates that MNGACDA is an effective tool for predicting circRNA-drug sensitivity associations in real situations. These results confirm the reliable prediction ability of MNGACDA in revealing circRNA-drug sensitivity associations.
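
The inner product decoder mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common graph-autoencoder convention of passing each dot product through a sigmoid to obtain a score in (0, 1); the abstract does not state the activation, and the function names and toy embeddings below are hypothetical, not taken from MNGACDA.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def inner_product_decoder(circ_emb, drug_emb):
    """Return a score matrix: rows are circRNAs, columns are drugs.

    Each score is sigmoid(z_circ . z_drug). The sigmoid is an assumption;
    the abstract only says that an inner product decoder is used.
    """
    return [
        [sigmoid(sum(a * b for a, b in zip(c, d))) for d in drug_emb]
        for c in circ_emb
    ]


# Toy 2-dimensional embeddings (hypothetical values, for illustration only).
circ_emb = [[1.0, 0.0], [0.0, -2.0]]
drug_emb = [[2.0, 1.0], [0.0, 0.0]]
scores = inner_product_decoder(circ_emb, drug_emb)
```

A pair whose embeddings point in similar directions receives a score above 0.5, and a zero embedding yields exactly 0.5 for every partner.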


Subjects
Circular RNA , Circular RNA/genetics
2.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35108355

ABSTRACT

MOTIVATION: Disease-related long non-coding RNAs (lncRNAs) can be used as biomarkers for disease diagnosis and treatment. The development of effective computational approaches to predict lncRNA-disease associations (LDAs) can provide insights into the pathogenesis of complex human diseases and reduce experimental costs. However, few of the existing methods use microRNA (miRNA) information or consider the complex relationship between the inter-graph and intra-graph within a complex-graph to assist prediction. RESULTS: In this paper, the relationships between nodes of the same type and nodes of different types in the complex-graph are introduced. We propose a multi-channel graph attention autoencoder model to predict LDAs, called MGATE. First, an lncRNA-miRNA-disease complex-graph is established based on the similarity and correlation among lncRNAs, miRNAs and diseases to integrate the complex associations among them. Second, in order to fully extract the comprehensive information of the nodes, we use graph autoencoder networks to learn multiple representations from the complex-graph, inter-graph and intra-graph. Third, a graph-level attention mechanism integration module is adopted to adaptively merge the three representations, and a combined training strategy is performed to optimize the whole model to ensure complementarity and consistency among the multi-graph embedding representations. Finally, multiple classifiers are explored, and Random Forest is used to predict the association score between lncRNA and disease. Experimental results on the public dataset show that the area under the receiver operating characteristic curve and the area under the precision-recall curve of MGATE are 0.964 and 0.413, respectively. MGATE significantly outperformed seven state-of-the-art methods. Furthermore, case studies of three cancers further demonstrate the ability of MGATE to identify potential disease-correlated candidate lncRNAs.
The source code and supplementary data are available at https://github.com/sheng-n/MGATE. CONTACT: huanglan@jlu.edu.cn, wy6868@jlu.edu.cn.


Subjects
MicroRNAs , Long Noncoding RNA , Algorithms , Computational Biology/methods , Humans , MicroRNAs/genetics , Computer Neural Networks , Long Noncoding RNA/genetics
3.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36305456

ABSTRACT

Long non-coding RNAs (lncRNAs) can disrupt the biological functions of protein-coding genes (PCGs) to cause cancer. However, the relationship between lncRNAs and PCGs remains unclear and difficult to predict. Machine learning has achieved a satisfactory performance in association prediction, but to our knowledge, it is currently less used in lncRNA-PCG association prediction. Therefore, we introduce GAE-LGA, a powerful deep learning model with graph autoencoders as components, to recognize potential lncRNA-PCG associations. GAE-LGA jointly explored lncRNA-PCG learning and cross-omics correlation learning for effective lncRNA-PCG association identification. The functional similarity and multi-omics similarity of lncRNAs and PCGs were accumulated and encoded by graph autoencoders to extract feature representations of lncRNAs and PCGs, which were subsequently used for decoding to obtain candidate lncRNA-PCG pairs. Comprehensive evaluation demonstrated that GAE-LGA can successfully capture lncRNA-PCG associations with strong robustness and outperformed other machine learning-based identification methods. Furthermore, multi-omics features were shown to improve the performance of lncRNA-PCG association identification. In conclusion, GAE-LGA can act as an efficient application for lncRNA-PCG association prediction with the following advantages: It fuses multi-omics information into the similarity network, making the feature representation more accurate; it can predict lncRNA-PCG associations for new lncRNAs and identify potential lncRNA-PCG associations with high accuracy.


Subjects
Neoplasms , Long Noncoding RNA , Humans , Computational Biology/methods , Machine Learning , Neoplasms/genetics , Long Noncoding RNA/genetics , Proteins/genetics
4.
Methods ; 211: 48-60, 2023 03.
Article in English | MEDLINE | ID: mdl-36804214

ABSTRACT

The scale of single-cell RNA sequencing (scRNA-seq) data has surged with the development of high-throughput sequencing technology. However, although single-cell data analysis is a powerful tool, various issues have been reported, such as sequencing sparsity and complex differential patterns in gene expression. Statistical or traditional machine learning methods are inefficient, and their accuracy needs to be improved. Methods based on deep learning cannot directly process non-Euclidean spatial data, such as cell diagrams. In this study, we have developed scDGAE, which combines graph autoencoders and a graph attention network for scRNA-seq analysis based on a directed graph neural network. Directed graph neural networks can not only retain the connection properties of the directed graph but also expand the receptive field of the convolution operation. Cosine similarity, median L1 distance, and root-mean-squared error are used to compare the gene imputation performance of different methods with that of scDGAE. Furthermore, adjusted mutual information, normalized mutual information, completeness score, and Silhouette coefficient score are used to compare the cell clustering performance of different methods with that of scDGAE. Experimental results show that the scDGAE model achieves promising performance in gene imputation and cell clustering prediction on four scRNA-seq data sets with gold-standard cell labels. Furthermore, it is a robust framework that can be applied to general scRNA-seq analyses.
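
The three imputation metrics named in this abstract can be sketched as follows. This is a minimal illustration over flat lists of expression values; the function names and toy values are hypothetical, and the exact definitions used in the paper (for example, how the median L1 distance is aggregated across genes and cells) may differ.

```python
import math
from statistics import median


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def median_l1_distance(x, y):
    """Median of the element-wise absolute differences (assumed definition)."""
    return median([abs(a - b) for a, b in zip(x, y)])


def rmse(x, y):
    """Root-mean-squared error between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Toy ground-truth and imputed expression values (hypothetical).
truth = [1.0, 0.0, 2.0, 4.0]
imputed = [1.0, 1.0, 2.0, 2.0]

cos = cosine_similarity(truth, imputed)
l1 = median_l1_distance(truth, imputed)
err = rmse(truth, imputed)
```

Higher cosine similarity and lower median L1 distance and RMSE indicate imputed values closer to the reference.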


Subjects
Computer Neural Networks , Single-Cell Gene Expression Analysis , RNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Single-Cell Analysis/methods , Data Analysis , Cluster Analysis , Gene Expression Profiling/methods
5.
Int J Mol Sci ; 24(10)2023 May 15.
Article in English | MEDLINE | ID: mdl-37240128

ABSTRACT

Predicting the potency of a ligand to inhibit the SARS-CoV-2 main protease (M-pro) would be a highly helpful addition to a virtual screening process. The most potent compounds might then be the focus of further efforts to experimentally validate their potency and improve them. A computational method to predict drug potency, which is based on three main steps, is defined: (1) representing the drug and protein in a single 3D structure; (2) applying graph autoencoder techniques with the aim of generating a latent vector; and (3) fitting a classical model to the latent vector to predict the potency of the drug. Experiments on a database of 160 drug-M-pro pairs for which the pIC50 is known show the ability of our method to predict drug potency with high accuracy. Moreover, computing the pIC50 of the whole database takes only a few seconds on a current personal computer. Thus, it can be concluded that a computational tool that predicts the pIC50 with high reliability, cheaply and quickly, has been achieved. This tool can be used to prioritize which virtual screening hits will be further examined in vitro.
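
For readers unfamiliar with the potency measure used here, pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L. The sketch below shows the conversion only; it is not part of the paper's prediction pipeline, and the function names and the choice of nanomolar input are illustrative assumptions.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


def ic50_nm_from_pic50(pic50: float) -> float:
    """Inverse conversion: pIC50 back to an IC50 in nanomolar."""
    return 10.0 ** (-pic50) * 1e9


# A tenfold drop in IC50 raises pIC50 by one unit.
weak = pic50_from_ic50_nm(1000.0)   # 1 micromolar
strong = pic50_from_ic50_nm(100.0)  # 100 nanomolar
```

Because the scale is logarithmic, a larger pIC50 means a more potent inhibitor.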


Subjects
COVID-19 , Humans , SARS-CoV-2/metabolism , Molecular Docking Simulation , Reproducibility of Results , Protease Inhibitors/chemistry , Antiviral Agents/pharmacology , Antiviral Agents/chemistry
6.
Sci Rep ; 14(1): 14368, 2024 Jun 22.
Article in English | MEDLINE | ID: mdl-38909046

ABSTRACT

As urban development accelerates and natural disasters occur more frequently, the urgency of developing effective emergency shelter planning strategies intensifies. The shelter location selection method under the traditional multi-criteria decision-making framework suffers from issues such as strong subjectivity and insufficient data support. Artificial intelligence offers a robust data-driven approach for site selection; however, many methods neglect the spatial relationships of site selection targets within geographical space. This paper introduces an emergency shelter site selection model that combines a variational graph autoencoder (VGAE) with a random forest (RF), namely VGAE-RF. In the constructed urban spatial topological graph, based on network geographic information, this model captures the latent features of geographic unit coupling and integrates explicit and latent features to forecast the likelihood of emergency shelters in the construction area. This study takes Beijing, China, as the experimental area and evaluates the reliability of different models using a confusion matrix, the Receiver Operating Characteristic (ROC) curve, and the Imbalance Index of spatial distribution as evaluation indicators. The experimental results indicate that the proposed VGAE-RF model, which considers spatial semantic associations, displays the best reliability.

7.
PeerJ Comput Sci ; 9: e1335, 2023.
Article in English | MEDLINE | ID: mdl-37346640

ABSTRACT

Social networking has become a hot topic, in which recommendation algorithms are the most important. Recently, the combination of deep learning and recommendation algorithms has attracted considerable attention. The integration of autoencoders and graph convolutional neural networks, while providing an effective solution to the shortcomings of traditional algorithms, fails to take into account user preferences and risks over-smoothing as the number of encoder layers increases. Therefore, we introduce L1 and L2 regularization techniques and fuse them linearly to address user preferences and over-smoothing. In addition, the presence of a large amount of noisy data in the graph data has an impact on feature extraction. To the best of our knowledge, most existing models neither account for noise nor address the problem of noisy data in graphs. Thus, we introduce the idea of denoising autoencoders into graph autoencoders, which can effectively address the noise problem. We demonstrate the capability of the proposed model on four widely used datasets and experimentally demonstrate that our model is more competitive by improving up to 1.3, 1.4, and 1.2, respectively, on the edge prediction task.
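
One way to read the "linear fusion" of L1 and L2 regularization is as a weighted sum of the two penalties, in the style of an elastic net. The abstract does not give the formula, so the sketch below is an assumption; `lam` and `alpha` are hypothetical hyperparameters.

```python
def fused_penalty(weights, lam=0.01, alpha=0.5):
    """Weighted sum of L1 and L2 penalties over a flat list of weights.

    alpha = 1.0 gives a pure L1 penalty, alpha = 0.0 a pure (squared) L2
    penalty. This elastic-net-style form is an assumed interpretation of
    "fuse them linearly", not the paper's stated formula.
    """
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)


# Toy weight vector (hypothetical values).
w = [1.0, -2.0]
mixed = fused_penalty(w, lam=1.0, alpha=0.5)
only_l1 = fused_penalty(w, lam=1.0, alpha=1.0)
only_l2 = fused_penalty(w, lam=1.0, alpha=0.0)
```

In training, such a term would be added to the reconstruction loss; the L1 part encourages sparse weights while the L2 part discourages large ones.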

8.
Neural Netw ; 165: 491-505, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37336034

ABSTRACT

MicroRNAs (miRNAs) play critical roles in diverse biological processes of diseases. Inferring potential disease-miRNA associations via computational algorithms enables us to better understand the development and diagnosis of complex human diseases. This work presents a variational gated autoencoder-based feature extraction model to extract complex contextual features for inferring potential disease-miRNA associations. Specifically, our model fuses three different similarities of miRNAs into a comprehensive miRNA network and combines two different similarities of diseases into a comprehensive disease network. Then, a novel graph autoencoder is designed to extract multilevel representations based on variational gate mechanisms from heterogeneous networks of miRNAs and diseases. Finally, a gate-based association predictor is devised to combine multiscale representations of miRNAs and diseases via a novel contrastive cross-entropy function, and then infer disease-miRNA associations. Experimental results indicate that our proposed model achieves remarkable association prediction performance, proving the efficacy of the variational gate mechanism and contrastive cross-entropy loss for inferring disease-miRNA associations.


Subjects
MicroRNAs , Humans , MicroRNAs/genetics , Genetic Predisposition to Disease , Algorithms , Computational Biology/methods
9.
Comput Struct Biotechnol J ; 21: 4759-4768, 2023.
Article in English | MEDLINE | ID: mdl-37822562

ABSTRACT

Topologically associated domains (TADs) play a pivotal role in disease detection. This study introduces a novel TADs recognition approach named TOAST, leveraging graph auto-encoders and clustering techniques. TOAST conceptualizes each genomic bin as a node of a graph and employs the Hi-C contact matrix as the graph's adjacency matrix. By employing graph auto-encoders, TOAST generates informative embeddings as features. Subsequently, the unsupervised clustering algorithm HDBSCAN is utilized to assign labels to each genomic bin, facilitating the identification of contiguous regions with the same label as TADs. Our experimental analysis of several simulated Hi-C data sets shows that TOAST can quickly and accurately identify TADs from different types of simulated Hi-C contact matrices, outperforming existing algorithms. We also determined the anchoring ratio of TAD boundaries by analyzing different TAD recognition algorithms, and obtained an average ratio of anchoring CTCF, SMC3, RAD21, POLR2A, H3K36me3, H3K9me3, H3K4me3, H3K4me1, Enhancer, and Promoters of 0.66, 0.47, 0.54, 0.27, 0.24, 0.12, 0.32, 0.41, 0.26, and 0.13, respectively. In conclusion, TOAST is a method that can quickly identify TAD boundary parameters that are easy to understand and have important biological significance. The TOAST web server can be accessed via http://223.223.185.189:4005/. The code of TOAST is available online at https://github.com/ghaiyan/TOAST.
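
The final step described in this abstract, turning per-bin cluster labels into contiguous domains, can be sketched as below. This is an illustration rather than TOAST's actual code: the function name and the `min_bins` threshold are hypothetical, and the noise label of -1 follows the convention of common HDBSCAN implementations.

```python
def labels_to_domains(labels, min_bins=2, noise_label=-1):
    """Group consecutive bins sharing a cluster label into candidate TADs.

    Returns a list of (start_bin, end_bin, label) tuples with inclusive
    bin indices. Runs labelled as noise, or shorter than min_bins
    (a hypothetical threshold), are skipped.
    """
    domains = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the list or on a label change.
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != noise_label and i - start >= min_bins:
                domains.append((start, i - 1, labels[start]))
            start = i
    return domains


# Toy per-bin labels along one chromosome (hypothetical clustering output).
bin_labels = [0, 0, 0, 1, 1, -1, 2, 2, 2, 2, 0]
domains = labels_to_domains(bin_labels)
```

Note that a label reused in non-adjacent positions (label 0 above) yields separate runs, since a domain must be contiguous along the genome.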

10.
Neural Netw ; 163: 156-164, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37054514

ABSTRACT

Existing graph contrastive learning methods rely on augmentation techniques based on random perturbations (e.g., randomly adding or dropping edges and nodes). Nevertheless, altering certain edges or nodes can unexpectedly change the graph characteristics, and choosing the optimal perturbing ratio for each dataset requires onerous manual tuning. In this paper, we introduce Implicit Graph Contrastive Learning (iGCL), which utilizes augmentations in the latent space learned from a Variational Graph Auto-Encoder by reconstructing graph topological structure. Importantly, instead of explicitly sampling augmentations from latent distributions, we further propose an upper bound for the expected contrastive loss to improve the efficiency of our learning algorithm. Thus, graph semantics can be preserved within the augmentations in an intelligent way without arbitrary manual design or prior human knowledge. Experimental results at both the graph level and the node level show that the proposed method achieves state-of-the-art accuracy on downstream classification tasks compared to other graph contrastive baselines, and ablation studies demonstrate the effectiveness of the modules in iGCL.


Subjects
Algorithms , Intelligence , Humans , Knowledge , Semantics
11.
Front Pharmacol ; 13: 1056605, 2022.
Article in English | MEDLINE | ID: mdl-36618933

ABSTRACT

Predicting new therapeutic effects (drug repositioning) of existing drugs plays an important role in drug development. However, traditional wet-lab experimental methods are usually time-consuming and costly. The emergence of more and more artificial intelligence-based drug repositioning methods in the past 2 years has facilitated drug development. In this study, we propose a drug repositioning method, VGAEDR, based on a heterogeneous network of multiple drug attributes and a variational graph autoencoder. First, a drug-disease heterogeneous network is established based on three drug attributes, disease semantic information, and known drug-disease associations. Second, low-dimensional feature representations for heterogeneous networks are learned through a variational graph autoencoder module and a multi-layer convolutional module. Finally, the feature representation is fed to a fully connected layer and a Softmax layer to predict new drug-disease associations. Comparative experiments with other baseline methods on three datasets demonstrate the excellent performance of VGAEDR. In the case study, we predicted the top 10 possible anti-COVID-19 drugs from the existing drug and disease data, and six of them were supported by other published studies.

12.
Front Genet ; 13: 1003711, 2022.
Article in English | MEDLINE | ID: mdl-36568390

ABSTRACT

With the development of high-throughput sequencing technology, the scale of single-cell RNA sequencing (scRNA-seq) data has surged. Its data are typically high-dimensional, with high dropout noise and high sparsity. Therefore, gene imputation and cell clustering analysis of scRNA-seq data is increasingly important. Statistical or traditional machine learning methods are inefficient, and improved accuracy is needed. The methods based on deep learning cannot directly process non-Euclidean spatial data, such as cell diagrams. In this study, we developed scGAEGAT, a multi-modal model with graph autoencoders and graph attention networks for scRNA-seq analysis based on graph neural networks. Cosine similarity, median L1 distance, and root-mean-squared error were used to measure the gene imputation performance of different methods for comparison with scGAEGAT. Furthermore, adjusted mutual information, normalized mutual information, completeness score, and Silhouette coefficient score were used to measure the cell clustering performance of different methods for comparison with scGAEGAT. Experimental results demonstrated promising performance of the scGAEGAT model in gene imputation and cell clustering prediction on four scRNA-seq data sets with gold-standard cell labels.

13.
Neural Netw ; 153: 474-495, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35816860

ABSTRACT

Graph autoencoders (GAE) and variational graph autoencoders (VGAE) emerged as powerful methods for link prediction. Their performance is less impressive on community detection problems where, according to recent and concurring experimental evaluations, they are often outperformed by simpler alternatives such as the Louvain method. It is currently still unclear to what extent one can improve community detection with GAE and VGAE, especially in the absence of node features. It is moreover uncertain whether one could do so while simultaneously preserving good performance on link prediction. In this paper, we show that jointly addressing these two tasks with high accuracy is possible. For this purpose, we introduce and theoretically study a community-preserving message passing scheme, doping our GAE and VGAE encoders by considering both the initial graph structure and modularity-based prior communities when computing embedding spaces. We also propose novel training and optimization strategies, including the introduction of a modularity-inspired regularizer complementing the existing reconstruction losses for joint link prediction and community detection. We demonstrate the empirical effectiveness of our approach, referred to as Modularity-Aware GAE and VGAE, through in-depth experimental validation on various real-world graphs.
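
Modularity, the quantity behind the Louvain method and the regularizer mentioned in this abstract, can be computed for a given partition with the standard Newman formula. The sketch below assumes an undirected, unweighted graph without self-loops and at least one edge; it illustrates the score only, not the paper's regularizer, and the function name and toy graph are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected simple graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community id.
    Q = sum over communities c of [ L_c / m - (d_c / (2 m))^2 ],
    where L_c is the number of intra-community edges, d_c the total
    degree of the community, and m the number of edges.
    """
    m = len(edges)
    degree = {}
    intra = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            intra[c] = intra.get(c, 0) + 1
    comm_degree = {}
    for node, k in degree.items():
        c = community[node]
        comm_degree[c] = comm_degree.get(c, 0) + k
    return sum(
        intra.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in comm_degree.items()
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
```

Splitting the toy graph along the bridge scores higher than lumping all nodes together, which is the behavior a modularity-based prior rewards.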


Subjects
Computer Neural Networks
14.
Genes (Basel) ; 13(6)2022 06 11.
Article in English | MEDLINE | ID: mdl-35741810

ABSTRACT

Most publicly accessible single-cell Hi-C data are sparse and cannot reach a higher resolution. Therefore, learning latent representations (bin-specific embeddings) of sparse single-cell Hi-C matrices would provide us with a novel way of mining valuable information hidden in the limited number of single-cell Hi-C contacts. We present scHiCEmbed, an unsupervised computational method for learning bin-specific embeddings of single-cell Hi-C data, and the computational system is applied to the tasks of 3D structure reconstruction of whole genomes and detection of topologically associating domains (TAD). The only input of scHiCEmbed is a raw or scHiCluster-imputed single-cell Hi-C matrix. The main process of scHiCEmbed is to embed each node/bin in a higher dimensional space using graph auto-encoders. The learned n-by-3 bin-specific embedding/latent matrix is considered the final reconstructed 3D genome structure. For TAD detection, we use constrained hierarchical clustering on the latent matrix to classify bins: S_Dbw is used to determine the optimal number of clusters, and each cluster is considered as one potential TAD. Our reconstructed 3D structures for individual chromatins at different cell stages reveal the expanding process of chromatins during the cell cycle. We observe that the TADs called from single-cell Hi-C data are not shared across individual cells and that the TAD boundaries called from raw or imputed single-cell Hi-C are significantly different from those called from bulk Hi-C, confirming the cell-to-cell variability in terms of TAD definitions. The source code for scHiCEmbed is publicly available, and the URL can be found in the conclusion section.


Subjects
Chromatin , Genome , Cluster Analysis , Software
15.
Neural Netw ; 142: 1-19, 2021 Oct.
Article in English | MEDLINE | ID: mdl-33962132

ABSTRACT

Graph autoencoders (AE) and variational autoencoders (VAE) are powerful node embedding methods, but suffer from scalability issues. In this paper, we introduce FastGAE, a general framework to scale graph AE and VAE to large graphs with millions of nodes and edges. Our strategy, based on an effective stochastic subgraph decoding scheme, significantly speeds up the training of graph AE and VAE while preserving or even improving performances. We demonstrate the effectiveness of FastGAE on various real-world graphs, outperforming the few existing approaches to scale graph AE and VAE by a wide margin.
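
The general idea of decoding on a sampled subgraph can be sketched as follows: at each training step, draw a subset of nodes and evaluate the reconstruction loss only on the subgraph they induce, so the decoder handles on the order of `sample_size` squared node pairs instead of all pairs. The abstract does not specify the sampling distribution; the uniform sampling below is a simplifying assumption, and the function name and parameters are hypothetical.

```python
import random


def sample_induced_subgraph(edges, num_nodes, sample_size, seed=0):
    """Draw a node subset uniformly at random and return its induced edges.

    Uniform sampling is an assumption made for this sketch; the actual
    method may weight nodes differently.
    """
    rng = random.Random(seed)
    nodes = sorted(rng.sample(range(num_nodes), sample_size))
    kept = set(nodes)
    # Keep only the edges whose two endpoints were both sampled.
    sub_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return nodes, sub_edges


# Toy graph with 6 nodes (hypothetical).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
nodes, sub_edges = sample_induced_subgraph(edges, num_nodes=6, sample_size=4)
```

A training loop would call this once per step with a fresh seed or shared generator, then compare the decoder output for the sampled nodes against `sub_edges`.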


Subjects
Algorithms