Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Resultados 1 - 20 de 80
Filtrar
1.
Brief Bioinform ; 24(6)2023 09 22.
Artículo en Inglés | MEDLINE | ID: mdl-37742053

RESUMEN

Identifying the potential bacteriophages (phage) candidate to treat bacterial infections plays an essential role in the research of human pathogens. Computational approaches are recognized as a valid way to predict bacteria and target phages. However, most of the current methods only utilize lower-order biological information without considering the higher-order connectivity patterns, which helps to improve the predictive accuracy. Therefore, we developed a novel microbial heterogeneous interaction network (MHIN)-based model called PTBGRP to predict new phages for bacterial hosts. Specifically, PTBGRP first constructs an MHIN by integrating phage-bacteria interaction (PBI) and six bacteria-bacteria interaction networks with their biological attributes. Then, different representation learning methods are deployed to extract higher-level biological features and lower-level topological features from MHIN. Finally, PTBGRP employs a deep neural network as the classifier to predict unknown PBI pairs based on the fused biological information. Experiment results demonstrated that PTBGRP achieves the best performance on the corresponding ESKAPE pathogens and PBI dataset when compared with state-of-art methods. In addition, case studies of Klebsiella pneumoniae and Staphylococcus aureus further indicate that the consideration of rich heterogeneous information enables PTBGRP to accurately predict PBI from a more comprehensive perspective. The webserver of the PTBGRP predictor is freely available at http://120.77.11.78/PTBGRP/.


Asunto(s)
Bacteriófagos , Infecciones Estafilocócicas , Humanos , Aprendizaje , Bacterias , Redes Neurales de la Computación
2.
Brief Bioinform ; 25(1)2023 11 22.
Artículo en Inglés | MEDLINE | ID: mdl-38084923

RESUMEN

The stability of the gut microenvironment is inextricably linked to human health, with the onset of many diseases accompanied by dysbiosis of the gut microbiota. It has been reported that there are differences in the microbial community composition between patients and healthy individuals, and many microbes are considered potential biomarkers. Accurately identifying these biomarkers can lead to more precise and reliable clinical decision-making. To improve the accuracy of microbial biomarker identification, this study introduces WSGMB, a computational framework that uses the relative abundance of microbial taxa and health status as inputs. This method has two main contributions: (1) viewing the microbial co-occurrence network as a weighted signed graph and applying graph convolutional neural network techniques for graph classification; (2) designing a new architecture to compute the role transitions of each microbial taxon between health and disease networks, thereby identifying disease-related microbial biomarkers. The weighted signed graph neural network enhances the quality of graph embeddings; quantifying the importance of microbes in different co-occurrence networks better identifies those microbes critical to health. Microbes are ranked according to their importance change scores, and when this score exceeds a set threshold, the microbe is considered a biomarker. This framework's identification performance is validated by comparing the biomarkers identified by WSGMB with actual microbial biomarkers associated with specific diseases from public literature databases. The study tests the proposed computational framework using actual microbial community data from colorectal cancer and Crohn's disease samples. It compares it with the most advanced microbial biomarker identification methods. The results show that the WSGMB method outperforms similar approaches in the accuracy of microbial biomarker identification.


Asunto(s)
Enfermedad de Crohn , Microbioma Gastrointestinal , Microbiota , Humanos , Redes Neurales de la Computación , Biomarcadores
3.
Brief Bioinform ; 24(1)2023 01 19.
Artículo en Inglés | MEDLINE | ID: mdl-36642408

RESUMEN

Current machine learning-based methods have achieved inspiring predictions in the scenarios of mono-type and multi-type drug-drug interactions (DDIs), but they all ignore enhancive and depressive pharmacological changes triggered by DDIs. In addition, these pharmacological changes are asymmetric since the roles of two drugs in an interaction are different. More importantly, these pharmacological changes imply significant topological patterns among DDIs. To address the above issues, we first leverage Balance theory and Status theory in social networks to reveal the topological patterns among directed pharmacological DDIs, which are modeled as a signed and directed network. Then, we design a novel graph representation learning model named SGRL-DDI (social theory-enhanced graph representation learning for DDI) to realize the multitask prediction of DDIs. SGRL-DDI model can capture the task-joint information by integrating relation graph convolutional networks with Balance and Status patterns. Moreover, we utilize task-specific deep neural networks to perform two tasks, including the prediction of enhancive/depressive DDIs and the prediction of directed DDIs. Based on DDI entries collected from DrugBank, the superiority of our model is demonstrated by the comparison with other state-of-the-art methods. Furthermore, the ablation study verifies that Balance and Status patterns help characterize directed pharmacological DDIs, and that the joint of two tasks provides better DDI representations than individual tasks. Last, we demonstrate the practical effectiveness of our model by a version-dependent test, where 88.47 and 81.38% DDI out of newly added entries provided by the latest release of DrugBank are validated in two predicting tasks respectively.


Asunto(s)
Aprendizaje Automático , Redes Neurales de la Computación , Interacciones Farmacológicas
4.
BMC Biol ; 22(1): 156, 2024 Jul 18.
Artículo en Inglés | MEDLINE | ID: mdl-39020316

RESUMEN

BACKGROUND: Identification of potential drug-target interactions (DTIs) with high accuracy is a key step in drug discovery and repositioning, especially concerning specific drug targets. Traditional experimental methods for identifying the DTIs are arduous, time-intensive, and financially burdensome. In addition, robust computational methods have been developed for predicting the DTIs and are widely applied in drug discovery research. However, advancing more precise algorithms for predicting DTIs is essential to meet the stringent standards demanded by drug discovery. RESULTS: We proposed a novel method called GSRF-DTI, which integrates networks with a deep learning algorithm to identify DTIs. Firstly, GSRF-DTI learned the embedding representation of drugs and targets by integrating multiple drug association information and target association information, respectively. Then, GSRF-DTI considered the influence of drug-target pair (DTP) association on DTI prediction to construct a drug-target pair network (DTP-NET). Next, we utilized GraphSAGE on DTP-NET to learn the potential features of the network and applied random forest (RF) to predict the DTIs. Furthermore, we conducted ablation experiments to validate the necessity of integrating different types of network features for identifying DTIs. It is worth noting that GSRF-DTI proposed three novel DTIs. CONCLUSIONS: GSRF-DTI not only considered the influence of the interaction relationship between drug and target but also considered the impact of DTP association relationship on DTI prediction. We initially use GraphSAGE to aggregate the neighbor information of nodes for better identification. Experimental analysis on Luo's dataset and the newly constructed dataset revealed that the GSRF-DTI framework outperformed several state-of-the-art methods significantly.


Asunto(s)
Descubrimiento de Drogas , Descubrimiento de Drogas/métodos , Aprendizaje Profundo , Biología Computacional/métodos , Algoritmos , Preparaciones Farmacéuticas
5.
Brief Bioinform ; 23(1)2022 01 17.
Artículo en Inglés | MEDLINE | ID: mdl-34891172

RESUMEN

Identifying new indications for drugs plays an essential role at many phases of drug research and development. Computational methods are regarded as an effective way to associate drugs with new indications. However, most of them complete their tasks by constructing a variety of heterogeneous networks without considering the biological knowledge of drugs and diseases, which are believed to be useful for improving the accuracy of drug repositioning. To this end, a novel heterogeneous information network (HIN) based model, namely HINGRL, is proposed to precisely identify new indications for drugs based on graph representation learning techniques. More specifically, HINGRL first constructs a HIN by integrating drug-disease, drug-protein and protein-disease biological networks with the biological knowledge of drugs and diseases. Then, different representation strategies are applied to learn the features of nodes in the HIN from the topological and biological perspectives. Finally, HINGRL adopts a Random Forest classifier to predict unknown drug-disease associations based on the integrated features of drugs and diseases obtained in the previous step. Experimental results demonstrate that HINGRL achieves the best performance on two real datasets when compared with state-of-the-art models. Besides, our case studies indicate that the simultaneous consideration of network topology and biological knowledge of drugs and diseases allows HINGRL to precisely predict drug-disease associations from a more comprehensive perspective. The promising performance of HINGRL also reveals that the utilization of rich heterogeneous information provides an alternative view for HINGRL to identify novel drug-disease associations especially for new diseases.


Asunto(s)
Servicios de Información , Aprendizaje Automático , Preparaciones Farmacéuticas , Algoritmos , Biología Computacional/métodos , Enfermedad , Reposicionamiento de Medicamentos/métodos , Humanos , Modelos Teóricos , Redes Neurales de la Computación
6.
Brief Bioinform ; 23(1)2022 01 17.
Artículo en Inglés | MEDLINE | ID: mdl-34471921

RESUMEN

Graph is a natural data structure for describing complex systems, which contains a set of objects and relationships. Ubiquitous real-life biomedical problems can be modeled as graph analytics tasks. Machine learning, especially deep learning, succeeds in vast bioinformatics scenarios with data represented in Euclidean domain. However, rich relational information between biological elements is retained in the non-Euclidean biomedical graphs, which is not learning friendly to classic machine learning methods. Graph representation learning aims to embed graph into a low-dimensional space while preserving graph topology and node properties. It bridges biomedical graphs and modern machine learning methods and has recently raised widespread interest in both machine learning and bioinformatics communities. In this work, we summarize the advances of graph representation learning and its representative applications in bioinformatics. To provide a comprehensive and structured analysis and perspective, we first categorize and analyze both graph embedding methods (homogeneous graph embedding, heterogeneous graph embedding, attribute graph embedding) and graph neural networks. Furthermore, we summarize their representative applications from molecular level to genomics, pharmaceutical and healthcare systems level. Moreover, we provide open resource platforms and libraries for implementing these graph representation learning methods and discuss the challenges and opportunities of graph representation learning in bioinformatics. This work provides a comprehensive survey of emerging graph representation learning algorithms and their applications in bioinformatics. It is anticipated that it could bring valuable insights for researchers to contribute their knowledge to graph representation learning and future-oriented bioinformatics studies.


Asunto(s)
Biología Computacional , Redes Neurales de la Computación , Algoritmos , Biología Computacional/métodos , Conocimiento , Aprendizaje Automático
7.
Brief Bioinform ; 23(2)2022 03 10.
Artículo en Inglés | MEDLINE | ID: mdl-35062020

RESUMEN

Accurate prediction of atomic partial charges with high-level quantum mechanics (QM) methods suffers from high computational cost. Numerous feature-engineered machine learning (ML)-based predictors with favorable computability and reliability have been developed as alternatives. However, extensive expertise effort was needed for feature engineering of atom chemical environment, which may consequently introduce domain bias. In this study, SuperAtomicCharge, a data-driven deep graph learning framework, was proposed to predict three important types of partial charges (i.e. RESP, DDEC4 and DDEC78) derived from high-level QM calculations based on the structures of molecules. SuperAtomicCharge was designed to simultaneously exploit the 2D and 3D structural information of molecules, which was proved to be an effective way to improve the prediction accuracy of the model. Moreover, a simple transfer learning strategy and a multitask learning strategy based on self-supervised descriptors were also employed to further improve the prediction accuracy of the proposed model. Compared with the latest baselines, including one GNN-based predictor and two ML-based predictors, SuperAtomicCharge showed better performance on all the three external test sets and had better usability and portability. Furthermore, the QM partial charges of new molecules predicted by SuperAtomicCharge can be efficiently used in drug design applications such as structure-based virtual screening, where the predicted RESP and DDEC4 charges of new molecules showed more robust scoring and screening power than the commonly used partial charges. Finally, two tools including an online server (http://cadd.zju.edu.cn/deepchargepredictor) and the source code command lines (https://github.com/zjujdj/SuperAtomicCharge) were developed for the easy access of the SuperAtomicCharge services.


Asunto(s)
Aprendizaje Profundo , Diseño de Fármacos , Aprendizaje Automático , Reproducibilidad de los Resultados , Programas Informáticos
8.
Methods ; 211: 31-41, 2023 03.
Artículo en Inglés | MEDLINE | ID: mdl-36792041

RESUMEN

Self-supervised learning has shown superior performance on graph-related tasks in recent years. The most advanced methods are based on contrast learning, which severely limited by structured data augmentation techniques and complex training methods. Generative self-supervised learning, especially graph autoencoders (GAEs), can prevent the above dependence and has been demonstrated as an effective approach. In addition, most previous works only reconstruct the graph topological structure or node features. Few works consider both and combine them together to obtain their complementary information. To overcome these problems, we propose a generative self-supervised graph representation learning methodology named Multi-View Dual-decoder Graph Autoencoder (MDGA). Specifically, we first design a multi-sample graph learning strategy which benefits the generalization of the dual-decoder graph autoencoder. Moreover, the proposed model reconstructs the graph topological structure with a traditional GAE and extracts node attributes by masked feature reconstruction. Experimental results on five public benchmark datasets demonstrate that MDGA outperforms state-of-the-art methods in both node classification and link prediction tasks.


Asunto(s)
Benchmarking , Galactosamina , Compuestos de Dansilo
9.
J Biomed Inform ; 151: 104616, 2024 03.
Artículo en Inglés | MEDLINE | ID: mdl-38423267

RESUMEN

OBJECTIVE: This study aims to comprehensively review the use of graph neural networks (GNNs) for clinical risk prediction based on electronic health records (EHRs). The primary goal is to provide an overview of the state-of-the-art of this subject, highlighting ongoing research efforts and identifying existing challenges in developing effective GNNs for improved prediction of clinical risks. METHODS: A search was conducted in the Scopus, PubMed, ACM Digital Library, and Embase databases to identify relevant English-language papers that used GNNs for clinical risk prediction based on EHR data. The study includes original research papers published between January 2009 and May 2023. RESULTS: Following the initial screening process, 50 articles were included in the data collection. A significant increase in publications from 2020 was observed, with most selected papers focusing on diagnosis prediction (n = 36). The study revealed that the graph attention network (GAT) (n = 19) was the most prevalent architecture, and MIMIC-III (n = 23) was the most common data resource. CONCLUSION: GNNs are relevant tools for predicting clinical risk by accounting for the relational aspects among medical events and entities and managing large volumes of EHR data. Future studies in this area may address challenges such as EHR data heterogeneity, multimodality, and model interpretability, aiming to develop more holistic GNN models that can produce more accurate predictions, be effectively implemented in clinical settings, and ultimately improve patient care.


Asunto(s)
Registros Electrónicos de Salud , Lenguaje , Humanos , Recolección de Datos , Bases de Datos Factuales , Redes Neurales de la Computación
10.
Sensors (Basel) ; 24(7)2024 Mar 26.
Artículo en Inglés | MEDLINE | ID: mdl-38610318

RESUMEN

Sound classification plays a crucial role in enhancing the interpretation, analysis, and use of acoustic data, leading to a wide range of practical applications, of which environmental sound analysis is one of the most important. In this paper, we explore the representation of audio data as graphs in the context of sound classification. We propose a methodology that leverages pre-trained audio models to extract deep features from audio files, which are then employed as node information to build graphs. Subsequently, we train various graph neural networks (GNNs), specifically graph convolutional networks (GCNs), GraphSAGE, and graph attention networks (GATs), to solve multi-class audio classification problems. Our findings underscore the effectiveness of employing graphs to represent audio data. Moreover, they highlight the competitive performance of GNNs in sound classification endeavors, with the GAT model emerging as the top performer, achieving a mean accuracy of 83% in classifying environmental sounds and 91% in identifying the land cover of a site based on its audio recording. In conclusion, this study provides novel insights into the potential of graph representation learning techniques for analyzing audio data.

11.
BMC Bioinformatics ; 24(1): 162, 2023 Apr 21.
Artículo en Inglés | MEDLINE | ID: mdl-37085750

RESUMEN

BACKGROUND: The identification of disease-related genes is of great significance for the diagnosis and treatment of human disease. Most studies have focused on developing efficient and accurate computational methods to predict disease-causing genes. Due to the sparsity and complexity of biomedical data, it is still a challenge to develop an effective multi-feature fusion model to identify disease genes. RESULTS: This paper proposes an approach to predict the pathogenic gene based on multi-head attention fusion (MHAGP). Firstly, the heterogeneous biological information networks of disease genes are constructed by integrating multiple biomedical knowledge databases. Secondly, two graph representation learning algorithms are used to capture the feature vectors of gene-disease pairs from the network, and the features are fused by introducing multi-head attention. Finally, multi-layer perceptron model is used to predict the gene-disease association. CONCLUSIONS: The MHAGP model outperforms all of other methods in comparative experiments. Case studies also show that MHAGP is able to predict genes potentially associated with diseases. In the future, more biological entity association data, such as gene-drug, disease phenotype-gene ontology and so on, can be added to expand the information in heterogeneous biological networks and achieve more accurate predictions. In addition, MHAGP with strong expansibility can be used for potential tasks such as gene-drug association and drug-disease association prediction.


Asunto(s)
Biología Computacional , Redes Neurales de la Computación , Humanos , Biología Computacional/métodos , Algoritmos , Conocimiento
12.
Brief Bioinform ; 22(4)2021 07 20.
Artículo en Inglés | MEDLINE | ID: mdl-33276376

RESUMEN

Disease-gene association through genome-wide association study (GWAS) is an arduous task for researchers. Investigating single nucleotide polymorphisms that correlate with specific diseases needs statistical analysis of associations. Considering the huge number of possible mutations, in addition to its high cost, another important drawback of GWAS analysis is the large number of false positives. Thus, researchers search for more evidence to cross-check their results through different sources. To provide the researchers with alternative and complementary low-cost disease-gene association evidence, computational approaches come into play. Since molecular networks are able to capture complex interplay among molecules in diseases, they become one of the most extensively used data for disease-gene association prediction. In this survey, we aim to provide a comprehensive and up-to-date review of network-based methods for disease gene prediction. We also conduct an empirical analysis on 14 state-of-the-art methods. To summarize, we first elucidate the task definition for disease gene prediction. Secondly, we categorize existing network-based efforts into network diffusion methods, traditional machine learning methods with handcrafted graph features and graph representation learning methods. Thirdly, an empirical analysis is conducted to evaluate the performance of the selected methods across seven diseases. We also provide distinguishing findings about the discussed methods based on our empirical analysis. Finally, we highlight potential research directions for future studies on disease gene prediction.


Asunto(s)
Bases de Datos de Ácidos Nucleicos , Enfermedad/genética , Aprendizaje Automático , Polimorfismo de Nucleótido Simple , Estudio de Asociación del Genoma Completo , Humanos
13.
Sensors (Basel) ; 23(8)2023 Apr 21.
Artículo en Inglés | MEDLINE | ID: mdl-37112507

RESUMEN

Graphs are data structures that effectively represent relational data in the real world. Graph representation learning is a significant task since it could facilitate various downstream tasks, such as node classification, link prediction, etc. Graph representation learning aims to map graph entities to low-dimensional vectors while preserving graph structure and entity relationships. Over the decades, many models have been proposed for graph representation learning. This paper aims to show a comprehensive picture of graph representation learning models, including traditional and state-of-the-art models on various graphs in different geometric spaces. First, we begin with five types of graph embedding models: graph kernels, matrix factorization models, shallow models, deep-learning models, and non-Euclidean models. In addition, we also discuss graph transformer models and Gaussian embedding models. Second, we present practical applications of graph embedding models, from constructing graphs for specific domains to applying models to solve tasks. Finally, we discuss challenges for existing models and future research directions in detail. As a result, this paper provides a structured overview of the diversity of graph embedding models.

14.
Entropy (Basel) ; 25(2)2023 Jan 31.
Artículo en Inglés | MEDLINE | ID: mdl-36832623

RESUMEN

Understanding the evolutionary patterns of real-world complex systems such as human interactions, biological interactions, transport networks, and computer networks is important for our daily lives. Predicting future links among the nodes in these dynamic networks has many practical implications. This research aims to enhance our understanding of the evolution of networks by formulating and solving the link-prediction problem for temporal networks using graph representation learning as an advanced machine learning approach. Learning useful representations of nodes in these networks provides greater predictive power with less computational complexity and facilitates the use of machine learning methods. Considering that existing models fail to consider the temporal dimensions of the networks, this research proposes a novel temporal network-embedding algorithm for graph representation learning. This algorithm generates low-dimensional features from large, high-dimensional networks to predict temporal patterns in dynamic networks. The proposed algorithm includes a new dynamic node-embedding algorithm that exploits the evolving nature of the networks by considering a simple three-layer graph neural network at each time step and extracting node orientation by using Given's angle method. Our proposed temporal network-embedding algorithm, TempNodeEmb, is validated by comparing it to seven state-of-the-art benchmark network-embedding models. These models are applied to eight dynamic protein-protein interaction networks and three other real-world networks, including dynamic email networks, online college text message networks, and human real contact datasets. To improve our model, we have considered time encoding and proposed another extension to our model, TempNodeEmb++. The results show that our proposed models outperform the state-of-the-art models in most cases based on two evaluation metrics.

15.
Entropy (Basel) ; 25(2)2023 Feb 10.
Artículo en Inglés | MEDLINE | ID: mdl-36832697

RESUMEN

Topological Data Analysis (TDA) is an approach to analyzing the shape of data using techniques from algebraic topology. The staple of TDA is Persistent Homology (PH). Recent years have seen a trend of combining PH and Graph Neural Networks (GNNs) in an end-to-end manner to capture topological features from graph data. Though effective, these methods are limited by the shortcomings of PH: incomplete topological information and irregular output format. Extended Persistent Homology (EPH), as a variant of PH, addresses these problems elegantly. In this paper, we propose a plug-in topological layer for GNNs, termed Topological Representation with Extended Persistent Homology (TREPH). Taking advantage of the uniformity of EPH, a novel aggregation mechanism is designed to collate topological features of different dimensions to the local positions determining their living processes. The proposed layer is provably differentiable and more expressive than PH-based representations, which in turn is strictly stronger than message-passing GNNs in expressive power. Experiments on real-world graph classification tasks demonstrate the competitiveness of TREPH compared with the state-of-the-art approaches.

16.
Entropy (Basel) ; 25(4)2023 Mar 26.
Artículo en Inglés | MEDLINE | ID: mdl-37190356

RESUMEN

The graph autoencoder (GAE) is a powerful graph representation learning tool in an unsupervised learning manner for graph data. However, most existing GAE-based methods typically focus on preserving the graph topological structure by reconstructing the adjacency matrix while ignoring the preservation of the attribute information of nodes. Thus, the node attributes cannot be fully learned and the ability of the GAE to learn higher-quality representations is weakened. To address the issue, this paper proposes a novel GAE model that preserves node attribute similarity. The structural graph and the attribute neighbor graph, which is constructed based on the attribute similarity between nodes, are integrated as the encoder input using an effective fusion strategy. In the encoder, the attributes of the nodes can be aggregated both in their structural neighborhood and by their attribute similarity in their attribute neighborhood. This allows performing the fusion of the structural and node attribute information in the node representation by sharing the same encoder. In the decoder module, the adjacency matrix and the attribute similarity matrix of the nodes are reconstructed using dual decoders. The cross-entropy loss of the reconstructed adjacency matrix and the mean-squared error loss of the reconstructed node attribute similarity matrix are used to update the model parameters and ensure that the node representation preserves the original structural and node attribute similarity information. Extensive experiments on three citation networks show that the proposed method outperforms state-of-the-art algorithms in link prediction and node clustering tasks.

17.
BMC Bioinformatics ; 23(Suppl 4): 560, 2022 Dec 23.
Artículo en Inglés | MEDLINE | ID: mdl-36564705

RESUMEN

BACKGROUND: Anticancer peptide (ACP) inhibits and kills tumor cells. Research on ACP is of great significance for the development of new drugs, and the prediction of ACPs and non-ACPs is the new hotspot. RESULTS: We propose a new machine learning-based method named GCNCPR-ACPs (a Graph Convolutional Neural Network Method based on collapse pooling and residual network to predict the ACPs), which automatically and accurately predicts ACPs using residual graph convolution networks, differentiable graph pooling, and features extracted using peptide sequence information extraction. The GCNCPR-ACPs method can effectively capture different levels of node attributes for amino acid node representation learning, GCNCPR-ACPs uses node2vec and one-hot embedding methods to extract initial amino acid features for ACP prediction. CONCLUSIONS: Experimental results of ten-fold cross-validation and independent validation based on different metrics showed that GCNCPR-ACPs significantly outperformed state-of-the-art methods. Specifically, the evaluation indicators of Matthews Correlation Coefficient (MCC) and AUC of our predicator were 69.5% and 90%, respectively, which were 4.3% and 2% higher than those of the other predictors, respectively, in ten-fold cross-validation. And in the independent test, the scores of MCC and SP were 69.6% and 93.9%, respectively, which were 37.6% and 5.5% higher than those of the other predictors, respectively. The overall results showed that the GCNCPR-ACPs method proposed in the current paper can effectively predict ACPs.


Asunto(s)
Aminoácidos , Proyectos de Investigación , Secuencia de Aminoácidos , Benchmarking , Almacenamiento y Recuperación de la Información
18.
BMC Bioinformatics ; 23(1): 9, 2022 Jan 04.
Artículo en Inglés | MEDLINE | ID: mdl-34983364

RESUMEN

BACKGROUND: Drug-disease associations (DDAs) can provide important information for exploring the potential efficacy of drugs. However, up to now, there are still few DDAs verified by experiments. Previous evidence indicates that the combination of information would be conducive to the discovery of new DDAs. How to integrate different biological data sources and identify the most effective drugs for a certain disease based on drug-disease coupled mechanisms is still a challenging problem. RESULTS: In this paper, we proposed a novel computation model for DDA predictions based on graph representation learning over multi-biomolecular network (GRLMN). More specifically, we firstly constructed a large-scale molecular association network (MAN) by integrating the associations among drugs, diseases, proteins, miRNAs, and lncRNAs. Then, a graph embedding model was used to learn vector representations for all drugs and diseases in MAN. Finally, the combined features were fed to a random forest (RF) model to predict new DDAs. The proposed model was evaluated on the SCMFDD-S data set using five-fold cross-validation. Experiment results showed that GRLMN model was very accurate with the area under the ROC curve (AUC) of 87.9%, which outperformed all previous works in terms of both accuracy and AUC in benchmark dataset. To further verify the high performance of GRLMN, we carried out two case studies for two common diseases. As a result, in the ranking of drugs that were predicted to be related to certain diseases (such as kidney disease and fever), 15 of the top 20 drugs have been experimentally confirmed. CONCLUSIONS: The experimental results show that our model has good performance in the prediction of DDA. GRLMN is an effective prioritization tool for screening the reliable DDAs for follow-up studies concerning their participation in drug reposition.


Asunto(s)
MicroARNs , Preparaciones Farmacéuticas , ARN Largo no Codificante , Área Bajo la Curva , Humanos , MicroARNs/metabolismo , Proteínas , ARN Largo no Codificante/metabolismo
19.
BMC Bioinformatics ; 23(1): 516, 2022 Dec 01.
Artículo en Inglés | MEDLINE | ID: mdl-36456957

RESUMEN

BACKGROUND: Drug repositioning is a very important task that provides critical information for exploring the potential efficacy of drugs. Yet developing computational models that can effectively predict drug-disease associations (DDAs) is still a challenging task. Previous studies suggest that the accuracy of DDA prediction can be improved by integrating different types of biological features. But how to conduct an effective integration remains a challenging problem for accurately discovering new indications for approved drugs. METHODS: In this paper, we propose a novel meta-path based graph representation learning model, namely RLFDDA, to predict potential DDAs on heterogeneous biological networks. RLFDDA first calculates drug-drug similarities and disease-disease similarities as the intrinsic biological features of drugs and diseases. A heterogeneous network is then constructed by integrating DDAs, disease-protein associations and drug-protein associations. With such a network, RLFDDA adopts a meta-path random walk model to learn the latent representations of drugs and diseases, which are concatenated to construct joint representations of drug-disease associations. As the last step, we employ the random forest classifier to predict potential DDAs with their joint representations. RESULTS: To demonstrate the effectiveness of RLFDDA, we have conducted a series of experiments on two benchmark datasets by following a ten-fold cross-validation scheme. The results show that RLFDDA yields the best performance in terms of AUC and F1-score when compared with several state-of-the-art DDAs prediction models. We have also conducted a case study on two common diseases, i.e., paclitaxel and lung tumors, and found that 7 out of top-10 diseases and 8 out of top-10 drugs have already been validated for paclitaxel and lung tumors respectively with literature evidence. Hence, the promising performance of RLFDDA may provide a new perspective for novel DDAs discovery over heterogeneous networks.


Asunto(s)
Aprendizaje , Neoplasias Pulmonares , Humanos , Benchmarking , Descubrimiento de Drogas , Paclitaxel
20.
BMC Bioinformatics ; 23(1): 429, 2022 Oct 17.
Artículo en Inglés | MEDLINE | ID: mdl-36245002

RESUMEN

BACKGROUND: Gene expression is regulated at different molecular levels, including chromatin accessibility, transcription, RNA maturation, and transport. These regulatory mechanisms have strong connections with cellular metabolism. In order to study the cellular system and its functioning, omics data at each molecular level can be generated and efficiently integrated. Here, we propose BRANENET, a novel multi-omics integration framework for multilayer heterogeneous networks. BRANENET is an expressive, scalable, and versatile method to learn node embeddings, leveraging random walk information within a matrix factorization framework. Our goal is to efficiently integrate multi-omics data to study different regulatory aspects of multilayered processes that occur in organisms. We evaluate our framework using multi-omics data of Saccharomyces cerevisiae, a well-studied yeast model organism. RESULTS: We test BRANENET on transcriptomics (RNA-seq) and targeted metabolomics (NMR) data for wild-type yeast strain during a heat-shock time course of 0, 20, and 120 min. Our framework learns features for differentially expressed bio-molecules showing heat stress response. We demonstrate the applicability of the learned features for targeted omics inference tasks: transcription factor (TF)-target prediction, integrated omics network (ION) inference, and module identification. The performance of BRANENET is compared to existing network integration methods. Our model outperforms baseline methods by achieving high prediction scores for a variety of downstream tasks.


Asunto(s)
ARN , Saccharomyces cerevisiae , Cromatina , RNA-Seq , Saccharomyces cerevisiae/genética , Saccharomyces cerevisiae/metabolismo , Factores de Transcripción/genética , Factores de Transcripción/metabolismo
SELECCIÓN DE REFERENCIAS
Detalles de la búsqueda