Results 1 - 20 of 190
1.
Res Sq ; 2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39372928

ABSTRACT

Understanding complex interactions in biomedical networks is crucial for advancements in biomedicine, but traditional link prediction (LP) methods are limited in capturing this complexity. Representation-based learning techniques improve prediction accuracy by mapping nodes to low-dimensional embeddings, yet they often struggle with interpretability and scalability. We present BioPathNet, a novel graph neural network framework based on the Neural Bellman-Ford Network (NBFNet), addressing these limitations through path-based reasoning for LP in biomedical knowledge graphs. Unlike node-embedding frameworks, BioPathNet learns representations between node pairs by considering all relations along paths, enhancing prediction accuracy and interpretability. This allows visualization of influential paths and facilitates biological validation. BioPathNet leverages a background regulatory graph (BRG) for enhanced message passing and uses stringent negative sampling to improve precision. In evaluations across various LP tasks, such as gene function annotation, drug-disease indication, synthetic lethality, and lncRNA-mRNA interaction prediction, BioPathNet consistently outperformed shallow node embedding methods, relational graph neural networks and task-specific state-of-the-art methods, demonstrating robust performance and versatility. Our study predicts novel drug indications for diseases like acute lymphoblastic leukemia (ALL) and Alzheimer's, validated by medical experts and clinical trials. We also identified new synthetic lethality gene pairs and regulatory interactions involving lncRNAs and target genes, confirmed through literature reviews. BioPathNet's interpretability will enable researchers to trace prediction paths and gain molecular insights, making it a valuable tool for drug discovery, personalized medicine and biology in general.
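BioPathNet builds on NBFNet, whose central idea is a generalized Bellman-Ford iteration that aggregates information over all paths leading out of a source node. The sketch below is a minimal illustration of that path aggregation only, assuming scalar relation weights in place of NBFNet's learned relation representations and a plain (sum, product) semiring; it is not BioPathNet's implementation, and the entity and relation names in the usage example are hypothetical.

```python
from collections import defaultdict


def bellman_ford_path_scores(edges, relation_weight, source, num_iters):
    """Generalized Bellman-Ford with a (sum, product) semiring.

    edges: list of (head, relation, tail) triples.
    relation_weight: dict mapping relation -> scalar weight (a stand-in for
        the learned relation representations of NBFNet-style models).
    Returns a dict node -> summed weight of all walks of length <= num_iters
    that start at `source`.
    """
    # Boundary condition: the source starts at 1, every other node at 0.
    h = defaultdict(float)
    h[source] = 1.0
    for _ in range(num_iters):
        new_h = defaultdict(float)
        new_h[source] = 1.0  # re-inject the boundary condition each round
        for head, rel, tail in edges:
            # "Multiply" the head's value by the relation weight, then
            # "sum" (aggregate) all messages arriving at the tail.
            new_h[tail] += h[head] * relation_weight[rel]
        h = new_h
    return dict(h)


# Hypothetical toy graph: two drug -> gene -> disease paths.
edges = [
    ("drugX", "targets", "gene1"),
    ("gene1", "associated_with", "diseaseY"),
    ("drugX", "targets", "gene2"),
    ("gene2", "associated_with", "diseaseY"),
]
scores = bellman_ford_path_scores(edges, {"targets": 0.5, "associated_with": 0.8}, "drugX", 2)
print(scores["diseaseY"])
```

Because the pair score accumulates every path between the two nodes, the contribution of each individual path can be inspected, which is the property the abstract links to interpretability.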

2.
J Bioinform Comput Biol ; 22(4): 2450020, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39262053

ABSTRACT

Polypharmacy, the use of drug combinations, is an effective approach for treating complex diseases, but it increases the risk of adverse effects. To predict novel polypharmacy side effects based on known ones, many computational methods have been proposed. However, most of them generate deterministic low-dimensional embeddings when modeling the latent space of drugs, which cannot effectively capture potential side effect associations between drugs. In this study, we present SIPSE, a novel approach for predicting polypharmacy side effects. SIPSE integrates single-drug side effect information and drug-target protein data to construct novel drug feature vectors. Leveraging a semi-implicit graph variational auto-encoder, SIPSE models known polypharmacy side effects and generates flexible latent distributions for drug nodes. SIPSE infers the current node distribution by combining the distributions of neighboring nodes with embedding noise. By sampling node embeddings from these distributions, SIPSE effectively predicts polypharmacy side effects between drugs. One key innovation of SIPSE is its incorporation of uncertainty propagation through noise embedding and neighborhood sharing, enhancing its graph analysis capabilities. Extensive experiments on a benchmark dataset of polypharmacy side effects demonstrated that SIPSE significantly outperformed five state-of-the-art methods in predicting polypharmacy side effects.


Subject(s)
Computational Biology , Drug-Related Side Effects and Adverse Reactions , Polypharmacy , Computational Biology/methods , Humans , Algorithms
3.
J Biomed Inform ; 158: 104725, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39265815

ABSTRACT

OBJECTIVE: As new knowledge is produced at a rapid pace in the biomedical field, existing biomedical Knowledge Graphs (KGs) cannot be manually updated in a timely manner. Previous work in Natural Language Processing (NLP) has leveraged link prediction to infer the missing knowledge in general-purpose KGs. Inspired by this, we propose to apply link prediction to existing biomedical KGs to infer missing knowledge. Although Knowledge Graph Embedding (KGE) methods are effective in link prediction tasks, they are less capable of capturing relations between communities of entities with specific attributes (Fanourakis et al., 2023). METHODS: To address this challenge, we proposed an entity distance-based method for abstracting a Community Knowledge Graph (CKG) from a simplified version of the pre-existing PubMed Knowledge Graph (PKG) (Xu et al., 2020). For link prediction on the abstracted CKG, we proposed an extension approach for existing KGE models that links the information in the PKG to the abstracted CKG. The applicability of this extension was demonstrated with six well-known KGE models: TransE, TransH, DistMult, ComplEx, SimplE, and RotatE. Evaluation metrics including Mean Rank (MR), Mean Reciprocal Rank (MRR), and Hits@k were used to assess link prediction performance. In addition, we presented a backtracking process that traces the results of CKG link prediction back to the PKG scale for further comparison. RESULTS: Six different CKGs were abstracted from the PKG using the embeddings of the six KGE methods. The link prediction results on these abstracted CKGs indicate that our proposed extension improves the existing KGE methods, achieving a top-10 accuracy of 0.69 compared to 0.5 for TransE, 0.7 compared to 0.54 for TransH, 0.67 compared to 0.6 for DistMult, 0.73 compared to 0.57 for ComplEx, 0.73 compared to 0.63 for SimplE, and 0.85 compared to 0.76 for RotatE on their respective CKGs.
These improvements also highlight the wide applicability of the extension approach. CONCLUSION: This study offers novel insights into abstracting CKGs from the PKG. The extension approach enhanced the performance of existing KGE methods and is broadly applicable. As an interesting future extension, we plan to conduct link prediction for entities that are newly introduced to the PKG.
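The ranking metrics named above (MR, MRR, Hits@k) are all computed from the rank that each test triple's true entity receives among the scored candidates. A minimal sketch, assuming 1-based ranks are already available; reading the abstract's "top-10 accuracy" as Hits@10 is our assumption, and the example ranks are made up for illustration.

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR and Hits@k from 1-based ranks of the true entities."""
    if not ranks or min(ranks) < 1:
        raise ValueError("ranks must be a non-empty list of 1-based integers")
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                   # lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,  # higher is better, in (0, 1]
    }
    for k in ks:
        # Fraction of test triples whose true entity is ranked in the top k.
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


m = ranking_metrics([1, 2, 4, 20])
print(m)
```

MR is sensitive to a few very badly ranked entities, which is why MRR and Hits@k are usually reported alongside it.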

4.
J Biomed Inform ; 158: 104730, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39326691

ABSTRACT

OBJECTIVE: To develop FuseLinker, a novel link prediction framework for biomedical knowledge graphs (BKGs) that fully exploits the graph's structural, textual and domain knowledge information. We evaluated the utility of FuseLinker in the graph-based drug repurposing task through detailed case studies. METHODS: FuseLinker leverages fused pre-trained text embeddings and domain knowledge embeddings to enhance a graph neural network (GNN)-based link prediction model tailored for BKGs. The framework includes three parts: (a) obtaining text embeddings for BKGs using embedding-visible large language models (LLMs), (b) learning representations of medical ontology as domain knowledge information with the Poincaré graph embedding method, and (c) fusing these embeddings and further learning graph structure representations of BKGs with a GNN-based link prediction model. We evaluated FuseLinker against traditional knowledge graph embedding models and a conventional GNN-based link prediction model across four public BKG datasets. Additionally, we examined the impact of using different embedding-visible LLMs on FuseLinker's performance. Finally, we investigated FuseLinker's ability to generate medical hypotheses through two drug repurposing case studies for Sorafenib and Parkinson's disease. RESULTS: Compared with baseline models on four BKGs, our method demonstrates superior performance. The Mean Reciprocal Rank (MRR) and Area Under the Receiver Operating Characteristic Curve (AUROC) for KEGG50k, Hetionet, SuppKG and ADInt are 0.969 and 0.987, 0.548 and 0.903, 0.739 and 0.928, and 0.831 and 0.890, respectively. CONCLUSION: Our study demonstrates that FuseLinker is an effective novel link prediction framework that integrates multiple types of graph information and shows significant potential for practical applications in biomedical and clinical tasks. Source code and data are available at https://github.com/YKXia0/FuseLinker.

5.
PeerJ ; 12: e17975, 2024.
Article in English | MEDLINE | ID: mdl-39247551

ABSTRACT

Link prediction (LP) is the task of identifying potential, missing, and spurious links in complex networks. Protein-protein interaction (PPI) networks are important for understanding the underlying biological mechanisms of diseases. Many complex networks have been constructed using LP methods; however, few studies focus on predicting disease-related genes and evaluating these genes against various criteria. The main objective of this study is to investigate the effect of a simple ensemble method on disease-related gene prediction. Disease-related gene predictions based on local similarity indices (LSIs) were integrated on the PPI network by a simple ensemble decision method, simple majority voting (SMV), to detect accurate disease-related genes. The human PPI network was used to discover potential disease-related genes with four LSIs. The LSIs discovered potential links between disease-related genes, which were obtained from the OMIM database for gastric, colorectal, breast, prostate, and lung cancers. LSI-based disease-related genes were ranked by their LSI scores in descending order to retrieve the top 10, 50, and 100 disease-related genes. SMV integrated the four LSI-based predictions to obtain the SMV-based top 10, 50, and 100 disease-related genes. The performance of the LSI-based and SMV-based genes was evaluated separately with overlap analyses against the GeneCards disease-gene relation dataset and Gene Ontology (GO) terms. The GO terms were used for biological assessment of the gene lists inferred by the LSIs and SMV for all cancer types. Adamic-Adar (AA), Resource Allocation Index (RAI), and SMV-based gene lists generally achieved good performance on all cancers in both overlap analyses. SMV also outperformed the other methods on the breast cancer data. Increasing the number of top-ranked disease-related genes selected also improved the performance of SMV.
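The pipeline above scores candidate genes with local similarity indices and then combines the per-index top-k lists by majority vote. A minimal sketch, assuming an undirected graph stored as neighbour sets and assuming that a candidate's score is its summed similarity to the known (seed) disease genes; the abstract does not state that aggregation, and the helper names and toy graph are hypothetical.

```python
import math
from collections import Counter


def local_similarity_scores(adj, u, v):
    """Four common local similarity indices for a node pair (u, v).

    adj: dict mapping each node to the set of its neighbours (undirected).
    """
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    return {
        "CN": len(common),
        "Jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar and Resource Allocation down-weight high-degree shared neighbours.
        "AA": sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1),
        "RA": sum(1.0 / len(adj[z]) for z in common),
    }


def top_k_candidates(adj, seeds, index, k):
    """Rank non-seed nodes by summed similarity to the seeds (assumed aggregation)."""
    scores = {}
    for g in adj:
        if g in seeds:
            continue
        s = sum(local_similarity_scores(adj, g, seed)[index] for seed in seeds)
        if s > 0:
            scores[g] = s
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


def simple_majority_vote(ranked_lists):
    """Keep items that appear in more than half of the ranked lists."""
    votes = Counter(item for lst in ranked_lists for item in set(lst))
    return sorted(item for item, c in votes.items() if c > len(ranked_lists) / 2)


# Hypothetical toy PPI graph with "A" as the only known disease gene.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "E"},
    "C": {"A", "B", "E"},
    "D": {"A"},
    "E": {"B", "C"},
}
lists = [top_k_candidates(adj, {"A"}, idx, 2) for idx in ("CN", "Jaccard", "AA", "RA")]
print(lists, simple_majority_vote(lists))
```

Majority voting only rewards genes that several indices agree on, which is one plausible reason an ensemble can be more robust than any single index.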


Subject(s)
Computational Biology , Humans , Computational Biology/methods , Protein Interaction Maps/genetics , Neoplasms/genetics , Databases, Genetic , Gene Regulatory Networks/genetics , Genetic Predisposition to Disease , Algorithms
6.
Sci Rep ; 14(1): 21342, 2024 09 12.
Article in English | MEDLINE | ID: mdl-39266676

ABSTRACT

Inferring gene regulatory networks through deep learning and causal inference methods is a crucial task in computational biology and bioinformatics. This study presents a novel approach that uses a Graph Convolutional Network (GCN) guided by causal information to infer Gene Regulatory Networks (GRNs). Transfer entropy and a reconstruction layer are used to achieve causal feature reconstruction, mitigating the information loss caused by multiple rounds of neighbor aggregation in the GCN and yielding a causal, integrated representation of node features. A Gaussian-kernel autoencoder extracts separable features from gene expression data to improve computational efficiency. Experimental results on the DREAM5 and mDC datasets demonstrate that our method outperforms existing algorithms, as indicated by higher AUPRC values. Furthermore, the incorporation of causal feature reconstruction enhances the inferred GRNs, rendering them more reasonable, accurate, and reliable.


Subject(s)
Algorithms , Computational Biology , Gene Regulatory Networks , Computational Biology/methods , Humans , Deep Learning , Gene Expression Profiling/methods , Neural Networks, Computer
7.
J Environ Manage ; 370: 122505, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39293117

ABSTRACT

Reducing urban carbon emissions (UCEs) holds paramount importance for global sustainable development. However, the complexity of interactions among urban spatial units has impeded further research on UCEs. This study investigates synergistic emission reduction between cities by analyzing the spatial complexity within the UCEs network. The future potential for synergistic carbon emissions reduction is predicted by the link prediction algorithm. A case study conducted in the Pearl River Basin of China demonstrates that the UCEs network has a complex spatial structure, and the synergistic capacity of emission reduction among cities is enhanced. The core cities in the UCEs network, including Dongguan, Shenzhen, and Guangzhou, have spillover effects that contribute to synergistic emission reduction. Community detection reveals that the common characteristics associated with UCEs become concentrated, thereby enhancing the synergy of joint efforts between cities. The link prediction algorithm indicates a high probability of strengthened carbon emission connections in the Pearl River Delta, alongside those between upstream cities, which shows potential in forecasting synergistic emission reductions. Our research framework offers a comprehensive analysis for synergistic emission reduction from the spatial complexity of UCEs network and link prediction. It acts as a worthwhile reference for developing differentiated policies on synergistic emission reduction.

8.
bioRxiv ; 2024 Aug 10.
Article in English | MEDLINE | ID: mdl-39149355

ABSTRACT

Understanding complex interactions in biomedical networks is crucial for advancements in biomedicine, but traditional link prediction (LP) methods are limited in capturing this complexity. Representation-based learning techniques improve prediction accuracy by mapping nodes to low-dimensional embeddings, yet they often struggle with interpretability and scalability. We present BioPathNet, a novel graph neural network framework based on the Neural Bellman-Ford Network (NBFNet), addressing these limitations through path-based reasoning for LP in biomedical knowledge graphs. Unlike node-embedding frameworks, BioPathNet learns representations between node pairs by considering all relations along paths, enhancing prediction accuracy and interpretability. This allows visualization of influential paths and facilitates biological validation. BioPathNet leverages a background regulatory graph (BRG) for enhanced message passing and uses stringent negative sampling to improve precision. In evaluations across various LP tasks, such as gene function annotation, drug-disease indication, synthetic lethality, and lncRNA-mRNA interaction prediction, BioPathNet consistently outperformed shallow node embedding methods, relational graph neural networks and task-specific state-of-the-art methods, demonstrating robust performance and versatility. Our study predicts novel drug indications for diseases like acute lymphoblastic leukemia (ALL) and Alzheimer's, validated by medical experts and clinical trials. We also identified new synthetic lethality gene pairs and regulatory interactions involving lncRNAs and target genes, confirmed through literature reviews. BioPathNet's interpretability will enable researchers to trace prediction paths and gain molecular insights, making it a valuable tool for drug discovery, personalized medicine and biology in general.

9.
Comput Biol Med ; 181: 109072, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39216404

ABSTRACT

Automated generation of knowledge graphs that accurately capture published information can help with knowledge organization and access, which has the potential to accelerate discovery and innovation. Here, we present an integrated pipeline to construct a large-scale knowledge graph using large language models in an active learning setting. We apply our pipeline to the association of raw food, ingredients, and chemicals, a domain that lacks such knowledge resources. Using an iterative active learning approach with 4120 manually curated premise-hypothesis pairs as training data over ten consecutive cycles, the entailment model extracted 230,848 food-chemical composition relationships from 155,260 scientific papers, 106,082 (46.0 %) of which had never been reported in any published database. To augment the knowledge in the knowledge graph, we further incorporated information from 5 external databases and ontology sources. We then applied a link prediction model to identify putative food-chemical relationships that were not part of the constructed knowledge graph. Validation of the 443 hypotheses generated by the link prediction model yielded 355 new food-chemical relationships, and the results show that the model score correlates well (R² = 0.70) with the probability of a novel finding. This work demonstrates how automated learning from literature at scale can accelerate discovery and support practical applications through reproducible, evidence-based capture of latent interactions among diverse entities, such as food and chemicals.


Subject(s)
Databases, Factual , Food , Data Mining/methods , Humans , Machine Learning
10.
bioRxiv ; 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-39005357

ABSTRACT

Background: Alzheimer's disease (AD), a progressive neurodegenerative disorder, continues to increase in prevalence without any effective treatments to date. In this context, knowledge graphs (KGs) have emerged as a pivotal tool in biomedical research, offering new perspectives on drug repurposing and biomarker discovery by analyzing intricate network structures. Our study seeks to build an AD-specific knowledge graph, highlighting interactions among AD, genes, variants, chemicals, drugs, and other diseases. The goal is to shed light on existing treatments, potential targets, and diagnostic methods for AD, thereby aiding drug repurposing and the identification of biomarkers. Results: We annotated 800 PubMed abstracts and leveraged GPT-4 for text augmentation to enrich our training data for named entity recognition (NER) and relation classification. A comprehensive data mining model, integrating NER and relation classification, was trained on the annotated corpus. This model was subsequently applied to extract relation triplets from unannotated abstracts. To enhance entity linking, we utilized a suite of reference biomedical databases and refined the linking accuracy through abbreviation resolution. As a result, we identified 3,199,276 entity mentions and 633,733 triplets, elucidating connections between 5,000 unique entities. These connections were pivotal in constructing a comprehensive Alzheimer's Disease Knowledge Graph (ADKG). We also integrated the ADKG constructed after entity linking with other biomedical databases. The ADKG served as training data for Knowledge Graph Embedding models, and the high-ranking predicted triplets were supported by evidence, underscoring the utility of ADKG in generating testable scientific hypotheses.
Further application of ADKG in predictive modeling using UK Biobank data revealed that models based on ADKG outperformed the others, as evidenced by larger areas under the receiver operating characteristic (ROC) curves. Conclusion: The ADKG is a valuable resource for generating hypotheses and enhancing predictive models, highlighting its potential to advance AD research and treatment strategies.

11.
Entropy (Basel) ; 26(7)2024 Jul 10.
Article in English | MEDLINE | ID: mdl-39056950

ABSTRACT

Graph representation learning aims to map nodes or edges within a graph to low-dimensional vectors while preserving as much topological information as possible. Over the past decades, numerous algorithms for graph representation learning have emerged. Among them, proximity matrix representation methods have been shown to perform well in experiments and to scale to large graphs with millions of nodes. However, with the rapid development of the Internet, billions of information interactions take place every moment. Most methods for similarity matrix factorization still focus on static graphs, leading to incomplete similarity descriptions and low embedding quality. To enhance the embedding quality of temporal graph learning, we propose a temporal graph representation learning model based on the matrix factorization of Time-constrained Personalized PageRank (TPPR) matrices. TPPR, an extension of personalized PageRank (PPR) that incorporates temporal information, better captures node similarities in temporal graphs. Based on this, we use Singular Value Decomposition or Nonnegative Matrix Factorization to decompose TPPR matrices and obtain an embedding vector for each node. Through experiments on tasks such as link prediction, node classification, and node clustering across multiple temporal graphs, and comparisons with various baseline methods, we find that graph representation learning algorithms based on TPPR matrix factorization achieve outstanding overall scores on multiple temporal datasets, highlighting their effectiveness.
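The factorization idea above can be illustrated on a static graph: build the PPR matrix for every source node, then take a truncated singular value decomposition to obtain node embeddings. A minimal sketch, assuming a small dense adjacency matrix with no isolated nodes and a restart probability of 0.15; it deliberately omits the time constraint that distinguishes TPPR, so it shows plain PPR factorization rather than the authors' model.

```python
import numpy as np


def ppr_matrix(adj, alpha=0.15):
    """Personalized PageRank matrix for every source node (static graph).

    adj: (n, n) non-negative adjacency matrix with no all-zero rows.
    Row i is the PPR vector of a random walk that restarts at node i
    with probability alpha at each step.
    """
    n = adj.shape[0]
    P = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    # Closed form of alpha * sum_k ((1 - alpha) P)^k; dense inverse, small graphs only.
    return alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * P)


def svd_embeddings(M, dim):
    """Rank-`dim` truncated SVD; returns source-side node embeddings."""
    U, S, _ = np.linalg.svd(M)
    return U[:, :dim] * np.sqrt(S[:dim])


adj = np.array(
    [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
M = ppr_matrix(adj)
emb = svd_embeddings(M, 2)
print(M.round(3))
print(emb.shape)
```

Splitting the singular values evenly between the two factors (the square root) is one common convention; other choices are possible.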

12.
Big Data ; 2024 Jul 27.
Article in English | MEDLINE | ID: mdl-39066722

ABSTRACT

Dynamic propagation affects how network structure changes. Different networks are affected by the iterative propagation of information to different degrees. The iterative propagation of information in a network changes the connection strength of the chain edges between nodes. Most studies on temporal networks build networks based on time characteristics, and the iterative propagation of information in a network can also reflect the time characteristics of network evolution. The change in network structure is a macro-level manifestation of time characteristics, whereas the dynamics within the network are a micro-level manifestation. This article focuses on how to concretely visualize the changes in network structure influenced by the characteristics of propagation dynamics. The appearance of chain edges is the micro-level change in network structure, and community division is the macro-level change. Based on this, node participation is proposed to quantify the influence of different users on information propagation in the network, and it is simulated in different types of networks. By analyzing the iterative propagation of information, weighted networks based on that propagation are constructed for the different network types. Finally, the chain edges and community division in these networks are analyzed to quantify the influence of network propagation on complex network structure.

13.
Sci Rep ; 14(1): 16587, 2024 07 18.
Article in English | MEDLINE | ID: mdl-39025897

ABSTRACT

Drug repurposing aims to find new therapeutic applications for existing drugs on the pharmaceutical market, leading to significant savings in time and cost. The use of artificial intelligence and knowledge graphs to propose repurposing candidates facilitates the process, as large amounts of data can be processed. However, it is important to pay attention to the explainability needed to validate the predictions. We propose a general architecture to understand several explainable methods for graph completion based on knowledge graphs and design our own architecture for drug repurposing. We present XG4Repo (eXplainable Graphs for Repurposing), a framework that takes advantage of the connectivity of any biomedical knowledge graph to link compounds to the diseases they can treat. Our method allows metapaths of different types and lengths, which are automatically generated and optimised based on data. XG4Repo focuses on providing meaningful explanations for the predictions, which are based on paths from compounds to diseases. These paths include nodes such as genes, pathways, side effects, or anatomies, so they provide information about the targets and other characteristics of the biomedical mechanisms that link compounds and diseases. Paths make predictions interpretable for experts, who can validate them and use them in further research on drug repurposing. We also describe three use cases where we analyse new uses for Epirubicin, Paclitaxel, and Prednisone and present the paths that support the predictions.


Subject(s)
Drug Repositioning , Drug Repositioning/methods , Humans , Artificial Intelligence , Algorithms
14.
Front Public Health ; 12: 1386495, 2024.
Article in English | MEDLINE | ID: mdl-38827618

ABSTRACT

Introduction: Mitigating the spread of infectious diseases is of paramount concern for societal safety, necessitating the development of effective intervention measures. Epidemic simulation is widely used to evaluate the efficacy of such measures, but realistic simulation environments are crucial for meaningful insights. Despite the common use of contact-tracing data to construct realistic networks, they have inherent limitations. This study explores reconstructing simulation networks using link prediction methods as an alternative approach. Methods: The primary objective of this study is to assess the effectiveness of intervention measures on the reconstructed network, focusing on the 2015 MERS-CoV outbreak in South Korea. Contact-tracing data were acquired, and simulation networks were reconstructed using the graph autoencoder (GAE)-based link prediction method. A scale-free (SF) network was employed for comparison purposes. Epidemic simulations were conducted to evaluate three intervention strategies: Mass Quarantine (MQ), Isolation, and Isolation combined with Acquaintance Quarantine (AQ + Isolation). Results: Simulation results showed that AQ + Isolation was the most effective intervention on the GAE network, resulting in consistent epidemic curves due to high clustering coefficients. Conversely, MQ and AQ + Isolation were highly effective on the SF network, attributed to its low clustering coefficient and intervention sensitivity. Isolation alone exhibited reduced effectiveness. These findings emphasize the significant impact of network structure on intervention outcomes and suggest a potential overestimation of effectiveness in SF networks. Additionally, they highlight the complementary use of link prediction methods. Discussion: This innovative methodology provides inspiration for enhancing simulation environments in future endeavors. 
It also offers valuable insights for informing public health decision-making processes, emphasizing the importance of realistic simulation environments and the potential of link prediction methods.


Subject(s)
Contact Tracing , Coronavirus Infections , Disease Outbreaks , Middle East Respiratory Syndrome Coronavirus , Humans , Republic of Korea/epidemiology , Coronavirus Infections/transmission , Coronavirus Infections/prevention & control , Coronavirus Infections/epidemiology , Contact Tracing/methods , Disease Outbreaks/prevention & control , Quarantine , Computer Simulation
15.
Entropy (Basel) ; 26(6)2024 May 21.
Article in English | MEDLINE | ID: mdl-38920442

ABSTRACT

Link prediction plays a crucial role in identifying future connections within complex networks, facilitating the analysis of network evolution across various domains such as biological networks, social networks, recommender systems, and more. Researchers have proposed various centrality measures, such as degree, clustering coefficient, betweenness, and closeness centralities, to compute similarity scores for predicting links in these networks. These centrality measures leverage both the local and global information of nodes within the network. In this study, we present a novel approach to link prediction that computes similarity scores from average centrality measures based on local and global centralities, namely Similarity based on Average Degree (SACD), Similarity based on Average Betweenness (SACB), Similarity based on Average Closeness (SACC), and Similarity based on Average Clustering Coefficient (SACCC). Our approach involved determining centrality scores for each node, calculating the average centrality for the entire graph, and deriving similarity scores through common neighbors. We then applied centrality scores to these common neighbors and identified nodes with above-average centrality. To evaluate our approach, we compared the proposed measures with existing local similarity-based link prediction measures, including common neighbors, the Jaccard coefficient, Adamic-Adar, resource allocation, and preferential attachment, as well as recent measures such as the Common Neighbor and Centrality-based Parameterized Algorithm (CCPA) and keyword network link prediction (KNLP). We conducted experiments on four real-world datasets. The proposed similarity scores based on average centralities demonstrate significant improvements. We observed an average enhancement of 24% in the Area Under the Receiver Operating Characteristic Curve (AUROC) compared to existing local similarity measures, and a 31% improvement over recent measures.
Furthermore, we observed average improvements of 49% and 51% in the Area Under the Precision-Recall Curve (AUPR) compared to existing and recent measures, respectively. Our comprehensive experiments highlight the superior performance of the proposed method.
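The steps described above (node centralities, the graph-wide average, then scoring through common neighbours with above-average centrality) can be sketched as follows. This is an assumed formulation: the abstract does not give the exact scoring rule, so here a pair's score is taken to be the summed centrality of its common neighbours whose centrality exceeds the graph average, with degree centrality standing in for the SACD variant. Function names and the toy graph are hypothetical.

```python
def degree_centrality(adj):
    """Degree centrality: node degree divided by (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def average_centrality_similarity(adj, centrality, u, v):
    """Assumed SACD-style score for the pair (u, v).

    Sums the centrality of the common neighbours of u and v whose
    centrality is above the graph-wide average centrality.
    """
    avg = sum(centrality.values()) / len(centrality)
    common = adj[u] & adj[v]
    return sum(centrality[z] for z in common if centrality[z] > avg)


adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "E"},
    "C": {"A", "B", "E"},
    "D": {"A"},
    "E": {"B", "C"},
}
cent = degree_centrality(adj)
print(average_centrality_similarity(adj, cent, "A", "E"))
```

Passing the centrality dictionary in as an argument lets the same scoring function be reused with betweenness, closeness, or clustering-coefficient values for the other variants.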

16.
Entropy (Basel) ; 26(6)2024 May 30.
Article in English | MEDLINE | ID: mdl-38920486

ABSTRACT

Link prediction is recognized as a crucial means of analyzing dynamic social networks and revealing the principles of social relationship evolution. However, the complex topology and temporal evolution of dynamic social networks pose significant research challenges. This study introduces an innovative fusion framework that incorporates entropy, causality, and a GCN model, focusing specifically on link prediction in dynamic social networks. First, the framework preprocesses the raw data, extracting and recording timestamp information between interactions. It then introduces the concept of "Temporal Information Entropy (TIE)", integrating it into the Node2Vec algorithm's random walk to generate initial feature vectors for nodes in the graph. A causality analysis model is subsequently applied for secondary processing of the generated feature vectors. Following this, a balanced dataset is constructed by adjusting the ratio of positive and negative samples. Lastly, a dedicated GCN model is trained. Through extensive experiments on multiple real social networks, the proposed framework demonstrated better performance than other methods on key evaluation indicators such as precision, recall, F1 score, and accuracy. This study provides a fresh perspective for understanding and predicting link dynamics in social networks and has significant practical value.

17.
BMC Bioinformatics ; 25(1): 213, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38872097

ABSTRACT

BACKGROUND: Automated hypothesis generation (HG) focuses on uncovering hidden connections within the extensive information that is publicly available. This domain has become increasingly popular thanks to modern machine learning algorithms. However, the automated evaluation of HG systems is still an open problem, especially at a larger scale. RESULTS: This paper presents Dyport, a novel benchmarking framework for evaluating biomedical hypothesis generation systems. Using curated datasets, our approach tests these systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This assesses not only the accuracy of hypotheses but also their potential impact on biomedical research, which significantly extends traditional link prediction benchmarks. The applicability of our benchmarking process is demonstrated on several link prediction systems applied to biomedical semantic knowledge graphs. Being flexible, our benchmarking system is designed for broad application in verifying hypothesis generation quality, aiming to expand the scope of scientific discovery within the biomedical research community. CONCLUSIONS: Dyport is an open-source benchmarking framework for evaluating biomedical hypothesis generation systems that takes into account knowledge dynamics, semantics and impact. All code and datasets are available at: https://github.com/IlyaTyagin/Dyport .


Subject(s)
Benchmarking , Benchmarking/methods , Algorithms , Biomedical Research/methods , Software , Machine Learning , Databases, Factual , Computational Biology/methods , Semantics
18.
Front Genet ; 15: 1388015, 2024.
Article in English | MEDLINE | ID: mdl-38737125

ABSTRACT

LncRNAs are an essential type of non-coding RNA and have been reported to be involved in various human pathological conditions. Increasing evidence suggests that drugs can regulate lncRNA expression, which makes it possible to develop lncRNAs as therapeutic targets. Developing in-silico methods to predict lncRNA-drug associations (LDAs) is therefore a critical step toward lncRNA-based therapies. In this study, we predict LDAs using graph convolutional networks (GCN) and graph attention networks (GAT) built on lncRNA and drug similarity networks. The proposed method achieves good performance (average AUC > 0.92) on five datasets. In addition, case studies and KEGG functional enrichment analysis further show that the model can effectively identify novel LDAs. Overall, this study provides a deep learning-based framework for predicting novel LDAs, which will accelerate the development of lncRNA-targeted drugs.
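To make the GCN component above concrete, the sketch below applies one standard GCN propagation step, ReLU(D^-1/2 (A + I) D^-1/2 X W), to a weighted similarity network and scores an lncRNA-drug pair with a sigmoid of the dot product of their embeddings. This is an illustration under assumptions, not the authors' architecture: the GAT branch is omitted, the names `gcn_layer` and `association_score` are hypothetical, and the dot-product decoder is an assumed choice. Similarity weights are assumed non-negative so every degree is positive.

```python
import math


def gcn_layer(adj, features, weights):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W), using plain lists.

    adj: n x n non-negative similarity matrix (e.g. lncRNA-lncRNA similarity).
    features: n x d node features; weights: d x h layer weights.
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    in_dim, out_dim = len(weights), len(weights[0])
    # Linear transform X W.
    xw = [[sum(features[i][k] * weights[k][j] for k in range(in_dim))
           for j in range(out_dim)] for i in range(n)]
    # Neighbourhood aggregation followed by ReLU.
    return [[max(0.0, sum(norm[i][m] * xw[m][j] for m in range(n)))
             for j in range(out_dim)] for i in range(n)]


def association_score(h_lnc, h_drug):
    """Hypothetical decoder: sigmoid of the dot product of two embeddings."""
    logit = sum(a * b for a, b in zip(h_lnc, h_drug))
    return 1.0 / (1.0 + math.exp(-logit))
```

In practice the lncRNA and drug networks would each pass through their own layers (with trained weights) so that both embeddings share the same output dimension before scoring.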

19.
Sci Rep ; 14(1): 11932, 2024 05 24.
Article in English | MEDLINE | ID: mdl-38789535

ABSTRACT

Probiotics are living microorganisms that provide health benefits to their hosts, potentially aiding in the treatment or prevention of various diseases, including diarrhea, irritable bowel syndrome, ulcerative colitis, and Crohn's disease. Motivated by successful applications of link prediction in medical and biological networks, we applied link prediction to the probiotic-disease network to identify unreported relations. Using data from the Probio database and International Classification of Diseases, 10th Revision (ICD-10) resources, we constructed a bipartite graph of the relationships between probiotics and diseases. We applied link prediction algorithms customized for this bipartite network, including the common neighbors, Jaccard coefficient, and Adamic/Adar ranking formulas. We evaluated the results using area under the curve (AUC) and precision metrics. Common neighbors outperformed the other methods, with an AUC of 0.96 and a precision of 0.6, indicating that even basic formulas can correctly predict at least six out of ten probable relations. To support our findings, we performed exact searches for the top 20 predictions on Google Scholar and ScienceDirect and found six confirming papers. Evidence suggests that Lactobacillus jensenii may provide prophylactic and therapeutic benefits for gastrointestinal diseases and that Lactobacillus acidophilus may have potential activity against urologic and female genital illnesses. Further investigation of the other predictions through additional preclinical and clinical studies is recommended. Future research may deploy more powerful link prediction algorithms to achieve more accurate results.
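In a bipartite graph a probiotic and a disease never share a direct neighbor, so the classic formulas need adapting. The abstract does not say how the authors customized them; the sketch below assumes one common adaptation, comparing the diseases linked to probiotic p with the diseases that share at least one probiotic with disease d, and adds the usual pairwise AUC (the probability that a held-out true link outscores a non-existent one, ties counting one half). All function names here are hypothetical.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Index (probiotic, disease) edges in both directions."""
    p2d, d2p = defaultdict(set), defaultdict(set)
    for p, d in edges:
        p2d[p].add(d)
        d2p[d].add(p)
    return p2d, d2p


def disease_neighbourhood(d, p2d, d2p):
    """Diseases sharing at least one probiotic with disease d (excluding d)."""
    return {z for q in d2p.get(d, ()) for z in p2d[q]} - {d}


def scores(p, d, p2d, d2p):
    """Assumed bipartite versions of common neighbors, Jaccard and Adamic/Adar."""
    gp = p2d.get(p, set())
    gd = disease_neighbourhood(d, p2d, d2p)
    common, union = gp & gd, gp | gd
    cn = len(common)
    jaccard = cn / len(union) if union else 0.0
    # Down-weight shared diseases that are linked to many probiotics.
    adamic_adar = sum(1.0 / math.log(len(d2p[z])) for z in common if len(d2p[z]) > 1)
    return {"cn": cn, "jaccard": jaccard, "adamic_adar": adamic_adar}


def auc(test_scores, nonexistent_scores):
    """Exhaustive pairwise AUC: held-out links vs. non-existent links."""
    wins = 0.0
    for s_pos in test_scores:
        for s_neg in nonexistent_scores:
            wins += 1.0 if s_pos > s_neg else 0.5 if s_pos == s_neg else 0.0
    return wins / (len(test_scores) * len(nonexistent_scores))
```

For evaluation, a fraction of known edges would be hidden before building the adjacency, scored alongside sampled non-edges, and passed to `auc`; precision would count true links among the top-ranked candidates.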


Subject(s)
Probiotics , Probiotics/therapeutic use , Humans , Algorithms , Computational Biology/methods
20.
J Theor Biol ; 589: 111850, 2024 07 21.
Article in English | MEDLINE | ID: mdl-38740126

ABSTRACT

Protein-protein interactions (PPIs) are crucial for various biological processes, and predicting them is a major challenge. The most common approach to this problem is link prediction, and link prediction methods based on network paths of length three (L3) have proven highly effective. In this paper, we propose SMS, a novel link prediction algorithm based on L3 paths and protein similarities. We first design a mixed similarity that combines the topological structure and attribute features of nodes. We then compute the predicted value for a node pair by summing, over all L3 paths connecting the pair, the product of the similarities along each path. Furthermore, we propose the Max Similarity Multiplied Similarity (maxSMS) algorithm from the perspective of maximum impact. Our computational results on six datasets, including S. cerevisiae and H. sapiens, show that maxSMS improves the precision of the top 500 predictions, the area under the precision-recall curve, and the normalized discounted cumulative gain by an average of 26.99%, 53.67%, and 6.7%, respectively, compared with the best competing methods.
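The L3 scoring rule above (sum, over every path u-a-b-v, of the product of similarities along the path) can be sketched as follows. The exact mixed similarity and maxSMS definition are not given in the abstract, so three choices here are assumptions: topological similarity as the Jaccard overlap of closed neighbourhoods, attribute similarity as cosine similarity, mixed with a hypothetical weight `alpha`; and maxSMS read as the largest single-path product. Function names are illustrative only.

```python
import math


def closed_jaccard(adj, u, v):
    """Assumed topological similarity: Jaccard overlap of N[u] = Γ(u) ∪ {u}."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def cosine(x, y):
    """Assumed attribute similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def mixed_similarity(adj, attrs, u, v, alpha=0.5):
    """Hypothetical mix: weighted sum of topological and attribute similarity."""
    return alpha * closed_jaccard(adj, u, v) + (1 - alpha) * cosine(attrs[u], attrs[v])


def l3_paths(adj, u, v):
    """Yield inner nodes (a, b) of every simple path u-a-b-v of length three."""
    for a in adj[u]:
        for b in adj[a]:
            if v in adj[b] and len({u, a, b, v}) == 4:
                yield a, b


def sms_scores(adj, attrs, u, v, alpha=0.5):
    """Return (sum, max) over L3 paths of the product of edge similarities."""
    total, best = 0.0, 0.0
    for a, b in l3_paths(adj, u, v):
        s = (mixed_similarity(adj, attrs, u, a, alpha)
             * mixed_similarity(adj, attrs, a, b, alpha)
             * mixed_similarity(adj, attrs, b, v, alpha))
        total += s
        best = max(best, s)
    return total, best
```

Here `adj` maps each protein to the set of its interaction partners and `attrs` maps it to a feature vector; the sum rewards pairs joined by many similar paths, while the max favours a single strongly supported path.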


Subject(s)
Algorithms , Protein Interaction Mapping , Protein Interaction Maps , Humans , Protein Interaction Mapping/methods , Computational Biology/methods , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/genetics , Databases, Protein , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae Proteins/genetics