Search | PAHO/WHO Uruguay

1.

Link prediction using low-dimensional node embeddings: The measurement problem.

Menand, Nicolas; Seshadhri, C.

Proc Natl Acad Sci U S A ; 121(8): e2312527121, 2024 Feb 20.

Article in English | MEDLINE | ID: mdl-38363864

ABSTRACT

Graph representation learning is a fundamental technique for machine learning (ML) on complex networks. Given an input network, these methods represent the vertices by low-dimensional real-valued vectors. These vectors can be used for a multitude of downstream ML tasks. We study one of the most important such tasks, link prediction. Much of the recent literature on graph representation learning has shown remarkable success in link prediction. On closer investigation, we observe that the performance is measured by the AUC (area under the curve), which suffers from biases. Since the ground truth in link prediction is sparse, we design a vertex-centric measure of performance, called the VCMPR@k plots. Under this measure, we show that link predictors using graph representations show poor scores. Despite having extremely high AUC scores, the predictors miss much of the ground truth. We identify a mathematical connection between this performance, the sparsity of the ground truth, and the low-dimensional geometry of the node embeddings. Under a formal theoretical framework, we prove that low-dimensional vectors cannot capture sparse ground truth using dot product similarities (the standard practice in the literature). Our results call into question existing results on link prediction and pose a significant scientific challenge for graph representation learning. The VCMPR plots identify specific scientific challenges for link prediction using low-dimensional node embeddings.
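To make the idea of a vertex-centric measure concrete, the following is a minimal sketch, not the authors' implementation. It assumes one plausible reading of a per-vertex score at k: rank all candidate neighbours of a vertex by dot-product similarity, then take the larger of precision@k and recall@k, which equals hits / min(k, number of held-out neighbours). The function names, the toy embeddings, and this exact scoring rule are assumptions made for illustration; the paper's definition of VCMPR@k may differ.

```python
from typing import Dict, List, Set

Embedding = Dict[int, List[float]]
Adjacency = Dict[int, Set[int]]


def dot(u: List[float], v: List[float]) -> float:
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def vertex_scores_at_k(emb: Embedding, train_adj: Adjacency,
                       test_adj: Adjacency, k: int) -> Dict[int, float]:
    """Per-vertex max(precision@k, recall@k) on held-out edges (assumed metric)."""
    scores: Dict[int, float] = {}
    for v, held_out in test_adj.items():
        if not held_out:
            continue  # vertices without held-out neighbours are skipped
        known = train_adj.get(v, set())
        # Candidates are all other vertices not already linked in training.
        candidates = [u for u in emb if u != v and u not in known]
        ranked = sorted(candidates, key=lambda u: dot(emb[v], emb[u]), reverse=True)
        hits = sum(1 for u in ranked[:k] if u in held_out)
        scores[v] = hits / min(k, len(held_out))
    return scores


# Hypothetical toy data: four vertices embedded in two dimensions.
emb = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.1, 0.9]}
train_adj = {0: set(), 1: set(), 2: set(), 3: set()}
test_adj = {0: {1}, 2: {0}}
scores = vertex_scores_at_k(emb, train_adj, test_adj, k=1)
```

In this toy case vertex 0 recovers its held-out neighbour while vertex 2 does not, which a single global AUC figure would average away.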

2.

SEGCECO: Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication.

Vasighizaker, Akram; Hora, Sheena; Zeng, Raymond; Rueda, Luis.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38605638

ABSTRACT

Recent advances in single-cell RNA sequencing technology have eased analyses of signaling networks of cells. Recently, cell-cell interaction has been studied based on various link prediction approaches on graph-structured data. These approaches make assumptions about the likelihood of node interaction, and thus show high performance for only some specific networks. Subgraph-based methods have solved this problem and outperformed other approaches by extracting local subgraphs from a given network. In this work, we present a novel method, called Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication (SEGCECO), which uses an attributed graph convolutional neural network to predict cell-cell communication from single-cell RNA-seq data. SEGCECO captures the latent and explicit attributes of undirected, attributed graphs constructed from the gene expression profile of individual cells. High-dimensional and sparse single-cell RNA-seq data make converting the data into a graphical format a daunting task. We successfully overcome this limitation by applying SoptSC, a similarity-based optimization method in which the cell-cell communication network is built using a cell-cell similarity matrix that is learned from gene expression data. We performed experiments on six datasets extracted from human and mouse pancreas tissue. Our comparative analysis shows that SEGCECO outperforms latent feature-based approaches, and the state-of-the-art method for link prediction, WLNM, with 0.99 ROC and 99% prediction accuracy. The datasets can be found at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84133 and the code is publicly available on GitHub at https://github.com/sheenahora/SEGCECO and on Code Ocean at https://codeocean.com/capsule/8244724/tree.

Subject(s)

Cell Communication; Signal Transduction; Humans; Animals; Mice; Cell Communication/genetics; Learning; Neural Networks, Computer; Gene Expression

3.

Hyperedge prediction and the statistical mechanisms of higher-order and lower-order interactions in complex networks.

Sales-Pardo, Marta; Mariné-Tena, Aleix; Guimerà, Roger.

Proc Natl Acad Sci U S A ; 120(50): e2303887120, 2023 Dec 12.

Article in English | MEDLINE | ID: mdl-38060555

ABSTRACT

Complex networked systems often exhibit higher-order interactions, beyond dyadic interactions, which can dramatically alter their observed behavior. Consequently, understanding hypergraphs from a structural perspective has become increasingly important. Statistical, group-based inference approaches are well suited for unveiling the underlying community structure and predicting unobserved interactions. However, these approaches often rely on two key assumptions: that the same groups can explain hyperedges of any order and that interactions are assortative, meaning that edges are formed by nodes with the same group memberships. To test these assumptions, we propose a group-based generative model for hypergraphs that does not impose an assortative mechanism to explain observed higher-order interactions, unlike current approaches. Our model allows us to explore the validity of the assumptions. Our results indicate that the first assumption appears to hold true for real networks. However, the second assumption is not necessarily accurate; we find that a combination of general statistical mechanisms can explain observed hyperedges. Finally, with our approach, we are also able to determine the importance of lower and high-order interactions for predicting unobserved interactions. Our research challenges the conventional assumptions of group-based inference methodologies and broadens our understanding of the underlying structure of hypergraphs.

4.

A dual-modal graph learning framework for identifying interaction events among chemical and biotech drugs.

Ru, Zhongying; Wu, Yangyang; Shao, Jinning; Yin, Jianwei; Qian, Linghui; Miao, Xiaoye.

Brief Bioinform ; 24(5)2023 Sep 20.

Article in English | MEDLINE | ID: mdl-37507113

ABSTRACT

Drug-drug interaction (DDI) identification is essential to clinical medicine and drug discovery. The two categories of drugs (i.e. chemical drugs and biotech drugs) differ remarkably in molecular properties, action mechanisms, etc. Biotech drugs are newcomers but highly promising in modern medicine owing to their higher specificity and fewer side effects. However, existing DDI prediction methods only consider small-molecule chemical drugs, not large-molecule biotech drugs. Here, we build a large-scale dual-modal graph database named CB-DB and customize a graph-based framework named CB-TIP to reason about event-aware DDIs for both chemical and biotech drugs. CB-DB comprehensively integrates various interaction events and two heterogeneous kinds of molecular structures. It also incorporates endogenous proteins, because most drugs take effect by interacting with them. In the modality of molecular structure, drugs and endogenous proteins are two heterogeneous kinds of graphs, while in the modality of interaction, they are nodes connected by events (i.e. edges of different relationships). CB-TIP employs graph representation learning methods to generate drug representations from either modality and then contrastively mixes them to predict, in an end-to-end manner, how likely an event is to occur when one drug meets another. Experiments demonstrate CB-TIP's superiority in DDI prediction and its promising potential for uncovering novel DDIs.

Subject(s)

Drug-Related Side Effects and Adverse Reactions; Humans; Drug Interactions; Drug Discovery; Molecular Structure; Proteins

5.

Predicting gene regulatory links from single-cell RNA-seq data using graph neural networks.

Mao, Guo; Pang, Zhengbin; Zuo, Ke; Wang, Qinglin; Pei, Xiangdong; Chen, Xinhai; Liu, Jie.

Brief Bioinform ; 24(6)2023 Sep 22.

Article in English | MEDLINE | ID: mdl-37985457

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful technique for studying gene expression patterns at the single-cell level. Inferring gene regulatory networks (GRNs) from scRNA-seq data provides insight into cellular phenotypes at the genomic level. However, the high sparsity, noise and dropout events inherent in scRNA-seq data present challenges for GRN inference. In recent years, the dramatic increase in data on experimentally validated transcription factors binding to DNA has made it possible to infer GRNs by supervised methods. In this study, we address the problem of GRN inference by framing it as a graph link prediction task. We propose a novel framework called GNNLink, which leverages known GRNs to deduce the potential regulatory interdependencies between genes. First, we preprocess the raw scRNA-seq data. Then, we introduce a graph convolutional network-based interaction graph encoder to effectively refine gene features by capturing interdependencies between nodes in the network. Finally, the GRN is inferred by performing a matrix completion operation on the node features. The features obtained from model training can be applied to downstream tasks such as measuring similarity and inferring causality between gene pairs. To evaluate the performance of GNNLink, we compare it with six existing GRN reconstruction methods using seven scRNA-seq datasets. These datasets encompass diverse ground truth networks, including functional interaction networks, Loss of Function/Gain of Function data, non-specific ChIP-seq data and cell-type-specific ChIP-seq data. Our experimental results demonstrate that GNNLink achieves comparable or superior performance across these datasets, showcasing its robustness and accuracy. Furthermore, we observe consistent performance across datasets of varying scales. For reproducibility, we provide the data and source code of GNNLink on our GitHub repository: https://github.com/sdesignates/GNNLink.

Subject(s)

Gene Expression Regulation; Single-Cell Gene Expression Analysis; Reproducibility of Results; Neural Networks, Computer; Gene Regulatory Networks; Gene Expression Profiling; Sequence Analysis, RNA/methods

6.

DMGL-MDA: A dual-modal graph learning method for microbe-drug association prediction.

Zhu, Bei; Yu, Hao-Yang; Du, Bing-Xue; Shi, Jian-Yu.

Methods ; 222: 51-56, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38184219

ABSTRACT

The interaction between human microbes and drugs can significantly impact human physiological functions. It is crucial to identify potential microbe-drug associations (MDAs) before drug administration. However, conventional biological experiments to predict MDAs are plagued by drawbacks such as long turnaround times, high costs, and potential risks. In contrast, computational approaches can speed up the screening of MDAs at a low cost. Most computational models use a drug similarity matrix as the initial feature representation of drugs and stack graph neural network layers to extract the features of network nodes. However, different calculation methods result in distinct similarity matrices, and message passing in graph neural networks (GNNs) induces over-smoothing and over-squashing, thereby degrading the performance of the model. To address these issues, we propose a novel graph representation learning model, dual-modal graph learning for microbe-drug association prediction (DMGL-MDA). It comprises a dual-modal embedding module, a bipartite graph network embedding module, and a predictor module. To assess the performance of DMGL-MDA, we compared it against state-of-the-art methods using two benchmark datasets. Through cross-validation, we illustrated the superiority of DMGL-MDA. Furthermore, we conducted ablation experiments and case studies to validate the effectiveness of the model.

Subject(s)

Benchmarking; Neural Networks, Computer; Humans; Research Design

7.

MFA-DTI: Drug-target interaction prediction based on multi-feature fusion adopted framework.

Chen, Siqi; Li, Minghui; Semenov, Ivan.

Methods ; 224: 79-92, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38430967

ABSTRACT

The identification of drug-target interactions (DTI) is a valuable step in the drug discovery and repositioning process. However, traditional laboratory experiments are time-consuming and expensive. Computational methods have streamlined research to determine DTIs. The application of deep learning methods has significantly improved the prediction performance for DTIs. Modern deep learning methods can leverage multiple sources of information, including sequence data that contains biological structural information, and interaction data. While useful, these methods cannot be effectively applied to each type of information individually (e.g., chemical structure and interaction network) and do not take into account the specificity of DTI data such as low- or zero-interaction biological entities. To overcome these limitations, we propose a method called MFA-DTI (Multi-feature Fusion Adopted framework for DTI). MFA-DTI consists of three modules: an interaction graph learning module that processes the interaction network to generate interaction vectors, a chemical structure learning module that extracts features from the chemical structure, and a fusion module that combines these features for the final prediction. To validate the performance of MFA-DTI, we conducted experiments on six public datasets under different settings. The results indicate that the proposed method is highly effective in various settings and outperforms state-of-the-art methods.

Subject(s)

Drug Discovery; Laboratories; Drug Interactions

8.

Dyport: dynamic importance-based biomedical hypothesis generation benchmarking technique.

Tyagin, Ilya; Safro, Ilya.

BMC Bioinformatics ; 25(1): 213, 2024 Jun 13.

Article in English | MEDLINE | ID: mdl-38872097

ABSTRACT

BACKGROUND: Automated hypothesis generation (HG) focuses on uncovering hidden connections within the extensive information that is publicly available. This domain has become increasingly popular, thanks to modern machine learning algorithms. However, the automated evaluation of HG systems is still an open problem, especially on a larger scale. RESULTS: This paper presents a novel benchmarking framework, Dyport, for evaluating biomedical hypothesis generation systems. Utilizing curated datasets, our approach tests these systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This assesses not only the accuracy of hypotheses but also their potential impact in biomedical research, which significantly extends traditional link prediction benchmarks. The applicability of our benchmarking process is demonstrated on several link prediction systems applied to biomedical semantic knowledge graphs. Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification, aiming to expand the scope of scientific discovery within the biomedical research community. CONCLUSIONS: Dyport is an open-source benchmarking framework designed for the evaluation of biomedical hypothesis generation systems, which takes into account knowledge dynamics, semantics and impact. All code and datasets are available at: https://github.com/IlyaTyagin/Dyport

Subject(s)

Benchmarking; Benchmarking/methods; Algorithms; Biomedical Research/methods; Software; Machine Learning; Databases, Factual; Computational Biology/methods; Semantics

9.

Ensembles of knowledge graph embedding models improve predictions for drug discovery.

Rivas-Barragan, Daniel; Domingo-Fernández, Daniel; Gadiya, Yojana; Healey, David.

Brief Bioinform ; 23(6)2022 Nov 19.

Article in English | MEDLINE | ID: mdl-36384050

ABSTRACT

Recent advances in Knowledge Graphs (KGs) and Knowledge Graph Embedding Models (KGEMs) have led to their adoption in a broad range of fields and applications. The current publishing system in machine learning requires newly introduced KGEMs to achieve state-of-the-art performance, surpassing at least one benchmark in order to be published. Despite this, dozens of novel architectures are published every year, making it challenging for users, even within the field, to deduce the most suitable configuration for a given application. A typical biomedical application of KGEMs is drug-disease prediction in the context of drug discovery, in which a KGEM is trained to predict triples linking drugs and diseases. These predictions can later be tested in clinical trials following extensive experimental validation. However, given the infeasibility of evaluating each of these predictions and that only a minimal number of candidates can be experimentally tested, models that yield higher precision on the top prioritized triples are preferred. In this paper, we apply the concept of ensemble learning on KGEMs for drug discovery to assess whether combining the predictions of several models can lead to an overall improvement in predictive performance. First, we trained and benchmarked 10 KGEMs to predict drug-disease triples on two independent biomedical KGs designed for drug discovery. Next, we applied different ensemble methods that aggregate the predictions of these models by leveraging the distribution or the position of the predicted triple scores. We then demonstrate how the ensemble models can achieve better results than the original KGEMs by benchmarking the precision (i.e., number of true positives prioritized) of their top predictions. Lastly, we released the source code presented in this work at https://github.com/enveda/kgem-ensembles-in-drug-discovery.
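As an illustration of a position-based ensemble, here is a minimal sketch that averages each candidate triple's rank across models, so that raw scores on different scales never need to be compared directly. It is an assumed, generic mean-rank aggregation rather than the specific ensemble methods evaluated in the paper; the function names, triple labels, and scores are hypothetical.

```python
from typing import Dict, List


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each candidate to its 1-based rank (highest score gets rank 1)."""
    ordered = sorted(scores, key=lambda c: scores[c], reverse=True)
    return {candidate: pos for pos, candidate in enumerate(ordered, start=1)}


def mean_rank_ensemble(model_scores: List[Dict[str, float]]) -> List[str]:
    """Order candidates by their average rank across all models."""
    ranks = [rank_positions(s) for s in model_scores]
    candidates = sorted(model_scores[0])  # assumes all models score the same set
    mean_rank = {c: sum(r[c] for r in ranks) / len(ranks) for c in candidates}
    # Ties are broken alphabetically to keep the output deterministic.
    return sorted(candidates, key=lambda c: (mean_rank[c], c))


# Hypothetical scores from three models with incompatible score scales.
model_a = {"drugA-disease1": 0.9, "drugB-disease1": 0.5, "drugC-disease1": 0.1}
model_b = {"drugA-disease1": 2.0, "drugB-disease1": 7.0, "drugC-disease1": 1.0}
model_c = {"drugA-disease1": -1.0, "drugB-disease1": -3.0, "drugC-disease1": -2.0}
ensemble_order = mean_rank_ensemble([model_a, model_b, model_c])
```

A distribution-based alternative would instead normalise each model's scores (for example to z-scores) before averaging them.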

Subject(s)

Drug Discovery; Pattern Recognition, Automated; Knowledge; Machine Learning; Software

10.

Matrix factorization for biomedical link prediction and scRNA-seq data imputation: an empirical survey.

Ou-Yang, Le; Lu, Fan; Zhang, Zi-Chao; Wu, Min.

Brief Bioinform ; 23(1)2022 Jan 17.

Article in English | MEDLINE | ID: mdl-34864871

ABSTRACT

Advances in high-throughput experimental technologies have driven the accumulation of vast amounts of biomedical data. Biomedical link prediction and single-cell RNA-sequencing (scRNA-seq) data imputation are two essential tasks in biomedical data analyses, which can facilitate various downstream studies and yield insights into the mechanisms of complex diseases. Both tasks can be transformed into matrix completion problems. For a variety of matrix completion tasks, matrix factorization has shown promising performance. However, the sparseness and high dimensionality of biomedical networks and scRNA-seq data have raised new challenges. To resolve these issues, various matrix factorization methods have emerged recently. In this paper, we present a comprehensive review of such matrix factorization methods and their usage in biomedical link prediction and scRNA-seq data imputation. Moreover, we select representative matrix factorization methods and conduct a systematic empirical comparison on 15 real data sets to evaluate their performance under different scenarios. By summarizing the experimental results, we provide general guidelines for selecting matrix factorization methods for different biomedical matrix completion tasks and point out some future directions to further improve the performance for biomedical link prediction and scRNA-seq data imputation.

Subject(s)

Data Analysis; Single-Cell Analysis; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Exome Sequencing

11.

DTI-HETA: prediction of drug-target interactions based on GCN and GAT on heterogeneous graph.

Shao, Kanghao; Zhang, Yunhao; Wen, Yuqi; Zhang, Zhongnan; He, Song; Bo, Xiaochen.

Brief Bioinform ; 23(3)2022 May 13.

Article in English | MEDLINE | ID: mdl-35380622

ABSTRACT

Drug-target interaction (DTI) prediction plays an important role in drug repositioning, drug discovery and drug design. However, due to the large size of the chemical and genomic spaces and the complex interactions between drugs and targets, experimental identification of DTIs is costly and time-consuming. In recent years, the emerging graph neural network (GNN) has been applied to DTI prediction because DTIs can be represented effectively using graphs. However, some of these methods are only based on homogeneous graphs, and some consist of two decoupled steps that cannot be trained jointly. To further explore GNN-based DTI prediction by integrating heterogeneous graph information, this study regards DTI prediction as a link prediction problem and proposes an end-to-end model based on HETerogeneous graph with Attention mechanism (DTI-HETA). In this model, a heterogeneous graph is first constructed based on the drug-drug and target-target similarity matrices and the DTI matrix. Then, the graph convolutional neural network is utilized to obtain the embedded representation of the drugs and targets. To highlight the contribution of different neighborhood nodes to the central node in aggregating the graph convolution information, a graph attention mechanism is introduced into the node embedding process. Afterward, an inner product decoder is applied to predict DTIs. To evaluate the performance of DTI-HETA, experiments are conducted on two datasets. The experimental results show that our model is superior to the state-of-the-art methods. Also, the identification of novel DTIs indicates that DTI-HETA can serve as a powerful tool for integrating heterogeneous graph information to predict DTIs.
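The inner product decoder mentioned in this abstract can be sketched as follows. This is a generic illustration, assuming the common form in which the dot product of a drug embedding and a target embedding is passed through a sigmoid to give an interaction probability; the embeddings below are made-up values, not outputs of DTI-HETA, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def inner_product_decoder(drug_emb: List[Vector],
                          target_emb: List[Vector]) -> List[List[float]]:
    """Return a drugs x targets matrix of predicted interaction probabilities."""
    return [
        [sigmoid(sum(a * b for a, b in zip(d, t))) for t in target_emb]
        for d in drug_emb
    ]


# Hypothetical learned embeddings: two drugs and two targets in two dimensions.
drug_emb = [[1.0, 0.0], [0.0, 0.0]]
target_emb = [[2.0, 5.0], [-2.0, 1.0]]
probs = inner_product_decoder(drug_emb, target_emb)
```

Because the decoder has no parameters of its own, all of the model's capacity sits in the encoder that produces the embeddings.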

Subject(s)

Drug Development; Neural Networks, Computer; Drug Development/methods; Drug Interactions; Drug Repositioning; Polymers

12.

CHERRY: a Computational metHod for accuratE pRediction of virus-pRokarYotic interactions using a graph encoder-decoder model.

Shang, Jiayu; Sun, Yanni.

Brief Bioinform ; 23(5)2022 Sep 20.

Article in English | MEDLINE | ID: mdl-35595715

ABSTRACT

Prokaryotic viruses, which infect bacteria and archaea, are key players in microbial communities. Predicting the hosts of prokaryotic viruses helps decipher the dynamic relationship between microbes. Experimental methods for host prediction cannot keep pace with the fast accumulation of sequenced phages. Thus, there is a need for computational host prediction. Despite some promising results, computational host prediction remains a challenge because of the limited known interactions and the sheer number of phages sequenced by high-throughput sequencing technologies. The state-of-the-art methods can only achieve 43% accuracy at the species level. In this work, we formulate host prediction as link prediction in a knowledge graph that integrates multiple protein and DNA-based sequence features. Our implementation, named CHERRY, can be applied to predict hosts for newly discovered viruses and to identify viruses infecting targeted bacteria. We demonstrated the utility of CHERRY for both applications and compared its performance with 11 popular host prediction methods. To the best of our knowledge, CHERRY has the highest accuracy in identifying virus-prokaryote interactions. It outperforms all the existing methods at the species level with an accuracy increase of 37%. In addition, CHERRY's performance on short contigs is more stable than that of other tools.

Subject(s)

Bacteriophages; Viruses; Bacteria; Bacteriophages/genetics; DNA; Prokaryotic Cells; Viruses/genetics

13.

Link prediction in protein-protein interaction network: A similarity multiplied similarity algorithm with paths of length three.

Cai, Wangmin; Liu, Peiqiang; Wang, Zunfang; Jiang, Hong; Liu, Chang; Fei, Zhaojie; Yang, Zhuang.

J Theor Biol ; 589: 111850, 2024 Jul 21.

Article in English | MEDLINE | ID: mdl-38740126

ABSTRACT

Protein-protein interactions (PPIs) are crucial for various biological processes, and predicting PPIs is a major challenge. To solve this issue, the most common method is link prediction. Currently, the link prediction methods based on network Paths of Length Three (L3) have been proven to be highly effective. In this paper, we propose a novel link prediction algorithm, named SMS, which is based on L3 and protein similarities. We first design a mixed similarity that combines the topological structure and attribute features of nodes. Then, we compute the predicted value by summing the product of all similarities along the L3. Furthermore, we propose the Max Similarity Multiplied Similarity (maxSMS) algorithm from the perspective of maximum impact. Our computational prediction results show that on six datasets, including S. cerevisiae, H. sapiens, and others, the maxSMS algorithm improves the precision of the top 500, area under the precision-recall curve, and normalized discounted cumulative gain by an average of 26.99%, 53.67%, and 6.7%, respectively, compared to other optimal methods.
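The step of "summing the product of all similarities along the L3" can be sketched as below. This is one plausible reading of that sentence, not the published SMS or maxSMS algorithm: for a candidate pair (x, y), enumerate every path x-u-v-y of length three and add up the product of the similarities on its three edges. The similarity table stands in for the paper's mixed similarity, and all names and values are hypothetical.

```python
from typing import Callable, Dict, Set

Graph = Dict[str, Set[str]]


def l3_similarity_score(adj: Graph, sim: Callable[[str, str], float],
                        x: str, y: str) -> float:
    """Sum over paths x-u-v-y of sim(x,u) * sim(u,v) * sim(v,y)."""
    total = 0.0
    for u in adj[x]:
        if u == y:
            continue  # x-y-... would revisit the endpoint
        for v in adj[u]:
            if v in (x, y):
                continue  # intermediate nodes must differ from both endpoints
            if y in adj[v]:
                total += sim(x, u) * sim(u, v) * sim(v, y)
    return total


# Hypothetical undirected toy network with two length-3 paths from A to D.
adj = {
    "A": {"B", "E"}, "B": {"A", "C"}, "C": {"B", "D", "E"},
    "D": {"C"}, "E": {"A", "C"},
}
# Hypothetical edge similarities (stand-in for a topology + attribute mix).
sim_table = {
    frozenset(("A", "B")): 0.5, frozenset(("B", "C")): 0.5,
    frozenset(("C", "D")): 1.0, frozenset(("A", "E")): 1.0,
    frozenset(("E", "C")): 0.25,
}


def table_sim(a: str, b: str) -> float:
    return sim_table[frozenset((a, b))]


score_ad = l3_similarity_score(adj, table_sim, "A", "D")
path_count = l3_similarity_score(adj, lambda a, b: 1.0, "A", "D")
```

With every similarity set to 1 the score reduces to the plain count of length-three paths, which is the unweighted L3 baseline.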

Subject(s)

Algorithms; Protein Interaction Mapping; Protein Interaction Maps; Humans; Protein Interaction Mapping/methods; Computational Biology/methods; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae/genetics; Databases, Protein; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae Proteins/genetics

14.

Detection and Validation of Macro-Activities in Human Inertial Signals Using Graph Link Prediction.

Wieland, Christoph; Pankratius, Victor.

Sensors (Basel) ; 24(4)2024 Feb 17.

Article in English | MEDLINE | ID: mdl-38400439

ABSTRACT

With the continuous development of new wearable devices, sensor-based human activity recognition is enjoying enormous popularity in research and industry. The signals from inertial sensors allow for the detection, classification, and analysis of human activities such as jogging, cycling, or swimming. However, human activity recognition is often limited to basic activities that occur in short, predetermined periods of time (sliding windows). Complex macro-activities, such as multi-step sports exercises or multi-step cooking recipes, are still only considered to a limited extent. While some works have investigated the classification of macro-activities, the automated understanding of how the underlying micro-activities interact remains an open challenge. This study addresses this gap through the application of graph link prediction, a well-known concept in graph theory and graph neural networks (GNNs). To this end, the presented approach transforms micro-activity sequences into micro-activity graphs that are then processed with a GNN. The evaluation on two derived real-world data sets shows that graph link prediction enables the accurate identification of interactions between micro-activities and the precise validation of composite macro-activities based on learned graph embeddings. Furthermore, this work shows that GNNs can benefit from positional encodings in sequence recognition tasks.

Subject(s)

Bicycling; Cooking; Humans; Exercise Therapy; Industry; Swimming

15.

Link Prediction in Complex Networks Using Average Centrality-Based Similarity Score.

Nandini, Y V; Lakshmi, T Jaya; Enduri, Murali Krishna; Sharma, Hemlata.

Entropy (Basel) ; 26(6)2024 May 21.

Article in English | MEDLINE | ID: mdl-38920442

ABSTRACT

Link prediction plays a crucial role in identifying future connections within complex networks, facilitating the analysis of network evolution across various domains such as biological networks, social networks, recommender systems, and more. Researchers have proposed various centrality measures, such as degree, clustering coefficient, betweenness, and closeness centralities, to compute similarity scores for predicting links in these networks. These centrality measures leverage both the local and global information of nodes within the network. In this study, we present a novel approach to link prediction using similarity scores derived from average centrality measures based on local and global centralities, namely Similarity based on Average Degree (SACD), Similarity based on Average Betweenness (SACB), Similarity based on Average Closeness (SACC), and Similarity based on Average Clustering Coefficient (SACCC). Our approach involved determining centrality scores for each node, calculating the average centrality for the entire graph, and deriving similarity scores through common neighbors. We then applied centrality scores to these common neighbors and identified nodes with above-average centrality. To evaluate our approach, we compared the proposed measures with existing local similarity-based link prediction measures, including common neighbors, the Jaccard coefficient, Adamic-Adar, resource allocation, and preferential attachment, as well as recent measures such as the Common Neighbor and Centrality-based Parameterized Algorithm (CCPA) and keyword network link prediction (KNLP). We conducted experiments on four real-world datasets. The proposed similarity scores based on average centralities demonstrate significant improvements. We observed an average enhancement of 24% in terms of Area Under the Receiver Operating Characteristic curve (AUROC) compared to existing local similarity measures, and a 31% improvement over recent measures. Furthermore, we witnessed an average improvement of 49% and 51% in the Area Under Precision-Recall (AUPR) compared to existing and recent measures, respectively. Our comprehensive experiments highlight the superior performance of the proposed method.
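The procedure outlined in this abstract (per-node centrality, graph-wide average, then common neighbours with above-average centrality) can be sketched for the degree case as follows. This is a simplified, assumed reading: the score of a pair is the number of its common neighbours whose degree exceeds the average degree of the graph. The published SACD formula may weight neighbours differently, and the function names and toy graph are hypothetical.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def average_degree(adj: Graph) -> float:
    """Mean degree over all nodes of the graph."""
    return sum(len(neighbours) for neighbours in adj.values()) / len(adj)


def above_average_common_neighbours(adj: Graph, x: int, y: int) -> int:
    """Count common neighbours of x and y with above-average degree."""
    avg = average_degree(adj)
    common = adj[x] & adj[y]
    return sum(1 for z in common if len(adj[z]) > avg)


# Hypothetical undirected toy graph: node 3 is a hub, node 4 is not.
adj = {
    1: {3, 4}, 2: {3, 4}, 3: {1, 2, 5, 6},
    4: {1, 2}, 5: {3}, 6: {3},
}
score_12 = above_average_common_neighbours(adj, 1, 2)
```

Swapping the degree for betweenness, closeness, or clustering coefficient would give the analogous variants, at the cost of a more expensive centrality computation.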

16.

Effective Temporal Graph Learning via Personalized PageRank.

Liao, Ziyu; Liu, Tao; He, Yue; Lin, Longlong.

Entropy (Basel) ; 26(7)2024 Jul 10.

Article in English | MEDLINE | ID: mdl-39056950

ABSTRACT

Graph representation learning aims to map nodes or edges of a graph to low-dimensional vectors, while preserving as much topological information as possible. During past decades, numerous algorithms for graph representation learning have emerged. Among them, proximity matrix representation methods have been shown to exhibit excellent performance in experiments and scale to large graphs with millions of nodes. However, with the rapid development of the Internet, information interactions are happening at the scale of billions every moment. Most methods for similarity matrix factorization still focus on static graphs, leading to incomplete similarity descriptions and low embedding quality. To enhance the embedding quality of temporal graph learning, we propose a temporal graph representation learning model based on the matrix factorization of Time-constrained Personalized PageRank (TPPR) matrices. TPPR, an extension of personalized PageRank (PPR) that incorporates temporal information, better captures node similarities in temporal graphs. Based on this, we use Singular Value Decomposition or Nonnegative Matrix Factorization to decompose TPPR matrices to obtain embedding vectors for each node. Through experiments on tasks such as link prediction, node classification, and node clustering across multiple temporal graphs, as well as a comparison with various experimental methods, we find that graph representation learning algorithms based on TPPR matrix factorization achieve overall outstanding scores on multiple temporal datasets, highlighting their effectiveness.
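For readers unfamiliar with the proximity matrix being factorized, the sketch below computes an ordinary (static) personalized PageRank vector by power iteration. It deliberately omits the time constraints that define TPPR, which the abstract does not specify, and it stops before the factorization step; the restart probability, iteration count, and toy graph are assumed values.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def personalized_pagerank(adj: Graph, source: str,
                          alpha: float = 0.15, iters: int = 100) -> Dict[str, float]:
    """Power iteration for p = alpha * e_source + (1 - alpha) * P^T p."""
    p = {v: 0.0 for v in adj}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[source] += alpha  # restart mass always returns to the source
        for v, mass in p.items():
            if not adj[v]:
                # Dangling node: send its walking mass back to the source.
                nxt[source] += (1.0 - alpha) * mass
                continue
            share = (1.0 - alpha) * mass / len(adj[v])
            for u in adj[v]:
                nxt[u] += share
        p = nxt
    return p


# Hypothetical undirected path graph a - b - c.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
ppr_a = personalized_pagerank(adj, "a")
```

Stacking one such vector per source node gives a proximity matrix whose rows could then be passed to a truncated SVD or NMF routine from a linear algebra library to obtain node embeddings.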

17.

Link Prediction in Dynamic Social Networks Combining Entropy, Causality, and a Graph Convolutional Network Model.

Huang, Xiaoli; Li, Jingyu; Yuan, Yumiao.

Entropy (Basel) ; 26(6)2024 May 30.

Article in English | MEDLINE | ID: mdl-38920486

ABSTRACT

Link prediction is recognized as a crucial means to analyze dynamic social networks, revealing the principles of social relationship evolution. However, the complex topology and temporal evolution characteristics of dynamic social networks pose significant research challenges. This study introduces an innovative fusion framework that incorporates entropy, causality, and a GCN model, focusing specifically on link prediction in dynamic social networks. Firstly, the framework preprocesses the raw data, extracting and recording timestamp information between interactions. It then introduces the concept of "Temporal Information Entropy (TIE)", integrating it into the Node2Vec algorithm's random walk to generate initial feature vectors for nodes in the graph. A causality analysis model is subsequently applied for secondary processing of the generated feature vectors. Following this, an equal dataset is constructed by adjusting the ratio of positive and negative samples. Lastly, a dedicated GCN model is used for model training. Through extensive experimentation in multiple real social networks, the framework proposed in this study demonstrated a better performance than other methods in key evaluation indicators such as precision, recall, F1 score, and accuracy. This study provides a fresh perspective for understanding and predicting link dynamics in social networks and has significant practical value.

18.

GKLOMLI: a link prediction model for inferring miRNA-lncRNA interactions by using Gaussian kernel-based method on network profile and linear optimization algorithm.

Wong, Leon; Wang, Lei; You, Zhu-Hong; Yuan, Chang-An; Huang, Yu-An; Cao, Mei-Yuan.

BMC Bioinformatics ; 24(1): 188, 2023 May 08.

Article in English | MEDLINE | ID: mdl-37158823

ABSTRACT

BACKGROUND: The limited knowledge of miRNA-lncRNA interactions is considered an obstacle to revealing the underlying regulatory mechanism. Accumulating evidence on human diseases indicates that the modulation of gene expression is closely related to the interactions between miRNAs and lncRNAs. However, validating such interactions via crosslinking-immunoprecipitation and high-throughput sequencing (CLIP-seq) experiments inevitably costs too much money and time and yields unsatisfactory results. Therefore, more and more computational prediction tools have been developed to offer reliable candidates for the better design of further bio-experiments. METHODS: In this work, we propose a novel link prediction model based on a Gaussian kernel-based method and a linear optimization algorithm for inferring miRNA-lncRNA interactions (GKLOMLI). Given an observed miRNA-lncRNA interaction network, the Gaussian kernel-based method is employed to output two similarity matrices, one for miRNAs and one for lncRNAs. Based on the integrated matrix that combines the similarity matrices with the observed interaction network, a linear optimization-based link prediction model is trained for inferring miRNA-lncRNA interactions. RESULTS: To evaluate the performance of our proposed method, k-fold cross-validation (CV) and leave-one-out CV were implemented, in which each CV experiment was carried out 100 times on a randomly generated training set. The high areas under the curve (AUCs) of 0.8623 ± 0.0027 (2-fold CV), 0.9053 ± 0.0017 (5-fold CV), 0.9151 ± 0.0013 (10-fold CV), and 0.9236 (LOO-CV) illustrate the precision and reliability of our proposed method. CONCLUSION: With its high performance, GKLOMLI is anticipated to be used to reveal underlying interactions between miRNAs and their target lncRNAs and to decipher the potential mechanisms of complex diseases.
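As an illustration of how a Gaussian kernel can turn interaction profiles into similarity matrices, the sketch below uses the widely used Gaussian interaction profile form, exp(-gamma * ||p_i - p_j||^2). This is an assumed formulation: the abstract does not give the exact kernel or bandwidth used by GKLOMLI, and the function name, the bandwidth normalization, and the toy interaction matrix are hypothetical.

```python
import numpy as np


def gaussian_profile_kernel(profiles, bandwidth=1.0):
    """Similarity between rows of a binary interaction matrix.

    Assumption: gamma is the bandwidth divided by the mean squared norm
    of the profiles, a common normalization for this kind of kernel.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    gamma = bandwidth / mean_sq if mean_sq > 0 else bandwidth
    # Pairwise squared Euclidean distances between profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


# Hypothetical observed network: 4 miRNAs (rows) x 3 lncRNAs (columns).
interactions = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
])
mirna_sim = gaussian_profile_kernel(interactions)     # 4 x 4
lncrna_sim = gaussian_profile_kernel(interactions.T)  # 3 x 3
```

Rows give miRNA profiles and columns give lncRNA profiles, so the same function applied to the matrix and its transpose yields the two similarity matrices the abstract mentions.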

Subject(s)

MicroRNAs , RNA, Long Noncoding , Humans , RNA, Long Noncoding/genetics , Reproducibility of Results , Research Design , Algorithms , MicroRNAs/genetics

19.

Normalized L3-based link prediction in protein-protein interaction networks.

Yuen, Ho Yin; Jansson, Jesper.

BMC Bioinformatics ; 24(1): 59, 2023 Feb 22.

Article in English | MEDLINE | ID: mdl-36814208

ABSTRACT

BACKGROUND: Protein-protein interaction (PPI) data is an important type of data used in functional genomics. However, high-throughput experiments are often insufficient to complete the PPI interactome of different organisms. Computational techniques are thus used to infer missing data. Link prediction is one such approach: it uses the structure of the network of PPIs known so far to identify non-edges whose addition to the network would make it more sound, according to some underlying assumptions. Recently, a new idea called the L3 principle introduced biological motivation into PPI link prediction, yielding predictors that are superior to general-purpose link predictors for complex networks. Interestingly, the L3 principle can be interpreted in another way, so that other signatures of PPI networks can also be characterized for PPI prediction. This alternative interpretation uncovers candidate PPIs that the current L3-based link predictors may not be able to fully capture, meaning that they underutilize the L3 principle. RESULTS: In this article, we propose a formulation of link predictors that we call NormalizedL3 (L3N), which addresses certain missing elements within L3 predictors from the perspective of network modeling. Our computational validations show that the L3N predictors are able to find missing PPIs more accurately (in terms of true positives among the predicted PPIs) than the previously proposed methods on several datasets from the literature, including BioGRID, STRING, MINT, and HuRI, at the cost of more computation time in some cases. In addition, we found that L3-based link predictors (including L3N) ranked a different pool of PPIs higher than the general-purpose link predictors did. This suggests that different types of PPIs can be predicted based on different topological assumptions, and that even better PPI link predictors may be obtained in the future through improved network modeling.
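For orientation, the sketch below computes a degree-normalized count of length-3 paths between node pairs, which is how the earlier L3 predictor is commonly written in the literature: the score of (x, y) sums a_xu * a_uv * a_vy / sqrt(k_u * k_v) over intermediate nodes u and v. It is not the NormalizedL3 (L3N) formulation, which the abstract does not spell out, and the function names and the toy network are hypothetical.

```python
import numpy as np


def l3_scores(adj):
    """Degree-normalized length-3 path score for every node pair."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    scaled = adj * inv_sqrt[None, :]  # scaled[x, u] = a_xu / sqrt(k_u)
    return scaled @ scaled @ adj


def rank_non_edges(adj):
    """List non-adjacent pairs (x < y) by decreasing L3 score."""
    adj = np.asarray(adj)
    scores = l3_scores(adj)
    n = adj.shape[0]
    pairs = [
        (x, y, float(scores[x, y]))
        for x in range(n)
        for y in range(x + 1, n)
        if adj[x, y] == 0
    ]
    return sorted(pairs, key=lambda item: item[2], reverse=True)


# Hypothetical PPI network: a simple path 0 - 1 - 2 - 3.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
ranking = rank_non_edges(adj)
```

On this path graph the only non-edge joined by a length-3 path is (0, 3), so it ranks first, while pairs that merely share a neighbor, such as (0, 2), score zero. That contrast with common-neighbor predictors is the point of the L3 idea.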

Subject(s)

Protein Interaction Mapping , Protein Interaction Maps , Protein Interaction Mapping/methods , Genomics

20.

Decision-making under uncertainty for species introductions into ecological networks.

Van Kleunen, Lucy B; Peterson, Katie A; Hayden, Meghan T; Keyes, Aislyn; Schwartz, Aaron J; Li, Henry; Dee, Laura E.

Ecol Lett ; 26(6): 983-1004, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37038276

ABSTRACT

Ecological communities are increasingly subject to natural and human-induced additions of species, as species shift their ranges under climate change, are introduced for conservation and are unintentionally moved by humans. As such, decisions about how to manage ecosystems subject to species introductions and considering multiple management objectives need to be made. However, the impacts of gaining new species on ecological communities are difficult to predict due to uncertainty in introduced species characteristics, the novel interactions that will be produced by that species, and the recipient ecosystem structure. Drawing on ecological and conservation decision theory, we synthesise literature into a conceptual framework for species introduction decision-making based on ecological networks in high-uncertainty contexts. We demonstrate the application of this framework to a theoretical decision surrounding assisted migration considering both biodiversity and ecosystem service objectives. We show that this framework can be used to evaluate trade-offs between outcomes, predict worst-case scenarios, suggest when one should collect additional data, and allow for improving knowledge of the system over time.

Subject(s)

Conservation of Natural Resources , Ecosystem , Humans , Uncertainty , Biodiversity , Introduced Species
