Results 1 - 20 of 192
1.
Proc Natl Acad Sci U S A ; 121(8): e2312527121, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38363864

ABSTRACT

Graph representation learning is a fundamental technique for machine learning (ML) on complex networks. Given an input network, these methods represent the vertices by low-dimensional real-valued vectors. These vectors can be used for a multitude of downstream ML tasks. We study one of the most important such tasks, link prediction. Much of the recent literature on graph representation learning has shown remarkable success in link prediction. On closer investigation, we observe that performance is measured by the AUC (area under the curve), which suffers from biases. Since the ground truth in link prediction is sparse, we design a vertex-centric measure of performance, called the VCMPR@k plots. Under this measure, we show that link predictors using graph representations score poorly. Despite having extremely high AUC scores, the predictors miss much of the ground truth. We identify a mathematical connection between this performance, the sparsity of the ground truth, and the low-dimensional geometry of the node embeddings. Under a formal theoretical framework, we prove that low-dimensional vectors cannot capture sparse ground truth using dot product similarities (the standard practice in the literature). Our results call into question existing results on link prediction and pose a significant scientific challenge for graph representation learning. The VCMPR plots identify specific scientific challenges for link prediction using low-dimensional node embeddings.
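The abstract does not spell out how VCMPR@k is computed, so the Python sketch below illustrates only the general vertex-centric idea with a simpler, hypothetical metric: precision@k is computed separately for each vertex, ranking candidate neighbours by embedding dot products, and then averaged over the vertices that have held-out links. The function name and the exclusion rules (no self-pairs, no training edges) are assumptions, not the authors' definition.

```python
import numpy as np

def per_vertex_precision_at_k(emb, test_adj, train_adj, k):
    """Hypothetical vertex-centric metric: mean precision@k over vertices.

    emb: (n, d) node embeddings; candidate links are scored by dot products.
    test_adj, train_adj: (n, n) boolean adjacency matrices (held-out / training).
    Only vertices with at least one held-out edge contribute to the mean.
    """
    scores = (emb @ emb.T).astype(float)
    n = scores.shape[0]
    # Never recommend a vertex to itself or re-recommend a training edge.
    scores[np.eye(n, dtype=bool)] = -np.inf
    scores[train_adj] = -np.inf
    precisions = []
    for u in range(n):
        if not test_adj[u].any():
            continue  # nothing to recover for this vertex
        top_k = np.argsort(-scores[u], kind="stable")[:k]
        precisions.append(test_adj[u, top_k].mean())
    return float(np.mean(precisions)) if precisions else 0.0
```

Unlike a single global AUC, this averages over vertices, so a predictor that ranks many non-links highly for each vertex is penalised even when most random negative pairs still score low.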

2.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38605638

ABSTRACT

Recent advances in single-cell RNA sequencing technology have eased the analysis of cellular signaling networks. Recently, cell-cell interaction has been studied with various link prediction approaches on graph-structured data. These approaches make assumptions about the likelihood of node interaction and therefore perform well only on some specific networks. Subgraph-based methods have solved this problem and outperformed other approaches by extracting local subgraphs from a given network. In this work, we present a novel method, called Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication (SEGCECO), which uses an attributed graph convolutional neural network to predict cell-cell communication from single-cell RNA-seq data. SEGCECO captures the latent and explicit attributes of undirected, attributed graphs constructed from the gene expression profiles of individual cells. Because single-cell RNA-seq data are high-dimensional and sparse, converting them into a graph format is a daunting task. We overcome this limitation by applying SoptSC, a similarity-based optimization method in which the cell-cell communication network is built from a cell-cell similarity matrix learned from gene expression data. We performed experiments on six datasets extracted from human and mouse pancreatic tissue. Our comparative analysis shows that SEGCECO outperforms latent feature-based approaches and the state-of-the-art link prediction method WLNM, with an area under the ROC curve of 0.99 and 99% prediction accuracy. The datasets can be found at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84133 and the code is publicly available on GitHub https://github.com/sheenahora/SEGCECO and Code Ocean https://codeocean.com/capsule/8244724/tree.


Subject(s)
Cell Communication , Signal Transduction , Humans , Animals , Mice , Cell Communication/genetics , Learning , Neural Networks, Computer , Gene Expression
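Subgraph-based link predictors such as WLNM work on the local subgraph around each candidate pair. The sketch below shows one common way to extract such an enclosing subgraph: the union of the h-hop neighbourhoods of both endpoints, with the target link itself removed. It is an illustrative assumption, not SEGCECO's published procedure; the graph is a plain dict of neighbour sets with comparable (e.g. integer) node IDs, and both function names are hypothetical.

```python
from collections import deque

def k_hop_nodes(adj, source, h):
    """Breadth-first search: all nodes within h hops of source.

    adj: dict mapping each node to the set of its neighbours.
    """
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == h:
            continue  # do not expand beyond h hops
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen

def enclosing_subgraph(adj, u, v, h=1):
    """Nodes and undirected edges of the h-hop enclosing subgraph of (u, v).

    The target link (u, v) is dropped so a classifier cannot simply read it off.
    """
    nodes = k_hop_nodes(adj, u, h) | k_hop_nodes(adj, v, h)
    edges = set()
    for a in nodes:
        for b in adj.get(a, ()):
            if b in nodes and a < b and {a, b} != {u, v}:
                edges.add((a, b))
    return nodes, edges
```

A downstream model (a graph neural network in SEGCECO's case) would then classify each extracted subgraph as containing a link or not.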
3.
Proc Natl Acad Sci U S A ; 120(50): e2303887120, 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38060555

ABSTRACT

Complex networked systems often exhibit higher-order interactions, beyond dyadic interactions, which can dramatically alter their observed behavior. Consequently, understanding hypergraphs from a structural perspective has become increasingly important. Statistical, group-based inference approaches are well suited for unveiling the underlying community structure and predicting unobserved interactions. However, these approaches often rely on two key assumptions: that the same groups can explain hyperedges of any order and that interactions are assortative, meaning that edges are formed by nodes with the same group memberships. To test these assumptions, we propose a group-based generative model for hypergraphs that, unlike current approaches, does not impose an assortative mechanism to explain observed higher-order interactions. Our model allows us to explore the validity of the assumptions. Our results indicate that the first assumption appears to hold true for real networks. However, the second assumption is not necessarily accurate; we find that a combination of general statistical mechanisms can explain observed hyperedges. Finally, with our approach, we are also able to determine the importance of lower- and higher-order interactions for predicting unobserved interactions. Our research challenges the conventional assumptions of group-based inference methodologies and broadens our understanding of the underlying structure of hypergraphs.

4.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37507113

ABSTRACT

Drug-drug interaction (DDI) identification is essential to clinical medicine and drug discovery. The two categories of drugs (i.e. chemical drugs and biotech drugs) differ remarkably in molecular properties, mechanisms of action, etc. Biotech drugs are relatively new but highly promising in modern medicine due to their higher specificity and fewer side effects. However, existing DDI prediction methods only consider small-molecule chemical drugs, not large-molecule biotech drugs. Here, we build a large-scale dual-modal graph database named CB-DB and customize a graph-based framework named CB-TIP to reason about event-aware DDIs for both chemical and biotech drugs. CB-DB comprehensively integrates various interaction events and two heterogeneous kinds of molecular structures. It incorporates endogenous proteins, based on the fact that most drugs take effect by interacting with endogenous proteins. In the molecular-structure modality, drugs and endogenous proteins are two heterogeneous kinds of graphs, while in the interaction modality, they are nodes connected by events (i.e. edges of different relationships). CB-TIP employs graph representation learning methods to generate drug representations from either modality and then contrastively mixes them to predict, in an end-to-end manner, how likely an event is to occur when one drug meets another. Experiments demonstrate CB-TIP's clear superiority in DDI prediction and its promising potential for uncovering novel DDIs.


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Humans , Drug Interactions , Drug Discovery , Molecular Structure , Proteins
5.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37985457

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful technique for studying gene expression patterns at the single-cell level. Inferring gene regulatory networks (GRNs) from scRNA-seq data provides insight into cellular phenotypes at the genomic level. However, the high sparsity, noise and dropout events inherent in scRNA-seq data present challenges for GRN inference. In recent years, the dramatic increase in experimentally validated data on transcription factors binding to DNA has made it possible to infer GRNs with supervised methods. In this study, we address GRN inference by framing it as a graph link prediction task. We propose a novel framework called GNNLink, which leverages known GRNs to deduce potential regulatory interdependencies between genes. First, we preprocess the raw scRNA-seq data. Then, we introduce a graph convolutional network-based interaction graph encoder that refines gene features by capturing interdependencies between nodes in the network. Finally, the GRN is inferred by performing matrix completion on the node features. The features obtained from model training can be applied to downstream tasks such as measuring similarity and inferring causality between gene pairs. To evaluate GNNLink, we compare it with six existing GRN reconstruction methods on seven scRNA-seq datasets. These datasets encompass diverse ground truth networks, including functional interaction networks, Loss of Function/Gain of Function data, non-specific ChIP-seq data and cell-type-specific ChIP-seq data. Our experimental results demonstrate that GNNLink achieves comparable or superior performance across these datasets, showcasing its robustness and accuracy. Furthermore, we observe consistent performance across datasets of varying scales. For reproducibility, we provide the data and source code of GNNLink in our GitHub repository: https://github.com/sdesignates/GNNLink.


Subject(s)
Gene Expression Regulation , Single-Cell Gene Expression Analysis , Reproducibility of Results , Neural Networks, Computer , Gene Regulatory Networks , Gene Expression Profiling , Sequence Analysis, RNA/methods
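GNNLink is described as a graph convolutional encoder followed by matrix completion on node features. As a hedged illustration of that general pattern (not GNNLink's actual architecture), the sketch below uses the widely used symmetric-normalised GCN layer and an inner-product decoder that scores every gene pair. All function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric GCN normalisation D^-1/2 (A + I) D^-1/2 (self-loops added)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, features, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ features @ weight, 0.0)

def inner_product_decoder(z):
    """Fill in the full gene-by-gene matrix with sigmoid(z_i . z_j)."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))
```

Stacking gcn_layer calls and passing the result to inner_product_decoder yields a dense matrix of pairwise link probabilities; in a supervised setting the weights would be trained against the known regulatory edges.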
6.
Methods ; 222: 51-56, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38184219

ABSTRACT

The interaction between human microbes and drugs can significantly affect human physiological functions. It is crucial to identify potential microbe-drug associations (MDAs) before drug administration. However, conventional biological experiments for identifying MDAs are time-consuming, costly, and potentially risky. In contrast, computational approaches can speed up the screening of MDAs at low cost. Most computational models use a drug similarity matrix as the initial feature representation of drugs and stack graph neural network layers to extract the features of network nodes. However, different calculation methods yield distinct similarity matrices, and message passing in graph neural networks (GNNs) induces over-smoothing and over-squashing, which degrades model performance. To address these issues, we propose a novel graph representation learning model, dual-modal graph learning for microbe-drug association prediction (DMGL-MDA). It comprises a dual-modal embedding module, a bipartite graph network embedding module, and a predictor module. To assess the performance of DMGL-MDA, we compared it against state-of-the-art methods on two benchmark datasets. Through cross-validation, we demonstrated the superiority of DMGL-MDA. Furthermore, we conducted ablation experiments and case studies to validate the model's effectiveness.


Subject(s)
Benchmarking , Neural Networks, Computer , Humans , Research Design
7.
Methods ; 224: 79-92, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38430967

ABSTRACT

The identification of drug-target interactions (DTI) is a valuable step in the drug discovery and repositioning process. However, traditional laboratory experiments are time-consuming and expensive. Computational methods have streamlined research to determine DTIs. The application of deep learning methods has significantly improved the prediction performance for DTIs. Modern deep learning methods can leverage multiple sources of information, including sequence data that contains biological structural information, and interaction data. While useful, these methods cannot be effectively applied to each type of information individually (e.g., chemical structure and interaction network) and do not take into account the specificity of DTI data such as low- or zero-interaction biological entities. To overcome these limitations, we propose a method called MFA-DTI (Multi-feature Fusion Adopted framework for DTI). MFA-DTI consists of three modules: an interaction graph learning module that processes the interaction network to generate interaction vectors, a chemical structure learning module that extracts features from the chemical structure, and a fusion module that combines these features for the final prediction. To validate the performance of MFA-DTI, we conducted experiments on six public datasets under different settings. The results indicate that the proposed method is highly effective in various settings and outperforms state-of-the-art methods.


Subject(s)
Drug Discovery , Laboratories , Drug Interactions
8.
BMC Bioinformatics ; 25(1): 213, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38872097

ABSTRACT

BACKGROUND: Automated hypothesis generation (HG) focuses on uncovering hidden connections within the extensive information that is publicly available. This domain has become increasingly popular thanks to modern machine learning algorithms. However, the automated evaluation of HG systems is still an open problem, especially at larger scale. RESULTS: This paper presents Dyport, a novel benchmarking framework for evaluating biomedical hypothesis generation systems. Using curated datasets, our approach tests these systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This assesses not only the accuracy of hypotheses but also their potential impact on biomedical research, which significantly extends traditional link prediction benchmarks. The applicability of our benchmarking process is demonstrated on several link prediction systems applied to biomedical semantic knowledge graphs. Our flexible benchmarking system is designed for broad application in verifying hypothesis generation quality, aiming to expand the scope of scientific discovery within the biomedical research community. CONCLUSIONS: Dyport is an open-source benchmarking framework for evaluating biomedical hypothesis generation systems that takes into account knowledge dynamics, semantics and impact. All code and datasets are available at: https://github.com/IlyaTyagin/Dyport .


Subject(s)
Benchmarking , Benchmarking/methods , Algorithms , Biomedical Research/methods , Software , Machine Learning , Databases, Factual , Computational Biology/methods , Semantics
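Dyport builds a dynamic graph from curated databases and also weights discoveries by importance, which is not reproduced here. The sketch below shows only the time-aware part of such an evaluation, under stated assumptions: edges first observed before a cutoff year form the training graph, and later links between already-known nodes become the test set. The `(u, v, year)` record format and the function name are hypothetical.

```python
def temporal_split(timed_edges, cutoff):
    """Split (u, v, year) records into training edges and future test links.

    Training: undirected edges first observed before `cutoff`.
    Test: edges first observed at or after `cutoff` whose two endpoints
    both already appear in the training graph (new links, not new nodes).
    """
    first_seen = {}
    for u, v, year in timed_edges:
        key = (min(u, v), max(u, v))  # undirected edge key
        first_seen[key] = min(year, first_seen.get(key, year))
    train = {e for e, y in first_seen.items() if y < cutoff}
    known_nodes = {n for e in train for n in e}
    test = {e for e, y in first_seen.items()
            if y >= cutoff and e[0] in known_nodes and e[1] in known_nodes}
    return train, test
```

Because each edge keeps only its first observation year, an edge that is re-reported after the cutoff stays in the training set rather than leaking into the test set.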
9.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35595715

ABSTRACT

Prokaryotic viruses, which infect bacteria and archaea, are key players in microbial communities. Predicting the hosts of prokaryotic viruses helps decipher the dynamic relationship between microbes. Experimental methods for host prediction cannot keep pace with the fast accumulation of sequenced phages. Thus, there is a need for computational host prediction. Despite some promising results, computational host prediction remains a challenge because of the limited number of known interactions and the sheer number of phages sequenced by high-throughput sequencing technologies. State-of-the-art methods achieve only 43% accuracy at the species level. In this work, we formulate host prediction as link prediction in a knowledge graph that integrates multiple protein- and DNA-based sequence features. Our implementation, named CHERRY, can be applied to predict hosts for newly discovered viruses and to identify viruses infecting targeted bacteria. We demonstrated the utility of CHERRY for both applications and compared its performance with 11 popular host prediction methods. To the best of our knowledge, CHERRY has the highest accuracy in identifying virus-prokaryote interactions. It outperforms all existing methods at the species level, with an accuracy increase of 37%. In addition, CHERRY's performance on short contigs is more stable than that of other tools.


Subject(s)
Bacteriophages , Viruses , Bacteria , Bacteriophages/genetics , DNA , Prokaryotic Cells , Viruses/genetics
10.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36384050

ABSTRACT

Recent advances in Knowledge Graphs (KGs) and Knowledge Graph Embedding Models (KGEMs) have led to their adoption in a broad range of fields and applications. The current publishing system in machine learning requires newly introduced KGEMs to achieve state-of-the-art performance, surpassing at least one benchmark, in order to be published. Despite this, dozens of novel architectures are published every year, making it challenging for users, even within the field, to deduce the most suitable configuration for a given application. A typical biomedical application of KGEMs is drug-disease prediction in the context of drug discovery, in which a KGEM is trained to predict triples linking drugs and diseases. These predictions can later be tested in clinical trials following extensive experimental validation. However, since evaluating each of these predictions is infeasible and only a minimal number of candidates can be experimentally tested, models that yield higher precision on the top-prioritized triples are preferred. In this paper, we apply ensemble learning to KGEMs for drug discovery to assess whether combining the predictions of several models can improve overall predictive performance. First, we trained and benchmarked 10 KGEMs to predict drug-disease triples on two independent biomedical KGs designed for drug discovery. Next, we applied different ensemble methods that aggregate the predictions of these models by leveraging the distribution or the position of the predicted triple scores. We then demonstrate how the ensemble models can achieve better results than the original KGEMs by benchmarking the precision (i.e., number of true positives prioritized) of their top predictions. Lastly, we release the source code presented in this work at https://github.com/enveda/kgem-ensembles-in-drug-discovery.


Subject(s)
Drug Discovery , Pattern Recognition, Automated , Knowledge , Machine Learning , Software
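The abstract says the ensembles aggregate KGEM predictions using either the distribution or the position of the predicted scores. Two generic aggregations in that spirit are sketched below: per-model z-score averaging (distribution) and per-model rank averaging (position), together with precision@k on the top predictions. These are assumptions about the general technique, not the exact methods in the authors' repository.

```python
import numpy as np

def zscore_ensemble(score_matrix):
    """Distribution-based: standardise each model's scores, then average.

    Rows are models, columns are candidate triples. Assumes no model
    assigns identical scores to every triple (std > 0).
    """
    s = np.asarray(score_matrix, dtype=float)
    z = (s - s.mean(axis=1, keepdims=True)) / s.std(axis=1, keepdims=True)
    return z.mean(axis=0)

def rank_ensemble(score_matrix):
    """Position-based: average each triple's rank (0 = best) across models.

    The mean rank is negated so that, as with scores, higher is better.
    """
    s = np.asarray(score_matrix, dtype=float)
    order = np.argsort(-s, axis=1, kind="stable")
    ranks = np.argsort(order, axis=1, kind="stable")  # rank of each column
    return -ranks.mean(axis=0)

def precision_at_k(scores, is_true, k):
    """Fraction of true triples among the k highest-scoring candidates."""
    top = np.argsort(-np.asarray(scores), kind="stable")[:k]
    return float(np.asarray(is_true)[top].mean())
```

Rank averaging ignores how far apart the scores are, which makes it robust to models whose raw scores live on very different scales; z-scoring keeps that information but is more sensitive to outlying scores.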
11.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35380622

ABSTRACT

Drug-target interaction (DTI) prediction plays an important role in drug repositioning, drug discovery and drug design. However, due to the large size of the chemical and genomic spaces and the complex interactions between drugs and targets, experimental identification of DTIs is costly and time-consuming. In recent years, the emerging graph neural network (GNN) has been applied to DTI prediction because DTIs can be represented effectively using graphs. However, some of these methods are only based on homogeneous graphs, and some consist of two decoupled steps that cannot be trained jointly. To further explore GNN-based DTI prediction by integrating heterogeneous graph information, this study regards DTI prediction as a link prediction problem and proposes an end-to-end model based on HETerogeneous graph with Attention mechanism (DTI-HETA). In this model, a heterogeneous graph is first constructed based on the drug-drug and target-target similarity matrices and the DTI matrix. Then, the graph convolutional neural network is utilized to obtain the embedded representation of the drugs and targets. To highlight the contribution of different neighborhood nodes to the central node in aggregating the graph convolution information, a graph attention mechanism is introduced into the node embedding process. Afterward, an inner product decoder is applied to predict DTIs. To evaluate the performance of DTI-HETA, experiments are conducted on two datasets. The experimental results show that our model is superior to the state-of-the-art methods. Also, the identification of novel DTIs indicates that DTI-HETA can serve as a powerful tool for integrating heterogeneous graph information to predict DTIs.


Subject(s)
Drug Development , Neural Networks, Computer , Drug Development/methods , Drug Interactions , Drug Repositioning , Polymers
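DTI-HETA adds a graph attention mechanism so that neighbours contribute unequally to a node's embedding. The sketch below is a standard single-head graph attention layer in NumPy: attention logits come from a LeakyReLU over concatenated projected features, followed by a softmax over each node's neighbourhood. Whether DTI-HETA uses exactly this form is an assumption, and the function name is hypothetical.

```python
import numpy as np

def gat_layer(adj, h, w, a, negative_slope=0.2):
    """Single-head graph attention layer.

    adj: (n, n) 0/1 adjacency; self-loops are added so each node attends to itself.
    h: (n, f_in) node features; w: (f_in, f_out) weights;
    a: (2 * f_out,) attention vector.
    Returns the new embeddings and the attention matrix.
    """
    wh = h @ w
    f_out = wh.shape[1]
    # e_ij = LeakyReLU(a^T [Wh_i || Wh_j]), split into source and target terms.
    e = (wh @ a[:f_out])[:, None] + (wh @ a[f_out:])[None, :]
    e = np.where(e > 0, e, negative_slope * e)
    mask = (adj + np.eye(adj.shape[0])) > 0
    e = np.where(mask, e, -np.inf)  # attend only to neighbours (and self)
    e = e - e.max(axis=1, keepdims=True)  # numerical stability for exp
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ wh, alpha
```

Drug-target scores could then be produced with an inner-product decoder over the resulting drug and target embeddings, as the abstract describes.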
12.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34864871

ABSTRACT

Advances in high-throughput experimental technologies promote the accumulation of vast amounts of biomedical data. Biomedical link prediction and single-cell RNA-sequencing (scRNA-seq) data imputation are two essential tasks in biomedical data analysis, which can facilitate various downstream studies and yield insights into the mechanisms of complex diseases. Both tasks can be transformed into matrix completion problems. For a variety of matrix completion tasks, matrix factorization has shown promising performance. However, the sparseness and high dimensionality of biomedical networks and scRNA-seq data raise new challenges. To resolve these issues, various matrix factorization methods have emerged recently. In this paper, we present a comprehensive review of such matrix factorization methods and their use in biomedical link prediction and scRNA-seq data imputation. Moreover, we select representative matrix factorization methods and conduct a systematic empirical comparison on 15 real datasets to evaluate their performance under different scenarios. By summarizing the experimental results, we provide general guidelines for selecting matrix factorization methods for different biomedical matrix completion tasks and point out some future directions for further improving performance in biomedical link prediction and scRNA-seq data imputation.


Subject(s)
Data Analysis , Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Exome Sequencing
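Both tasks in the review reduce to filling in the unobserved entries of a matrix. As one minimal member of the matrix factorization family (not any specific method from the review), the sketch below runs regularised alternating least squares on the observed entries only. The function name and the rank, regularisation and iteration defaults are illustrative assumptions.

```python
import numpy as np

def masked_als(r, mask, rank=2, reg=0.1, n_iters=50, seed=0):
    """Complete matrix r from the entries where mask is True.

    Alternating least squares for
        min ||mask * (r - U V^T)||^2 + reg * (||U||^2 + ||V||^2).
    Returns the dense reconstruction U V^T.
    """
    rng = np.random.default_rng(seed)
    m, n = r.shape
    u = rng.normal(scale=0.1, size=(m, rank))
    v = rng.normal(scale=0.1, size=(n, rank))
    eye = reg * np.eye(rank)
    for _ in range(n_iters):
        for i in range(m):  # ridge regression for row i, with V fixed
            vi = v[mask[i]]
            u[i] = np.linalg.solve(vi.T @ vi + eye, vi.T @ r[i, mask[i]])
        for j in range(n):  # ridge regression for column j, with U fixed
            uj = u[mask[:, j]]
            v[j] = np.linalg.solve(uj.T @ uj + eye, uj.T @ r[mask[:, j], j])
    return u @ v.T
```

For link prediction, r would be an adjacency or association matrix and the reconstructed values at unobserved positions serve as link scores; for imputation, they replace dropout zeros.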
13.
J Theor Biol ; 589: 111850, 2024 07 21.
Article in English | MEDLINE | ID: mdl-38740126

ABSTRACT

Protein-protein interactions (PPIs) are crucial for various biological processes, and predicting PPIs is a major challenge. The most common approach to this problem is link prediction. Currently, link prediction methods based on network Paths of Length Three (L3) have proven highly effective. In this paper, we propose a novel link prediction algorithm, named SMS, which is based on L3 and protein similarities. We first design a mixed similarity that combines the topological structure and attribute features of nodes. Then, we compute the predicted value by summing, over all L3 paths, the product of the similarities along each path. Furthermore, we propose the Max Similarity Multiplied Similarity (maxSMS) algorithm from the perspective of maximum impact. Our computational prediction results show that on six datasets, including S. cerevisiae, H. sapiens, and others, the maxSMS algorithm improves the precision of the top 500 predictions, the area under the precision-recall curve, and the normalized discounted cumulative gain by an average of 26.99%, 53.67%, and 6.7%, respectively, compared to the best competing methods.


Subject(s)
Algorithms , Protein Interaction Mapping , Protein Interaction Maps , Humans , Protein Interaction Mapping/methods , Computational Biology/methods , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/genetics , Databases, Protein , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae Proteins/genetics
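One reading of the SMS score, stated as an assumption: for each candidate pair (u, v), sum over every length-3 path u-a-b-v the product of the three edge similarities along the path. With similarities placed on existing edges, this is the cube of the similarity-weighted adjacency matrix. The mixed similarity itself and the maxSMS variant are not specified in enough detail to reproduce, so `sim` is simply an input here and the function name is hypothetical.

```python
import numpy as np

def l3_similarity_scores(adj, sim):
    """Sum over length-3 paths u-a-b-v of sim[u, a] * sim[a, b] * sim[b, v].

    adj: (n, n) symmetric 0/1 adjacency without self-loops.
    sim: (n, n) node-similarity matrix (its construction is left open).
    For a non-adjacent pair (u, v) every length-3 walk is a simple path,
    so the entries of the weighted cube are exactly the path sums.
    """
    weighted = adj * sim  # keep similarities only on existing edges
    return weighted @ weighted @ weighted
```

With `sim` set to all ones, this reduces to counting L3 paths, i.e. the unnormalised cube of the adjacency matrix.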
14.
J Biomed Inform ; 158: 104725, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39265815

ABSTRACT

OBJECTIVE: As new knowledge is produced at a rapid pace in the biomedical field, existing biomedical Knowledge Graphs (KGs) cannot be manually updated in a timely manner. Previous work in Natural Language Processing (NLP) has leveraged link prediction to infer missing knowledge in general-purpose KGs. Inspired by this, we propose to apply link prediction to existing biomedical KGs to infer missing knowledge. Although Knowledge Graph Embedding (KGE) methods are effective in link prediction tasks, they are less capable of capturing relations between communities of entities with specific attributes (Fanourakis et al., 2023). METHODS: To address this challenge, we proposed an entity distance-based method for abstracting a Community Knowledge Graph (CKG) from a simplified version of the pre-existing PubMed Knowledge Graph (PKG) (Xu et al., 2020). For link prediction on the abstracted CKG, we proposed an extension approach for existing KGE models that links the information in the PKG to the abstracted CKG. The applicability of this extension was demonstrated with six well-known KGE models: TransE, TransH, DistMult, ComplEx, SimplE, and RotatE. Evaluation metrics including Mean Rank (MR), Mean Reciprocal Rank (MRR), and Hits@k were used to assess link prediction performance. In addition, we presented a backtracking process that traces the results of CKG link prediction back to the PKG scale for further comparison. RESULTS: Six different CKGs were abstracted from the PKG using embeddings from the six KGE methods. The results of link prediction on these abstracted CKGs indicate that our proposed extension can improve the existing KGE methods, achieving a top-10 accuracy of 0.69 compared to 0.5 for TransE, 0.7 compared to 0.54 for TransH, 0.67 compared to 0.6 for DistMult, 0.73 compared to 0.57 for ComplEx, 0.73 compared to 0.63 for SimplE, and 0.85 compared to 0.76 for RotatE on their respective CKGs.
These performance improvements also highlight the wide applicability of the extension approach. CONCLUSION: This study provides novel insights into abstracting CKGs from the PKG. The extension approach improved the performance of existing KGE methods and is broadly applicable. As an interesting future extension, we plan to conduct link prediction for entities that are newly introduced to the PKG.


Subject(s)
Natural Language Processing , PubMed , Algorithms , Humans , Data Mining/methods , Knowledge Bases
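The evaluation metrics named in the abstract are straightforward to compute once the rank of the true entity is known for each test triple (rank 1 = best). A small helper, with a hypothetical name, is sketched below; Hits@k is the fraction of test triples whose true entity is ranked within the top k.

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Mean Rank, Mean Reciprocal Rank and Hits@k from 1-based ranks.

    ranks: the rank of the correct entity for each test triple.
    """
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,  # lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,  # higher is better
    }
    for k in ks:
        metrics[f"Hits@{k}"] = sum(r <= k for r in ranks) / n
    return metrics
```

MR is dominated by a few badly ranked triples, while MRR and Hits@k focus on the top of the ranking, which is why papers usually report several of them together.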
15.
J Biomed Inform ; 158: 104730, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39326691

ABSTRACT

OBJECTIVE: To develop FuseLinker, a novel link prediction framework for biomedical knowledge graphs (BKGs) that fully exploits the graph's structural, textual and domain knowledge information, and to evaluate its utility in graph-based drug repurposing through detailed case studies. METHODS: FuseLinker leverages fused pre-trained text embeddings and domain knowledge embeddings to enhance a graph neural network (GNN)-based link prediction model tailored for BKGs. The framework has three parts: (a) obtaining text embeddings for BKGs using embedding-visible large language models (LLMs), (b) learning representations of a medical ontology as domain knowledge information with the Poincaré graph embedding method, and (c) fusing these embeddings and further learning the graph structure representations of BKGs with a GNN-based link prediction model. We evaluated FuseLinker against traditional knowledge graph embedding models and a conventional GNN-based link prediction model on four public BKG datasets. Additionally, we examined the impact of using different embedding-visible LLMs on FuseLinker's performance. Finally, we investigated FuseLinker's ability to generate medical hypotheses through two drug repurposing case studies, for Sorafenib and for Parkinson's disease. RESULTS: Compared with baseline models on four BKGs, our method demonstrates superior performance. The Mean Reciprocal Rank (MRR) and Area Under the Receiver Operating Characteristic Curve (AUROC) are 0.969 and 0.987 for KEGG50k, 0.548 and 0.903 for Hetionet, 0.739 and 0.928 for SuppKG, and 0.831 and 0.890 for ADInt. CONCLUSION: Our study demonstrates that FuseLinker is an effective novel link prediction framework that integrates multiple kinds of graph information and shows significant potential for practical applications in biomedical and clinical tasks. Source code and data are available at https://github.com/YKXia0/FuseLinker.


Subject(s)
Neural Networks, Computer , Humans , Drug Repositioning/methods , Algorithms , Natural Language Processing , Medical Informatics/methods , Machine Learning
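FuseLinker reports AUROC alongside MRR. AUROC equals the probability that a randomly chosen true link is scored above a randomly chosen negative link (ties counted as one half), which gives the simple quadratic-time sketch below; the function name is hypothetical, and production code would normally use a library implementation instead.

```python
def auroc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    Counts, over all (positive, negative) pairs, how often the positive
    link is scored higher; ties contribute 0.5. O(P * N) time.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Because AUROC averages over all negative pairs, it can stay high even when each node's top-ranked candidates are mostly wrong, which is why rank-based metrics such as MRR are often reported next to it.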
16.
Sensors (Basel) ; 24(4)2024 Feb 17.
Article in English | MEDLINE | ID: mdl-38400439

ABSTRACT

With the continuous development of new wearable devices, sensor-based human activity recognition is enjoying enormous popularity in research and industry. Signals from inertial sensors allow for the detection, classification, and analysis of human activities such as jogging, cycling, or swimming. However, human activity recognition is often limited to basic activities that occur in short, predetermined periods of time (sliding windows). Complex macro-activities, such as multi-step sports exercises or multi-step cooking recipes, are still considered only to a limited extent. While some works have investigated the classification of macro-activities, automated understanding of how the underlying micro-activities interact remains an open challenge. This study addresses this gap by applying graph link prediction, a well-known concept in graph theory and graph neural networks (GNNs). To this end, the presented approach transforms micro-activity sequences into micro-activity graphs that are then processed with a GNN. The evaluation on two derived real-world datasets shows that graph link prediction enables accurate identification of interactions between micro-activities and precise validation of composite macro-activities based on learned graph embeddings. Furthermore, this work shows that GNNs can benefit from positional encodings in sequence recognition tasks.


Subject(s)
Bicycling , Cooking , Humans , Exercise Therapy , Industry , Swimming
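The abstract states that micro-activity sequences are turned into graphs and that positional encodings help the GNN, without giving the construction. The sketch below assumes the simplest option: one node per step, a directed edge between consecutive steps, and Transformer-style sinusoidal encodings of each step's position as extra node features. Both helpers are hypothetical.

```python
import math

def sequence_to_graph(labels):
    """Turn an ordered micro-activity sequence into a path graph.

    Each step becomes its own node (keeping label and position), so a
    repeated activity yields separate nodes; consecutive steps are
    joined by a directed edge.
    """
    nodes = [{"label": lab, "position": i} for i, lab in enumerate(labels)]
    edges = [(i, i + 1) for i in range(len(labels) - 1)]
    return nodes, edges

def sinusoidal_encoding(position, dim):
    """Transformer-style positional encoding of length dim (dim assumed even).

    Entries 2k and 2k+1 are sin and cos of position / 10000^(2k / dim).
    """
    enc = []
    for k in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * k / dim))
        enc.extend([math.sin(position * freq), math.cos(position * freq)])
    return enc
```

Link prediction on such graphs then asks whether an edge between two micro-activity nodes is plausible, for example to check whether steps appear in a valid order.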
17.
J Environ Manage ; 370: 122505, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39293117

ABSTRACT

Reducing urban carbon emissions (UCEs) is of paramount importance for global sustainable development. However, the complexity of interactions among urban spatial units has impeded further research on UCEs. This study investigates synergistic emission reduction between cities by analyzing the spatial complexity of the UCE network. The future potential for synergistic carbon emission reduction is predicted with a link prediction algorithm. A case study in the Pearl River Basin of China demonstrates that the UCE network has a complex spatial structure and that the synergistic capacity for emission reduction among cities has been enhanced. The core cities in the UCE network, including Dongguan, Shenzhen, and Guangzhou, have spillover effects that contribute to synergistic emission reduction. Community detection reveals that the common characteristics associated with UCEs become concentrated, thereby enhancing the synergy of joint efforts between cities. The link prediction algorithm indicates a high probability of strengthened carbon emission connections within the Pearl River Delta and between upstream cities, showing potential for forecasting synergistic emission reductions. Our research framework offers a comprehensive analysis of synergistic emission reduction based on the spatial complexity of the UCE network and link prediction. It serves as a worthwhile reference for developing differentiated policies on synergistic emission reduction.

18.
Entropy (Basel) ; 26(6)2024 May 21.
Article in English | MEDLINE | ID: mdl-38920442

ABSTRACT

Link prediction plays a crucial role in identifying future connections within complex networks, facilitating the analysis of network evolution across domains such as biological networks, social networks, and recommender systems. Researchers have proposed various centrality measures, such as degree, clustering coefficient, betweenness, and closeness centrality, to compute similarity scores for predicting links in these networks. These centrality measures leverage both local and global information about the nodes in the network. In this study, we present a novel link prediction approach that computes similarity scores from average centrality measures based on local and global centralities, namely Similarity based on Average Degree (SACD), Similarity based on Average Betweenness (SACB), Similarity based on Average Closeness (SACC), and Similarity based on Average Clustering Coefficient (SACCC). Our approach determines the centrality score of each node, calculates the average centrality over the entire graph, and derives similarity scores through common neighbors: we apply the centrality scores to these common neighbors and identify those with above-average centrality. To evaluate our approach, we compared the proposed measures with existing local similarity-based link prediction measures, including common neighbors, the Jaccard coefficient, Adamic-Adar, resource allocation, and preferential attachment, as well as recent measures such as the Common Neighbor and Centrality-based Parameterized Algorithm (CCPA) and keyword network link prediction (KNLP). We conducted experiments on four real-world datasets, on which the proposed similarity scores based on average centralities showed significant improvements: an average gain of 24% in the Area Under the Receiver Operating Characteristic curve (AUROC) compared with existing local similarity measures, and 31% compared with recent measures. Furthermore, we observed average improvements of 49% and 51% in the Area Under the Precision-Recall curve (AUPR) compared with existing and recent measures, respectively. Our comprehensive experiments highlight the superior performance of the proposed method.
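The abstract describes the scoring only in words (node centralities, a graph-wide average, common neighbors with above-average centrality) and gives no exact formula. The sketch below assumes the simplest reading: the score of a pair (u, v) is the number of common neighbors whose centrality exceeds the graph average. The function names are hypothetical, and the degree-based dictionary stands in for SACD; betweenness, closeness, or clustering dictionaries (for example from `networkx.betweenness_centrality`, `networkx.closeness_centrality`, or `networkx.clustering`) could be passed in the same way for the SACB, SACC, and SACCC variants.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def average_centrality_score(adj, centrality, u, v):
    """Hypothetical SAC-style similarity (assumed formula, not the authors' code):
    count the common neighbours of u and v whose centrality is strictly above
    the average centrality over all nodes in the graph."""
    avg = sum(centrality.values()) / len(centrality)
    common = adj.get(u, set()) & adj.get(v, set())
    return sum(1 for z in common if centrality[z] > avg)


# Example graph: nodes 0 and 1 share the neighbours 2 (a hub) and 3.
edges = [(0, 2), (1, 2), (0, 3), (1, 3), (2, 4), (2, 5)]
adj = build_adjacency(edges)

# SACD-style variant: raw degree as the centrality. Normalising by (n - 1)
# would not change which nodes are above average.
degree = {node: len(nbrs) for node, nbrs in adj.items()}

print(len(adj[0] & adj[1]))                         # plain common neighbours
print(average_centrality_score(adj, degree, 0, 1))  # only the hub counts
```

Here plain common neighbors gives 2 for the pair (0, 1), while the assumed average-centrality score gives 1, because only node 2 (degree 4) lies above the average degree of 2. Under this reading, shared low-centrality neighbors are filtered out.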

19.
Entropy (Basel) ; 26(7)2024 Jul 10.
Article in English | MEDLINE | ID: mdl-39056950

ABSTRACT

Graph representation learning aims to represent the nodes or edges of a graph as low-dimensional vectors while preserving as much topological information as possible. Over the past decades, numerous graph representation learning algorithms have emerged. Among them, proximity matrix representation methods have been shown to perform well in experiments and to scale to large graphs with millions of nodes. However, with the rapid development of the Internet, billions of information interactions take place at every moment. Most similarity matrix factorization methods still focus on static graphs, which leads to incomplete similarity descriptions and low embedding quality on temporal graphs. To improve the embedding quality of temporal graph learning, we propose a temporal graph representation learning model based on the matrix factorization of Time-constrained Personalized PageRank (TPPR) matrices. TPPR, an extension of personalized PageRank (PPR) that incorporates temporal information, better captures node similarities in temporal graphs. Building on this, we use Singular Value Decomposition or Nonnegative Matrix Factorization to decompose TPPR matrices and obtain an embedding vector for each node. Through experiments on link prediction, node classification, and node clustering across multiple temporal graphs, and through comparison with various baseline methods, we find that graph representation learning algorithms based on TPPR matrix factorization achieve outstanding overall scores on multiple temporal datasets, highlighting their effectiveness.
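The abstract does not state how TPPR is defined or computed. As a rough illustration only, the sketch below assumes that "time-constrained" means time-respecting random walks: each step must follow an edge whose timestamp is no earlier than the previous edge's. It estimates each row of the matrix by Monte Carlo, using the stop node of a walk with restart probability `alpha` (the standard Monte Carlo reading of PPR), and then factorizes the matrix with a truncated SVD. The functions `tppr_matrix` and `svd_embedding` and all parameter values are hypothetical. Since the estimated entries are nonnegative, a nonnegative factorization such as scikit-learn's `NMF` could replace the SVD step.

```python
import random

import numpy as np


def tppr_matrix(n, edges, alpha=0.15, walks=2000, seed=0):
    """Monte Carlo estimate of a hypothetical time-constrained PPR matrix.

    edges: undirected timestamped interactions (u, v, t) on nodes 0..n-1.
    Row s is the distribution of the node where a walk from s stops.
    """
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u, v, t in edges:
        adj[u].append((v, t))
        adj[v].append((u, t))
    M = np.zeros((n, n))
    for s in range(n):
        for _ in range(walks):
            node, now = s, float("-inf")
            # With probability 1 - alpha, take another step.
            while rng.random() >= alpha:
                # Assumed time constraint: only edges not older than the last one.
                options = [(w, t) for w, t in adj[node] if t >= now]
                if not options:
                    break  # dead end in time: the walk stops here
                node, now = rng.choice(options)
            M[s, node] += 1
    return M / walks


def svd_embedding(M, dim):
    """Rank-`dim` embedding U_d * sqrt(S_d) from the SVD of M."""
    U, S, _ = np.linalg.svd(M)
    return U[:, :dim] * np.sqrt(S[:dim])


# Edge 0-1 happens at t=5, after 1-2 (t=1), so a walk from 0 can never reach 2.
edges = [(0, 1, 5), (1, 2, 1), (2, 3, 2)]
M = tppr_matrix(4, edges)
Z = svd_embedding(M, 2)
print(M.round(2))
print(Z.shape)
```

In this toy graph the time constraint is what separates TPPR from plain PPR. A static PPR from node 0 would put some mass on nodes 2 and 3. Under the assumed time-respecting walk, that mass is exactly zero, because the only edge leaving node 0 is later than the edge that leads on to node 2.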

20.
Entropy (Basel) ; 26(6)2024 May 30.
Article in English | MEDLINE | ID: mdl-38920486

ABSTRACT

Link prediction is recognized as a crucial means of analyzing dynamic social networks and revealing the principles by which social relationships evolve. However, the complex topology and temporal evolution of dynamic social networks pose significant research challenges. This study introduces an innovative fusion framework that incorporates entropy, causality, and a graph convolutional network (GCN) model, focusing specifically on link prediction in dynamic social networks. First, the framework preprocesses the raw data, extracting and recording the timestamps of interactions. It then introduces the concept of "Temporal Information Entropy (TIE)" and integrates it into the random walk of the Node2Vec algorithm to generate initial feature vectors for the nodes of the graph. A causality analysis model is then applied for secondary processing of the generated feature vectors. Next, a balanced dataset is constructed by adjusting the ratio of positive and negative samples. Finally, a dedicated GCN model is trained. In extensive experiments on multiple real social networks, the proposed framework outperformed other methods on key evaluation metrics such as precision, recall, F1 score, and accuracy. This study provides a fresh perspective for understanding and predicting link dynamics in social networks and has significant practical value.
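The abstract names TIE and the balanced positive/negative dataset but does not define either precisely. The sketch below fills in both under loudly stated assumptions. First, a node's TIE is taken to be the Shannon entropy of how its interactions spread over equal-width time bins. Second, the walk is a first-order walk that prefers higher-TIE neighbors; node2vec's return and in-out parameters p and q are omitted. Third, "balanced" is read as a 1:1 ratio of observed edges to sampled non-edges. All function names and parameters are hypothetical, and the causality-analysis and GCN stages are not modeled.

```python
import math
import random
from collections import defaultdict


def temporal_information_entropy(edges, bins=4):
    """Hypothetical TIE: entropy (natural log) of each node's interaction
    times over `bins` equal-width bins spanning the observed time range."""
    times = [t for _, _, t in edges]
    lo, hi = min(times), max(times)
    width = (hi - lo) / bins or 1.0  # avoid division by zero if all t are equal
    counts = defaultdict(lambda: [0] * bins)
    for u, v, t in edges:
        b = min(int((t - lo) / width), bins - 1)
        counts[u][b] += 1
        counts[v][b] += 1
    tie = {}
    for node, c in counts.items():
        total = sum(c)
        tie[node] = -sum((x / total) * math.log(x / total) for x in c if x)
    return tie


def tie_biased_walk(edges, tie, start, length, eps=1e-3, seed=0):
    """First-order random walk whose next step is chosen with probability
    proportional to TIE + eps (assumed biasing rule)."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for u, v, _ in edges:
        adj[u].append(v)
        adj[v].append(u)
    walk = [start]
    while len(walk) < length:
        nbrs = adj.get(walk[-1], [])
        if not nbrs:
            break
        weights = [tie.get(w, 0.0) + eps for w in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def balanced_samples(edges, nodes, seed=0):
    """Observed node pairs as positives plus an equal number of random
    non-adjacent pairs as negatives (1:1 ratio, assumed)."""
    rng = random.Random(seed)
    nodes = list(nodes)
    positives = {frozenset((u, v)) for u, v, _ in edges if u != v}
    n = len(nodes)
    if n * (n - 1) // 2 - len(positives) < len(positives):
        raise ValueError("not enough non-edges for a 1:1 ratio")
    negatives = set()
    while len(negatives) < len(positives):
        pair = frozenset(rng.sample(nodes, 2))
        if pair not in positives:
            negatives.add(pair)
    return positives, negatives


edges = [(0, 1, 0.0), (0, 2, 0.0), (1, 2, 4.0), (2, 3, 8.0)]
tie = temporal_information_entropy(edges, bins=2)
print(tie)
print(tie_biased_walk(edges, tie, start=0, length=6))
```

Under this definition, node 0, whose two interactions both fall in the first time bin, gets TIE 0. Node 1, with one interaction in each bin, gets ln 2. Walks therefore drift toward nodes whose activity is spread over time. Other readings of TIE (per-edge rather than per-node entropy, or a different binning) would slot into the same weighting step.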
