Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Resultados 1 - 20 de 614
Filtrar
1.
Proc Natl Acad Sci U S A ; 121(8): e2309504121, 2024 Feb 20.
Artículo en Inglés | MEDLINE | ID: mdl-38346190

RESUMEN

Graph neural networks (GNNs) excel in modeling relational data such as biological, social, and transportation networks, but the underpinnings of their success are not well understood. Traditional complexity measures from statistical learning theory fail to account for observed phenomena like the double descent or the impact of relational semantics on generalization error. Motivated by experimental observations of "transductive" double descent in key networks and datasets, we use analytical tools from statistical physics and random matrix theory to precisely characterize generalization in simple graph convolution networks on the contextual stochastic block model. Our results illuminate the nuances of learning on homophilic versus heterophilic data and predict double descent whose existence in GNNs has been questioned by recent work. We show how risk is shaped by the interplay between the graph noise, feature noise, and the number of training labels. Our findings apply beyond stylized models, capturing qualitative trends in real-world GNNs and datasets. As a case in point, we use our analytic insights to improve performance of state-of-the-art graph convolution networks on heterophilic datasets.

2.
Brief Bioinform ; 25(2)2024 Jan 22.
Artículo en Inglés | MEDLINE | ID: mdl-38349060

RESUMEN

The recent development of deep learning methods have undoubtedly led to great improvement in various machine learning tasks, especially in prediction tasks. This type of methods have also been adapted to answer various problems in bioinformatics, including automatic genome annotation, artificial genome generation or phenotype prediction. In particular, a specific type of deep learning method, called graph neural network (GNN) has repeatedly been reported as a good candidate to predict phenotypes from gene expression because its ability to embed information on gene regulation or co-expression through the use of a gene network. However, up to date, no complete and reproducible benchmark has ever been performed to analyze the trade-off between cost and benefit of this approach compared to more standard (and simpler) machine learning methods. In this article, we provide such a benchmark, based on clear and comparable policies to evaluate the different methods on several datasets. Our conclusion is that GNN rarely provides a real improvement in prediction performance, especially when compared to the computation effort required by the methods. Our findings on a limited but controlled simulated dataset shows that this could be explained by the limited quality or predictive power of the input biological gene network itself.


Asunto(s)
Perfilación de la Expresión Génica , Transcriptoma , Benchmarking , Biología Computacional , Redes Neurales de la Computación
3.
Brief Bioinform ; 25(2)2024 Jan 22.
Artículo en Inglés | MEDLINE | ID: mdl-38343326

RESUMEN

Viruses are the most abundant biological entities on earth and are important components of microbial communities. A metagenome contains all microorganisms from an environmental sample. Correctly identifying viruses from these mixed sequences is critical in viral analyses. It is common to identify long viral sequences, which has already been passed thought pipelines of assembly and binning. Existing deep learning-based methods divide these long sequences into short subsequences and identify them separately. This makes the relationships between them be omitted, leading to poor performance on identifying long viral sequences. In this paper, VirGrapher is proposed to improve the identification performance of long viral sequences by constructing relationships among short subsequences from long ones. VirGrapher see a long sequence as a graph and uses a Graph Convolutional Network (GCN) model to learn multilayer connections between nodes from sequences after a GCN-based node embedding model. VirGrapher achieves a better AUC value and accuracy on validation set, which is better than three benchmark methods.


Asunto(s)
Metagenoma , Microbiota , Microbiota/genética , Benchmarking
4.
Brief Bioinform ; 25(5)2024 Jul 25.
Artículo en Inglés | MEDLINE | ID: mdl-39082651

RESUMEN

Constructing accurate gene regulatory network s (GRNs), which reflect the dynamic governing process between genes, is critical to understanding the diverse cellular process and unveiling the complexities in biological systems. With the development of computer sciences, computational-based approaches have been applied to the GRNs inference task. However, current methodologies face challenges in effectively utilizing existing topological information and prior knowledge of gene regulatory relationships, hindering the comprehensive understanding and accurate reconstruction of GRNs. In response, we propose a novel graph neural network (GNN)-based Multi-Task Learning framework for GRN reconstruction, namely MTLGRN. Specifically, we first encode the gene promoter sequences and the gene biological features and concatenate the corresponding feature representations. Then, we construct a multi-task learning framework including GRN reconstruction, Gene knockout predict, and Gene expression matrix reconstruction. With joint training, MTLGRN can optimize the gene latent representations by integrating gene knockout information, promoter characteristics, and other biological attributes. Extensive experimental results demonstrate superior performance compared with state-of-the-art baselines on the GRN reconstruction task, efficiently leveraging biological knowledge and comprehensively understanding the gene regulatory relationships. MTLGRN also pioneered attempts to simulate gene knockouts on bulk data by incorporating gene knockout information.


Asunto(s)
Biología Computacional , Redes Reguladoras de Genes , Biología Computacional/métodos , Técnicas de Inactivación de Genes , Redes Neurales de la Computación , Humanos , Regiones Promotoras Genéticas , Algoritmos
5.
Brief Bioinform ; 25(4)2024 May 23.
Artículo en Inglés | MEDLINE | ID: mdl-38920345

RESUMEN

Bioactive peptide therapeutics has been a long-standing research topic. Notably, the antimicrobial peptides (AMPs) have been extensively studied for its therapeutic potential. Meanwhile, the demand for annotating other therapeutic peptides, such as antiviral peptides (AVPs) and anticancer peptides (ACPs), also witnessed an increase in recent years. However, we conceive that the structure of peptide chains and the intrinsic information between the amino acids is not fully investigated among the existing protocols. Therefore, we develop a new graph deep learning model, namely TP-LMMSG, which offers lightweight and easy-to-deploy advantages while improving the annotation performance in a generalizable manner. The results indicate that our model can accurately predict the properties of different peptides. The model surpasses the other state-of-the-art models on AMP, AVP and ACP prediction across multiple experimental validated datasets. Moreover, TP-LMMSG also addresses the challenges of time-consuming pre-processing in graph neural network frameworks. With its flexibility in integrating heterogeneous peptide features, our model can provide substantial impacts on the screening and discovery of therapeutic peptides. The source code is available at https://github.com/NanjunChen37/TP_LMMSG.


Asunto(s)
Aminoácidos , Redes Neurales de la Computación , Péptidos , Aminoácidos/química , Péptidos/química , Biología Computacional/métodos , Aprendizaje Profundo , Péptidos Antimicrobianos/química , Algoritmos
6.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38670157

RESUMEN

The interrelation and complementary nature of multi-omics data can provide valuable insights into the intricate molecular mechanisms underlying diseases. However, challenges such as limited sample size, high data dimensionality and differences in omics modalities pose significant obstacles to fully harnessing the potential of these data. The prior knowledge such as gene regulatory network and pathway information harbors useful gene-gene interaction and gene functional module information. To effectively integrate multi-omics data and make full use of the prior knowledge, here, we propose a Multilevel-graph neural network (GNN): a hierarchically designed deep learning algorithm that sequentially leverages multi-omics data, gene regulatory networks and pathway information to extract features and enhance accuracy in predicting survival risk. Our method achieved better accuracy compared with existing methods. Furthermore, key factors nonlinearly associated with the tumor pathogenesis are prioritized by employing two interpretation algorithms (i.e. GNN-Explainer and IGscore) for neural networks, at gene and pathway level, respectively. The top genes and pathways exhibit strong associations with disease in survival analyses, many of which such as SEC61G and CYP27B1 are previously reported in the literature.


Asunto(s)
Algoritmos , Redes Reguladoras de Genes , Neoplasias , Redes Neurales de la Computación , Humanos , Neoplasias/genética , Biología Computacional/métodos , Aprendizaje Profundo , Genómica/métodos , Multiómica
7.
Brief Bioinform ; 25(6)2024 Sep 23.
Artículo en Inglés | MEDLINE | ID: mdl-39331017

RESUMEN

In this paper, we propose DGCL, a dual-graph neural networks (GNNs)-based contrastive learning (CL) integrated with mixed molecular fingerprints (MFPs) for molecular property prediction. The DGCL-MFP method contains two stages. In the first pretraining stage, we utilize two different GNNs as encoders to construct CL, rather than using the method of generating enhanced graphs as before. Precisely, DGCL aggregates and enhances features of the same molecule by the Graph Isomorphism Network and the Graph Attention Network, with representations extracted from the same molecule serving as positive samples, and others marked as negative ones. In the downstream tasks training stage, features extracted from the two above pretrained graph networks and the meticulously selected MFPs are concated together to predict molecular properties. Our experiments show that DGCL enhances the performance of existing GNNs by achieving or surpassing the state-of-the-art self-supervised learning models on multiple benchmark datasets. Specifically, DGCL increases the average performance of classification tasks by 3.73$\%$ and improves the performance of regression task Lipo by 0.126. Through ablation studies, we validate the impact of network fusion strategies and MFPs on model performance. In addition, DGCL's predictive performance is further enhanced by weighting different molecular features based on the Extended Connectivity Fingerprint. The code and datasets of DGCL will be made publicly available.


Asunto(s)
Redes Neurales de la Computación , Aprendizaje Automático , Algoritmos , Biología Computacional/métodos
8.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38706318

RESUMEN

Molecular property prediction faces the challenge of limited labeled data as it necessitates a series of specialized experiments to annotate target molecules. Data augmentation techniques can effectively address the issue of data scarcity. In recent years, Mixup has achieved significant success in traditional domains such as image processing. However, its application in molecular property prediction is relatively limited due to the irregular, non-Euclidean nature of graphs and the fact that minor variations in molecular structures can lead to alterations in their properties. To address these challenges, we propose a novel data augmentation method called Mix-Key tailored for molecular property prediction. Mix-Key aims to capture crucial features of molecular graphs, focusing separately on the molecular scaffolds and functional groups. By generating isomers that are relatively invariant to the scaffolds or functional groups, we effectively preserve the core information of molecules. Additionally, to capture interactive information between the scaffolds and functional groups while ensuring correlation between the original and augmented graphs, we introduce molecular fingerprint similarity and node similarity. Through these steps, Mix-Key determines the mixup ratio between the original graph and two isomers, thus generating more informative augmented molecular graphs. We extensively validate our approach on molecular datasets of different scales with several Graph Neural Network architectures. The results demonstrate that Mix-Key consistently outperforms other data augmentation methods in enhancing molecular property prediction on several datasets.


Asunto(s)
Algoritmos , Estructura Molecular , Biología Computacional/métodos , Programas Informáticos
9.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38546326

RESUMEN

Chimeric antigen receptor T-cell (CAR-T) immunotherapy, a novel approach for treating blood cancer, is associated with the production of cytokine release syndrome (CRS), which poses significant safety concerns for patients. Currently, there is limited knowledge regarding CRS-related cytokines and the intricate relationship between cytokines and cells. Therefore, it is imperative to explore a reliable and efficient computational method to identify cytokines associated with CRS. In this study, we propose Meta-DHGNN, a directed and heterogeneous graph neural network analysis method based on meta-learning. The proposed method integrates both directed and heterogeneous algorithms, while the meta-learning module effectively addresses the issue of limited data availability. This approach enables comprehensive analysis of the cytokine network and accurate prediction of CRS-related cytokines. Firstly, to tackle the challenge posed by small datasets, a pre-training phase is conducted using the meta-learning module. Consequently, the directed algorithm constructs an adjacency matrix that accurately captures potential relationships in a more realistic manner. Ultimately, the heterogeneous algorithm employs meta-photographs and multi-head attention mechanisms to enhance the realism and accuracy of predicting cytokine information associated with positive labels. Our experimental verification on the dataset demonstrates that Meta-DHGNN achieves favorable outcomes. Furthermore, based on the predicted results, we have explored the multifaceted formation mechanism of CRS in CAR-T therapy from various perspectives and identified several cytokines, such as IFNG (IFN-γ), IFNA1, IFNB1, IFNA13, IFNA2, IFNAR1, IFNAR2, IFNGR1 and IFNGR2 that have been relatively overlooked in previous studies but potentially play pivotal roles. The significance of Meta-DHGNN lies in its ability to analyze directed and heterogeneous networks in biology effectively while also facilitating CRS risk prediction in CAR-T therapy.


Asunto(s)
Citocinas , Receptores Quiméricos de Antígenos , Humanos , Síndrome de Liberación de Citoquinas , Receptores Quiméricos de Antígenos/genética , Aprendizaje , Redes Neurales de la Computación , Interferón-alfa
10.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38581422

RESUMEN

Reliable cell type annotations are crucial for investigating cellular heterogeneity in single-cell omics data. Although various computational approaches have been proposed for single-cell RNA sequencing (scRNA-seq) annotation, high-quality cell labels are still lacking in single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) data, because of extreme sparsity and inconsistent chromatin accessibility between datasets. Here, we present a novel automated cell annotation method that transfers cell type information from a well-labeled scRNA-seq reference to an unlabeled scATAC-seq target, via a parallel graph neural network, in a semi-supervised manner. Unlike existing methods that utilize only gene expression or gene activity features, HyGAnno leverages genome-wide accessibility peak features to facilitate the training process. In addition, HyGAnno reconstructs a reference-target cell graph to detect cells with low prediction reliability, according to their specific graph connectivity patterns. HyGAnno was assessed across various datasets, showcasing its strengths in precise cell annotation, generating interpretable cell embeddings, robustness to noisy reference data and adaptability to tumor tissues.


Asunto(s)
Cromatina , Redes Neurales de la Computación , Reproducibilidad de los Resultados
11.
Brief Bioinform ; 25(5)2024 Jul 25.
Artículo en Inglés | MEDLINE | ID: mdl-39175133

RESUMEN

Target identification is one of the crucial tasks in drug research and development, as it aids in uncovering the action mechanism of herbs/drugs and discovering new therapeutic targets. Although multiple algorithms of herb target prediction have been proposed, due to the incompleteness of clinical knowledge and the limitation of unsupervised models, accurate identification for herb targets still faces huge challenges of data and models. To address this, we proposed a deep learning-based target prediction framework termed HTINet2, which designed three key modules, namely, traditional Chinese medicine (TCM) and clinical knowledge graph embedding, residual graph representation learning, and supervised target prediction. In the first module, we constructed a large-scale knowledge graph that covers the TCM properties and clinical treatment knowledge of herbs, and designed a component of deep knowledge embedding to learn the deep knowledge embedding of herbs and targets. In the remaining two modules, we designed a residual-like graph convolution network to capture the deep interactions among herbs and targets, and a Bayesian personalized ranking loss to conduct supervised training and target prediction. Finally, we designed comprehensive experiments, of which comparison with baselines indicated the excellent performance of HTINet2 (HR@10 increased by 122.7% and NDCG@10 by 35.7%), ablation experiments illustrated the positive effect of our designed modules of HTINet2, and case study demonstrated the reliability of the predicted targets of Artemisia annua and Coptis chinensis based on the knowledge base, literature, and molecular docking.


Asunto(s)
Medicamentos Herbarios Chinos , Medicina Tradicional China , Redes Neurales de la Computación , Medicamentos Herbarios Chinos/química , Medicamentos Herbarios Chinos/farmacología , Algoritmos , Humanos , Aprendizaje Profundo , Teorema de Bayes
12.
Brief Bioinform ; 25(5)2024 Jul 25.
Artículo en Inglés | MEDLINE | ID: mdl-39210506

RESUMEN

Tumorigenesis arises from the dysfunction of cancer genes, leading to uncontrolled cell proliferation through various mechanisms. Establishing a complete cancer gene catalogue will make precision oncology possible. Although existing methods based on graph neural networks (GNN) are effective in identifying cancer genes, they fall short in effectively integrating data from multiple views and interpreting predictive outcomes. To address these shortcomings, an interpretable representation learning framework IMVRL-GCN is proposed to capture both shared and specific representations from multiview data, offering significant insights into the identification of cancer genes. Experimental results demonstrate that IMVRL-GCN outperforms state-of-the-art cancer gene identification methods and several baselines. Furthermore, IMVRL-GCN is employed to identify a total of 74 high-confidence novel cancer genes, and multiview data analysis highlights the pivotal roles of shared, mutation-specific, and structure-specific representations in discriminating distinctive cancer genes. Exploration of the mechanisms behind their discriminative capabilities suggests that shared representations are strongly associated with gene functions, while mutation-specific and structure-specific representations are linked to mutagenic propensity and functional synergy, respectively. Finally, our in-depth analyses of these candidates suggest potential insights for individualized treatments: afatinib could counteract many mutation-driven risks, and targeting interactions with cancer gene SRC is a reasonable strategy to mitigate interaction-induced risks for NR3C1, RXRA, HNF4A, and SP1.


Asunto(s)
Neoplasias , Humanos , Neoplasias/genética , Biología Computacional/métodos , Redes Neurales de la Computación , Mutación , Genes Relacionados con las Neoplasias , Factor Nuclear 4 del Hepatocito/genética , Aprendizaje Automático
13.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38581420

RESUMEN

Protein-ligand interaction prediction presents a significant challenge in drug design. Numerous machine learning and deep learning (DL) models have been developed to accurately identify docking poses of ligands and active compounds against specific targets. However, current models often suffer from inadequate accuracy or lack practical physical significance in their scoring systems. In this research paper, we introduce IGModel, a novel approach that utilizes the geometric information of protein-ligand complexes as input for predicting the root mean square deviation of docking poses and the binding strength (pKd, the negative value of the logarithm of binding affinity) within the same prediction framework. This ensures that the output scores carry intuitive meaning. We extensively evaluate the performance of IGModel on various docking power test sets, including the CASF-2016 benchmark, PDBbind-CrossDocked-Core and DISCO set, consistently achieving state-of-the-art accuracies. Furthermore, we assess IGModel's generalizability and robustness by evaluating it on unbiased test sets and sets containing target structures generated by AlphaFold2. The exceptional performance of IGModel on these sets demonstrates its efficacy. Additionally, we visualize the latent space of protein-ligand interactions encoded by IGModel and conduct interpretability analysis, providing valuable insights. This study presents a novel framework for DL-based prediction of protein-ligand interactions, contributing to the advancement of this field. The IGModel is available at GitHub repository https://github.com/zchwang/IGModel.


Asunto(s)
Aprendizaje Profundo , Proteínas , Proteínas/química , Unión Proteica , Ligandos , Diseño de Fármacos
14.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38754409

RESUMEN

Drug repurposing offers a viable strategy for discovering new drugs and therapeutic targets through the analysis of drug-gene interactions. However, traditional experimental methods are plagued by their costliness and inefficiency. Despite graph convolutional network (GCN)-based models' state-of-the-art performance in prediction, their reliance on supervised learning makes them vulnerable to data sparsity, a common challenge in drug discovery, further complicating model development. In this study, we propose SGCLDGA, a novel computational model leveraging graph neural networks and contrastive learning to predict unknown drug-gene associations. SGCLDGA employs GCNs to extract vector representations of drugs and genes from the original bipartite graph. Subsequently, singular value decomposition (SVD) is employed to enhance the graph and generate multiple views. The model performs contrastive learning across these views, optimizing vector representations through a contrastive loss function to better distinguish positive and negative samples. The final step involves utilizing inner product calculations to determine association scores between drugs and genes. Experimental results on the DGIdb4.0 dataset demonstrate SGCLDGA's superior performance compared with six state-of-the-art methods. Ablation studies and case analyses validate the significance of contrastive learning and SVD, highlighting SGCLDGA's potential in discovering new drug-gene associations. The code and dataset for SGCLDGA are freely available at https://github.com/one-melon/SGCLDGA.


Asunto(s)
Redes Neurales de la Computación , Humanos , Reposicionamiento de Medicamentos/métodos , Biología Computacional/métodos , Algoritmos , Programas Informáticos , Descubrimiento de Drogas/métodos , Aprendizaje Automático
15.
Brief Bioinform ; 25(4)2024 May 23.
Artículo en Inglés | MEDLINE | ID: mdl-38975895

RESUMEN

Spatial transcriptomics provides valuable insights into gene expression within the native tissue context, effectively merging molecular data with spatial information to uncover intricate cellular relationships and tissue organizations. In this context, deciphering cellular spatial domains becomes essential for revealing complex cellular dynamics and tissue structures. However, current methods encounter challenges in seamlessly integrating gene expression data with spatial information, resulting in less informative representations of spots and suboptimal accuracy in spatial domain identification. We introduce stCluster, a novel method that integrates graph contrastive learning with multi-task learning to refine informative representations for spatial transcriptomic data, consequently improving spatial domain identification. stCluster first leverages graph contrastive learning technology to obtain discriminative representations capable of recognizing spatially coherent patterns. Through jointly optimizing multiple tasks, stCluster further fine-tunes the representations to be able to capture complex relationships between gene expression and spatial organization. Benchmarked against six state-of-the-art methods, the experimental results reveal its proficiency in accurately identifying complex spatial domains across various datasets and platforms, spanning tissue, organ, and embryo levels. Moreover, stCluster can effectively denoise the spatial gene expression patterns and enhance the spatial trajectory inference. The source code of stCluster is freely available at https://github.com/hannshu/stCluster.


Asunto(s)
Perfilación de la Expresión Génica , Transcriptoma , Perfilación de la Expresión Génica/métodos , Biología Computacional/métodos , Algoritmos , Humanos , Animales , Programas Informáticos , Aprendizaje Automático
16.
Brief Bioinform ; 25(4)2024 May 23.
Artículo en Inglés | MEDLINE | ID: mdl-38920341

RESUMEN

Drug-target interactions (DTIs) are a key part of drug development process and their accurate and efficient prediction can significantly boost development efficiency and reduce development time. Recent years have witnessed the rapid advancement of deep learning, resulting in an abundance of deep learning-based models for DTI prediction. However, most of these models used a single representation of drugs and proteins, making it difficult to comprehensively represent their characteristics. Multimodal data fusion can effectively compensate for the limitations of single-modal data. However, existing multimodal models for DTI prediction do not take into account both intra- and inter-modal interactions simultaneously, resulting in limited presentation capabilities of fused features and a reduction in DTI prediction accuracy. A hierarchical multimodal self-attention-based graph neural network for DTI prediction, called HMSA-DTI, is proposed to address multimodal feature fusion. Our proposed HMSA-DTI takes drug SMILES, drug molecular graphs, protein sequences and protein 2-mer sequences as inputs, and utilizes a hierarchical multimodal self-attention mechanism to achieve deep fusion of multimodal features of drugs and proteins, enabling the capture of intra- and inter-modal interactions between drugs and proteins. It is demonstrated that our proposed HMSA-DTI has significant advantages over other baseline methods on multiple evaluation metrics across five benchmark datasets.


Asunto(s)
Aprendizaje Profundo , Redes Neurales de la Computación , Proteínas/química , Proteínas/metabolismo , Humanos , Algoritmos , Biología Computacional/métodos
17.
Brief Bioinform ; 25(5)2024 Jul 25.
Artículo en Inglés | MEDLINE | ID: mdl-39162311

RESUMEN

The prediction of metabolite-protein interactions (MPIs) plays an important role in plant basic life functions. Compared with the traditional experimental methods and the high-throughput genomics methods using statistical correlation, applying heterogeneous graph neural networks to the prediction of MPIs in plants can reduce the cost of manpower, resources, and time. However, to the best of our knowledge, applying heterogeneous graph neural networks to the prediction of MPIs in plants still remains under-explored. In this work, we propose a novel model named heterogeneous neighbor contrastive graph attention network (HNCGAT), for the prediction of MPIs in Arabidopsis. The HNCGAT employs the type-specific attention-based neighborhood aggregation mechanism to learn node embeddings of proteins, metabolites, and functional-annotations, and designs a novel heterogeneous neighbor contrastive learning framework to preserve heterogeneous network topological structures. Extensive experimental results and ablation study demonstrate the effectiveness of the HNCGAT model for MPI prediction. In addition, a case study on our MPI prediction results supports that the HNCGAT model can effectively predict the potential MPIs in plant.


Asunto(s)
Arabidopsis , Redes Neurales de la Computación , Arabidopsis/genética , Arabidopsis/metabolismo , Algoritmos , Biología Computacional/métodos , Proteínas de Plantas/genética , Proteínas de Plantas/metabolismo
18.
Brief Bioinform ; 25(5)2024 Jul 25.
Artículo en Inglés | MEDLINE | ID: mdl-39226887

RESUMEN

Plasma protein biomarkers have been considered promising tools for diagnosing dementia subtypes due to their low variability, cost-effectiveness, and minimal invasiveness in diagnostic procedures. Machine learning (ML) methods have been applied to enhance accuracy of the biomarker discovery. However, previous ML-based studies often overlook interactions between proteins, which are crucial in complex disorders like dementia. While protein-protein interactions (PPIs) have been used in network models, these models often fail to fully capture the diverse properties of PPIs due to their local awareness. This drawback increases the chance of neglecting critical components and magnifying the impact of noisy interactions. In this study, we propose a novel graph-based ML model for dementia subtype diagnosis, the graph propagational network (GPN). By propagating the independent effect of plasma proteins on PPI network, the GPN extracts the globally interactive effects between proteins. Experimental results showed that the interactive effect between proteins yielded to further clarify the differences between dementia subtype groups and contributed to the performance improvement where the GPN outperformed existing methods by 10.4% on average.


Asunto(s)
Biomarcadores , Proteínas Sanguíneas , Demencia , Aprendizaje Automático , Mapas de Interacción de Proteínas , Humanos , Demencia/metabolismo , Demencia/diagnóstico , Proteínas Sanguíneas/metabolismo , Mapeo de Interacción de Proteínas/métodos , Algoritmos , Biología Computacional/métodos
19.
Brief Bioinform ; 25(4)2024 May 23.
Artículo en Inglés | MEDLINE | ID: mdl-38975896

RESUMEN

Mechanisms of protein-DNA interactions are involved in a wide range of biological activities and processes. Accurately identifying binding sites between proteins and DNA is crucial for analyzing genetic material, exploring protein functions, and designing novel drugs. In recent years, several computational methods have been proposed as alternatives to time-consuming and expensive traditional experiments. However, accurately predicting protein-DNA binding sites still remains a challenge. Existing computational methods often rely on handcrafted features and a single-model architecture, leaving room for improvement. We propose a novel computational method, called EGPDI, based on multi-view graph embedding fusion. This approach involves the integration of Equivariant Graph Neural Networks (EGNN) and Graph Convolutional Networks II (GCNII), independently configured to profoundly mine the global and local node embedding representations. An advanced gated multi-head attention mechanism is subsequently employed to capture the attention weights of the dual embedding representations, thereby facilitating the integration of node features. Besides, extra node features from protein language models are introduced to provide more structural information. To our knowledge, this is the first time that multi-view graph embedding fusion has been applied to the task of protein-DNA binding site prediction. The results of five-fold cross-validation and independent testing demonstrate that EGPDI outperforms state-of-the-art methods. Further comparative experiments and case studies also verify the superiority and generalization ability of EGPDI.


Asunto(s)
Biología Computacional , Proteínas de Unión al ADN , ADN , Redes Neurales de la Computación , Sitios de Unión , ADN/metabolismo , ADN/química , Proteínas de Unión al ADN/metabolismo , Proteínas de Unión al ADN/química , Biología Computacional/métodos , Algoritmos , Unión Proteica
20.
Brief Bioinform ; 24(4)2023 07 20.
Artículo en Inglés | MEDLINE | ID: mdl-37317619

RESUMEN

The scoring models used for protein structure modeling and ranking are mainly divided into unified field and protein-specific scoring functions. Although protein structure prediction has made tremendous progress since CASP14, the modeling accuracy still cannot meet the requirements to a certain extent. Especially, accurate modeling of multi-domain and orphan proteins remains a challenge. Therefore, an accurate and efficient protein scoring model should be developed urgently to guide the protein structure folding or ranking through deep learning. In this work, we propose a protein structure global scoring model based on equivariant graph neural network (EGNN), named GraphGPSM, to guide protein structure modeling and ranking. We construct an EGNN architecture, and a message passing mechanism is designed to update and transmit information between nodes and edges of the graph. Finally, the global score of the protein model is output through a multilayer perceptron. Residue-level ultrafast shape recognition is used to describe the relationship between residues and the overall structure topology, and distance and direction encoded by Gaussian radial basis functions are designed to represent the overall topology of the protein backbone. These two features are combined with Rosetta energy terms, backbone dihedral angles and inter-residue distance and orientations to represent the protein model and embedded into the nodes and edges of the graph neural network. The experimental results on the CASP13, CASP14 and CAMEO test sets show that the scores of our developed GraphGPSM have a strong correlation with the TM-score of the models, which are significantly better than those of the unified field score function REF2015 and the state-of-the-art local lDDT-based scoring models ModFOLD8, ProQ3D and DeepAccNet, etc. The modeling experimental results on 484 test proteins demonstrate that GraphGPSM can greatly improve the modeling accuracy. GraphGPSM is further used to model 35 orphan proteins and 57 multi-domain proteins. The results show that the average TM-score of the models predicted by GraphGPSM is 13.2 and 7.1% higher than that of the models predicted by AlphaFold2. GraphGPSM also participates in CASP15 and achieves competitive performance in global accuracy estimation.


Asunto(s)
Algoritmos , Proteínas , Conformación Proteica , Bases de Datos de Proteínas , Proteínas/química , Redes Neurales de la Computación
SELECCIÓN DE REFERENCIAS
Detalles de la búsqueda