Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 1.836
Filtrar
Más filtros

Tipo del documento
Intervalo de año de publicación
1.
Brief Bioinform ; 25(4)2024 May 23.
Artículo en Inglés | MEDLINE | ID: mdl-38811359

RESUMEN

The development of deep learning models plays a crucial role in advancing precision medicine. These models enable personalized medical treatments and interventions based on the unique genetic, environmental and lifestyle factors of individual patients, and the promotion of precision medicine is achieved mainly through genomic data analysis, variant annotation and interpretation, pharmacogenomics research, biomarker discovery, disease typing, clinical decision support and disease mechanism interpretation. Extensive research has been conducted to address precision medicine challenges using attention mechanism models such as SAN, GAT and transformers. Especially, the recent popularity of ChatGPT has significantly propelled the application of this model type to a new height. Therefore, I propose a Special Issue for Briefings in Bioinformatics about the topic 'Attention Mechanism Models for Precision Medicine'. This Special Issue aims to provide a comprehensive overview and presentation of innovative researches on the application of graph attention mechanism models in precision medicine.


Asunto(s)
Medicina de Precisión , Medicina de Precisión/métodos , Humanos , Aprendizaje Profundo , Biología Computacional/métodos , Genómica/métodos
2.
Brief Bioinform ; 25(5)2024 Jul 25.
Artículo en Inglés | MEDLINE | ID: mdl-39060167

RESUMEN

Single-cell RNA sequencing (scRNA-seq) enables the exploration of biological heterogeneity among different cell types within tissues at a resolution. Inferring cell types within tissues is foundational for downstream research. Most existing methods for cell type inference based on scRNA-seq data primarily utilize highly variable genes (HVGs) with higher expression levels as clustering features, overlooking the contribution of HVGs with lower expression levels. To address this, we have designed a novel cell type inference method for scRNA-seq data, termed scLEGA. scLEGA employs a novel zero-inflated negative binomial (ZINB) loss function that fully considers the contribution of genes with lower expression levels and combines two distinct scRNA-seq clustering strategies through a multi-head attention mechanism. It utilizes a low-expression optimized denoising autoencoder, based on the novel ZINB model, to extract low-dimensional features and handle dropout events, and a GCN-based graph autoencoder (GAE) that leverages neighbor information to guide dimensionality reduction. The iterative fusion of denoising and topological embedding in scLEGA facilitates the acquisition of cluster-friendly cell representations in the hidden embedding, where similar cells are brought closer together. Compared to 12 state-of-the-art cell type inference methods on 15 scRNA-seq datasets, scLEGA demonstrates superior performance in clustering accuracy, scalability, and stability. Our scLEGA model codes are freely available at https://github.com/Masonze/scLEGA-main.


Asunto(s)
RNA-Seq , Análisis de Expresión Génica de una Sola Célula , Humanos , Algoritmos , Análisis por Conglomerados , Biología Computacional/métodos , RNA-Seq/métodos , Programas Informáticos
3.
Brief Bioinform ; 25(4)2024 May 23.
Artículo en Inglés | MEDLINE | ID: mdl-38990514

RESUMEN

Protein-peptide interactions (PPepIs) are vital to understanding cellular functions, which can facilitate the design of novel drugs. As an essential component in forming a PPepI, protein-peptide binding sites are the basis for understanding the mechanisms involved in PPepIs. Therefore, accurately identifying protein-peptide binding sites becomes a critical task. The traditional experimental methods for researching these binding sites are labor-intensive and time-consuming, and some computational tools have been invented to supplement it. However, these computational tools have limitations in generality or accuracy due to the need for ligand information, complex feature construction, or their reliance on modeling based on amino acid residues. To deal with the drawbacks of these computational algorithms, we describe a geometric attention-based network for peptide binding site identification (GAPS) in this work. The proposed model utilizes geometric feature engineering to construct atom representations and incorporates multiple attention mechanisms to update relevant biological features. In addition, the transfer learning strategy is implemented for leveraging the protein-protein binding sites information to enhance the protein-peptide binding sites recognition capability, taking into account the common structure and biological bias between proteins and peptides. Consequently, GAPS demonstrates the state-of-the-art performance and excellent robustness in this task. Moreover, our model exhibits exceptional performance across several extended experiments including predicting the apo protein-peptide, protein-cyclic peptide and the AlphaFold-predicted protein-peptide binding sites. These results confirm that the GAPS model is a powerful, versatile, stable method suitable for diverse binding site predictions.


Asunto(s)
Péptidos , Sitios de Unión , Péptidos/química , Péptidos/metabolismo , Unión Proteica , Biología Computacional/métodos , Algoritmos , Proteínas/química , Proteínas/metabolismo , Aprendizaje Automático
4.
Brief Bioinform ; 25(4)2024 May 23.
Artículo en Inglés | MEDLINE | ID: mdl-38851298

RESUMEN

Deletion is a crucial type of genomic structural variation and is associated with numerous genetic diseases. The advent of third-generation sequencing technology has facilitated the analysis of complex genomic structures and the elucidation of the mechanisms underlying phenotypic changes and disease onset due to genomic variants. Importantly, it has introduced innovative perspectives for deletion variants calling. Here we propose a method named Dual Attention Structural Variation (DASV) to analyze deletion structural variations in sequencing data. DASV converts gene alignment information into images and integrates them with genomic sequencing data through a dual attention mechanism. Subsequently, it employs a multi-scale network to precisely identify deletion regions. Compared with four widely used genome structural variation calling tools: cuteSV, SVIM, Sniffles and PBSV, the results demonstrate that DASV consistently achieves a balance between precision and recall, enhancing the F1 score across various datasets. The source code is available at https://github.com/deconvolution-w/DASV.


Asunto(s)
Secuenciación de Nucleótidos de Alto Rendimiento , Programas Informáticos , Humanos , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Eliminación de Secuencia , Análisis de Secuencia de ADN/métodos , Algoritmos , Genómica/métodos , Biología Computacional/métodos
5.
Brief Bioinform ; 25(6)2024 Sep 23.
Artículo en Inglés | MEDLINE | ID: mdl-39401144

RESUMEN

Spatial transcriptomics reveals the spatial distribution of genes in complex tissues, providing crucial insights into biological processes, disease mechanisms, and drug development. The prediction of gene expression based on cost-effective histology images is a promising yet challenging field of research. Existing methods for gene prediction from histology images exhibit two major limitations. First, they ignore the intricate relationship between cell morphological information and gene expression. Second, these methods do not fully utilize the different latent stages of features extracted from the images. To address these limitations, we propose a novel hypergraph neural network model, HGGEP, to predict gene expressions from histology images. HGGEP includes a gradient enhancement module to enhance the model's perception of cell morphological information. A lightweight backbone network extracts multiple latent stage features from the image, followed by attention mechanisms to refine the representation of features at each latent stage and capture their relations with nearby features. To explore higher-order associations among multiple latent stage features, we stack them and feed into the hypergraph to establish associations among features at different scales. Experimental results on multiple datasets from disease samples including cancers and tumor disease, demonstrate the superior performance of our HGGEP model than existing methods.


Asunto(s)
Redes Neurales de la Computación , Humanos , Perfilación de la Expresión Génica/métodos , Biología Computacional/métodos , Algoritmos , Neoplasias/genética , Neoplasias/patología , Procesamiento de Imagen Asistido por Computador/métodos
6.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38678587

RESUMEN

Deep learning-based multi-omics data integration methods have the capability to reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples in integrating multi-omics data. In addition, providing accurate biological explanations still poses significant challenges due to the complexity of deep learning models. Therefore, there is an urgent need for a deep learning-based multi-omics integration method to explore the potential correlations between samples and provide model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module is designed for local connections of neuron nodes and model interpretability based on the biological relationship between genes/miRNAs and pathways. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and generate the potential pathway feature representation for enhancing the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is utilized to discover biomarkers related to cancer recurrence and provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross validation. Furthermore, case studies also indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.


Asunto(s)
Biomarcadores de Tumor , Aprendizaje Profundo , Recurrencia Local de Neoplasia , Humanos , Biomarcadores de Tumor/metabolismo , Biomarcadores de Tumor/genética , Recurrencia Local de Neoplasia/metabolismo , Recurrencia Local de Neoplasia/genética , Biología Computacional/métodos , Neoplasias/genética , Neoplasias/metabolismo , Neoplasias/patología , Genómica/métodos , Multiómica
7.
Brief Bioinform ; 25(2)2024 Jan 22.
Artículo en Inglés | MEDLINE | ID: mdl-38385876

RESUMEN

Enhancers play an important role in the process of gene expression regulation. In DNA sequence abundance or absence of enhancers and irregularities in the strength of enhancers affects gene expression process that leads to the initiation and propagation of diverse types of genetic diseases such as hemophilia, bladder cancer, diabetes and congenital disorders. Enhancer identification and strength prediction through experimental approaches is expensive, time-consuming and error-prone. To accelerate and expedite the research related to enhancers identification and strength prediction, around 19 computational frameworks have been proposed. These frameworks used machine and deep learning methods that take raw DNA sequences and predict enhancer's presence and strength. However, these frameworks still lack in performance and are not useful in real time analysis. This paper presents a novel deep learning framework that uses language modeling strategies for transforming DNA sequences into statistical feature space. It applies transfer learning by training a language model in an unsupervised fashion by predicting a group of nucleotides also known as k-mers based on the context of existing k-mers in a sequence. At the classification stage, it presents a novel classifier that reaps the benefits of two different architectures: convolutional neural network and attention mechanism. The proposed framework is evaluated over the enhancer identification benchmark dataset where it outperforms the existing best-performing framework by 5%, and 9% in terms of accuracy and MCC. Similarly, when evaluated over the enhancer strength prediction benchmark dataset, it outperforms the existing best-performing framework by 4%, and 7% in terms of accuracy and MCC.


Asunto(s)
Benchmarking , Medicina , Redes Neurales de la Computación , Nucleótidos , Secuencias Reguladoras de Ácidos Nucleicos
8.
Brief Bioinform ; 25(2)2024 Jan 22.
Artículo en Inglés | MEDLINE | ID: mdl-38279651

RESUMEN

Rare antinuclear antibody (ANA) pattern recognition has been a widely applied technology for routine ANA screening in clinical laboratories. In recent years, the application of deep learning methods in recognizing ANA patterns has witnessed remarkable advancements. However, the majority of studies in this field have primarily focused on the classification of the most common ANA patterns, while another subset has concentrated on the detection of mitotic metaphase cells. To date, no prior research has been specifically dedicated to the identification of rare ANA patterns. In the present paper, we introduce a novel attention-based enhancement framework, which was designed for the recognition of rare ANA patterns in ANA-indirect immunofluorescence images. More specifically, we selected the algorithm with the best performance as our target detection network by conducting comparative experiments. We then further developed and enhanced the chosen algorithm through a series of optimizations. Then, attention mechanism was introduced to facilitate neural networks in expediting the learning process, extracting more essential and distinctive features for the target features that belong to the specific patterns. The proposed approach has helped to obtained high precision rate of 86.40%, 82.75% recall, 84.24% F1 score and 84.64% mean average precision for a 9-category rare ANA pattern detection task on our dataset. Finally, we evaluated the potential of the model as medical technologist assistant and observed that the technologist's performance improved after referring to the results of the model prediction. These promising results highlighted its potential as an efficient and reliable tool to assist medical technologists in their clinical practice.


Asunto(s)
Algoritmos , Anticuerpos Antinucleares , Técnica del Anticuerpo Fluorescente Indirecta/métodos
9.
Brief Bioinform ; 25(5)2024 Jul 25.
Artículo en Inglés | MEDLINE | ID: mdl-39256196

RESUMEN

Using amino acid residues in peptide generation has solved several key problems, including precise control of amino acid sequence order, customized peptides for property modification, and large-scale peptide synthesis. Proteins contain unknown amino acid residues. Extracting them for the synthesis of drug-like peptides can create novel structures with unique properties, driving drug development. Computer-aided design of novel peptide drug molecules can solve the high-cost and low-efficiency problems in the traditional drug discovery process. Previous studies faced limitations in enhancing the bioactivity and drug-likeness of polypeptide drugs due to less emphasis on the connection relationships in amino acid structures. Thus, we proposed a reinforcement learning-driven generation model based on graph attention mechanisms for peptide generation. By harnessing the advantages of graph attention mechanisms, this model effectively captured the connectivity structures between amino acid residues in peptides. Simultaneously, leveraging reinforcement learning's strength in guiding optimal sequence searches provided a novel approach to peptide design and optimization. This model introduces an actor-critic framework with real-time feedback loops to achieve dynamic balance between attributes, which can customize the generation of multiple peptides for specific targets and enhance the affinity between peptides and targets. Experimental results demonstrate that the generated drug-like peptides meet specified absorption, distribution, metabolism, excretion, and toxicity properties and bioactivity with a success rate of over 90$\%$, thereby significantly accelerating the process of drug-like peptide generation.


Asunto(s)
Péptidos , Péptidos/química , Secuencia de Aminoácidos , Descubrimiento de Drogas , Diseño de Fármacos , Algoritmos , Diseño Asistido por Computadora , Humanos
10.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38581419

RESUMEN

Piwi-interacting RNAs (piRNAs) play a crucial role in various biological processes and are implicated in disease. Consequently, there is an escalating demand for computational tools to predict piRNA-disease interactions. Although there have been computational methods proposed for the detection of piRNA-disease associations, the problem of imbalanced and sparse dataset has brought great challenges to capture the complex relationships between piRNAs and diseases. In response to this necessity, we have developed a novel computational architecture, denoted as PUTransGCN, which uses heterogeneous graph convolutional networks to uncover potential piRNA-disease associations. Additionally, the attention mechanism was used to adjust the weight parameters of aggregation heterogeneous node features automatically. For tackling the imbalanced dataset problem, the combined positive unlabelled learning (PUL) method comprising PU bagging, two-step and spy technique was applied to select reliable negative associations. The features of piRNAs and diseases were derived from three distinct biological sources by PUTransGCN, including information on piRNA sequences, semantic terms related to diseases and the existing network of piRNA-disease associations. In the experiment, PUTransGCN performs in 5-fold cross-validation with an AUC of 0.93 and 0.95 on two datasets, respectively, which outperforms the other six state-of-the-art models. We compared three different PUL methods, and the results of the ablation experiment indicate that the combined PUL method yields the best results. The PUTransGCN could serve as a valuable piRNA-disease prediction tool for upcoming studies in the biomedical field. The code for PUTransGCN is available at https://github.com/chenqiuhao/PUTransGCN.


Asunto(s)
ARN de Interacción con Piwi
11.
Brief Bioinform ; 25(2)2024 Jan 22.
Artículo en Inglés | MEDLINE | ID: mdl-38426327

RESUMEN

Cluster assignment is vital to analyzing single-cell RNA sequencing (scRNA-seq) data to understand high-level biological processes. Deep learning-based clustering methods have recently been widely used in scRNA-seq data analysis. However, existing deep models often overlook the interconnections and interactions among network layers, leading to the loss of structural information within the network layers. Herein, we develop a new self-supervised clustering method based on an adaptive multi-scale autoencoder, called scAMAC. The self-supervised clustering network utilizes the Multi-Scale Attention mechanism to fuse the feature information from the encoder, hidden and decoder layers of the multi-scale autoencoder, which enables the exploration of cellular correlations within the same scale and captures deep features across different scales. The self-supervised clustering network calculates the membership matrix using the fused latent features and optimizes the clustering network based on the membership matrix. scAMAC employs an adaptive feedback mechanism to supervise the parameter updates of the multi-scale autoencoder, obtaining a more effective representation of cell features. scAMAC not only enables cell clustering but also performs data reconstruction through the decoding layer. Through extensive experiments, we demonstrate that scAMAC is superior to several advanced clustering and imputation methods in both data clustering and reconstruction. In addition, scAMAC is beneficial for downstream analysis, such as cell trajectory inference. Our scAMAC model codes are freely available at https://github.com/yancy2024/scAMAC.


Asunto(s)
Análisis de Datos , Análisis de Expresión Génica de una Sola Célula , Análisis por Conglomerados , Análisis de Secuencia de ARN , Perfilación de la Expresión Génica , Algoritmos
12.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38605642

RESUMEN

MicroRNAs (miRNAs) synergize with various biomolecules in human cells resulting in diverse functions in regulating a wide range of biological processes. Predicting potential disease-associated miRNAs as valuable biomarkers contributes to the treatment of human diseases. However, few previous methods take a holistic perspective and only concentrate on isolated miRNA and disease objects, thereby ignoring that human cells are responsible for multiple relationships. In this work, we first constructed a multi-view graph based on the relationships between miRNAs and various biomolecules, and then utilized graph attention neural network to learn the graph topology features of miRNAs and diseases for each view. Next, we added an attention mechanism again, and developed a multi-scale feature fusion module, aiming to determine the optimal fusion results for the multi-view topology features of miRNAs and diseases. In addition, the prior attribute knowledge of miRNAs and diseases was simultaneously added to achieve better prediction results and solve the cold start problem. Finally, the learned miRNA and disease representations were then concatenated and fed into a multi-layer perceptron for end-to-end training and predicting potential miRNA-disease associations. To assess the efficacy of our model (called MUSCLE), we performed 5- and 10-fold cross-validation (CV), which got average the Area under ROC curves of 0.966${\pm }$0.0102 and 0.973${\pm }$0.0135, respectively, outperforming most current state-of-the-art models. We then examined the impact of crucial parameters on prediction performance and performed ablation experiments on the feature combination and model architecture. Furthermore, the case studies about colon cancer, lung cancer and breast cancer also fully demonstrate the good inductive capability of MUSCLE. Our data and code are free available at a public GitHub repository: https://github.com/zht-code/MUSCLE.git.


Asunto(s)
Neoplasias del Colon , Neoplasias Pulmonares , MicroARNs , Humanos , Músculos , Aprendizaje , MicroARNs/genética , Algoritmos , Biología Computacional
13.
Brief Bioinform ; 25(4)2024 May 23.
Artículo en Inglés | MEDLINE | ID: mdl-38975896

RESUMEN

Mechanisms of protein-DNA interactions are involved in a wide range of biological activities and processes. Accurately identifying binding sites between proteins and DNA is crucial for analyzing genetic material, exploring protein functions, and designing novel drugs. In recent years, several computational methods have been proposed as alternatives to time-consuming and expensive traditional experiments. However, accurately predicting protein-DNA binding sites still remains a challenge. Existing computational methods often rely on handcrafted features and a single-model architecture, leaving room for improvement. We propose a novel computational method, called EGPDI, based on multi-view graph embedding fusion. This approach involves the integration of Equivariant Graph Neural Networks (EGNN) and Graph Convolutional Networks II (GCNII), independently configured to profoundly mine the global and local node embedding representations. An advanced gated multi-head attention mechanism is subsequently employed to capture the attention weights of the dual embedding representations, thereby facilitating the integration of node features. Besides, extra node features from protein language models are introduced to provide more structural information. To our knowledge, this is the first time that multi-view graph embedding fusion has been applied to the task of protein-DNA binding site prediction. The results of five-fold cross-validation and independent testing demonstrate that EGPDI outperforms state-of-the-art methods. Further comparative experiments and case studies also verify the superiority and generalization ability of EGPDI.


Asunto(s)
Biología Computacional , Proteínas de Unión al ADN , ADN , Redes Neurales de la Computación , Sitios de Unión , ADN/metabolismo , ADN/química , Proteínas de Unión al ADN/metabolismo , Proteínas de Unión al ADN/química , Biología Computacional/métodos , Algoritmos , Unión Proteica
14.
Brief Bioinform ; 24(2)2023 03 19.
Artículo en Inglés | MEDLINE | ID: mdl-36917472

RESUMEN

Identifying the function of DNA sequences accurately is an essential and challenging task in the genomic field. Until now, deep learning has been widely used in the functional analysis of DNA sequences, including DeepSEA, DanQ, DeepATT and TBiNet. However, these methods have the problems of high computational complexity and not fully considering the distant interactions among chromatin features, thus affecting the prediction accuracy. In this work, we propose a hybrid deep neural network model, called DeepFormer, based on convolutional neural network (CNN) and flow-attention mechanism for DNA sequence function prediction. In DeepFormer, the CNN is used to capture the local features of DNA sequences as well as important motifs. Based on the conservation law of flow network, the flow-attention mechanism can capture more distal interactions among sequence features with linear time complexity. We compare DeepFormer with the above four kinds of classical methods using the commonly used dataset of 919 chromatin features of nearly 4.9 million noncoding DNA sequences. Experimental results show that DeepFormer significantly outperforms four kinds of methods, with an average recall rate at least 7.058% higher than other methods. Furthermore, we confirmed the effectiveness of DeepFormer in capturing functional variation using Alzheimer's disease, pathogenic mutations in alpha-thalassemia and modification in CCCTC-binding factor (CTCF) activity. We further predicted the maize chromatin accessibility of five tissues and validated the generalization of DeepFormer. The average recall rate of DeepFormer exceeds the classical methods by at least 1.54%, demonstrating strong robustness.


Asunto(s)
Genómica , Redes Neurales de la Computación , Secuencia de Bases , Genómica/métodos , Cromatina/genética , Genoma
15.
Brief Bioinform ; 24(4)2023 07 20.
Artículo en Inglés | MEDLINE | ID: mdl-37385595

RESUMEN

Allergies have become an emerging public health problem worldwide. The most effective way to prevent allergies is to find the causative allergen at the source and avoid re-exposure. However, most of the current computational methods used to identify allergens were based on homology or conventional machine learning methods, which were inefficient and still had room to be improved for the detection of allergens with low homology. In addition, few methods based on deep learning were reported, although deep learning has been successfully applied to several tasks in protein sequence analysis. In the present work, a deep neural network-based model, called DeepAlgPro, was proposed to identify allergens. We showed its great accuracy and applicability to large-scale forecasts by comparing it to other available tools. Additionally, we used ablation experiments to demonstrate the critical importance of the convolutional module in our model. Moreover, further analyses showed that epitope features contributed to model decision-making, thus improving the model's interpretability. Finally, we found that DeepAlgPro was capable of detecting potential new allergens. Overall, DeepAlgPro can serve as powerful software for identifying allergens.


Asunto(s)
Aprendizaje Profundo , Hipersensibilidad , Humanos , Alérgenos , Redes Neurales de la Computación , Proteínas/metabolismo
16.
Brief Bioinform ; 24(6)2023 09 22.
Artículo en Inglés | MEDLINE | ID: mdl-37930025

RESUMEN

Drug combination therapy has gradually become a promising treatment strategy for complex or co-existing diseases. As drug-drug interactions (DDIs) may cause unexpected adverse drug reactions, DDI prediction is an important task in pharmacology and clinical applications. Recently, researchers have proposed several deep learning methods to predict DDIs. However, these methods mainly exploit the chemical or biological features of drugs, which is insufficient and limits the performances of DDI prediction. Here, we propose a new deep multimodal feature fusion framework for DDI prediction, DMFDDI, which fuses drug molecular graph, DDI network and the biochemical similarity features of drugs to predict DDIs. To fully extract drug molecular structure, we introduce an attention-gated graph neural network for capturing the global features of the molecular graph and the local features of each atom. A sparse graph convolution network is introduced to learn the topological structure information of the DDI network. In the multimodal feature fusion module, an attention mechanism is used to efficiently fuse different features. To validate the performance of DMFDDI, we compare it with 10 state-of-the-art methods. The comparison results demonstrate that DMFDDI achieves better performance in DDI prediction. Our method DMFDDI is implemented in Python using the Pytorch machine-learning library, and it is freely available at https://github.com/DHUDEBLab/DMFDDI.git.


Asunto(s)
Aprendizaje Automático , Redes Neurales de la Computación , Interacciones Farmacológicas , Estructura Molecular , Biblioteca de Genes
17.
Brief Bioinform ; 25(1)2023 11 22.
Artículo en Inglés | MEDLINE | ID: mdl-38019732

RESUMEN

Drug repositioning, the strategy of redirecting existing drugs to new therapeutic purposes, is pivotal in accelerating drug discovery. While many studies have engaged in modeling complex drug-disease associations, they often overlook the relevance between different node embeddings. Consequently, we propose a novel weighted local information augmented graph neural network model, termed DRAGNN, for drug repositioning. Specifically, DRAGNN firstly incorporates a graph attention mechanism to dynamically allocate attention coefficients to drug and disease heterogeneous nodes, enhancing the effectiveness of target node information collection. To prevent excessive embedding of information in a limited vector space, we omit self-node information aggregation, thereby emphasizing valuable heterogeneous and homogeneous information. Additionally, average pooling in neighbor information aggregation is introduced to enhance local information while maintaining simplicity. A multi-layer perceptron is then employed to generate the final association predictions. The model's effectiveness for drug repositioning is supported by a 10-times 10-fold cross-validation on three benchmark datasets. Further validation is provided through analysis of the predicted associations using multiple authoritative data sources, molecular docking experiments and drug-disease network analysis, laying a solid foundation for future drug discovery.


Asunto(s)
Benchmarking , Reposicionamiento de Medicamentos , Simulación del Acoplamiento Molecular , Descubrimiento de Drogas , Redes Neurales de la Computación
18.
Brief Bioinform ; 24(4)2023 07 20.
Artículo en Inglés | MEDLINE | ID: mdl-37258453

RESUMEN

Protein is the most important component in organisms and plays an indispensable role in life activities. In recent years, a large number of intelligent methods have been proposed to predict protein function. These methods obtain different types of protein information, including sequence, structure and interaction network. Among them, protein sequences have gained significant attention where methods are investigated to extract the information from different views of features. However, how to fully exploit the views for effective protein sequence analysis remains a challenge. In this regard, we propose a multi-view, multi-scale and multi-attention deep neural model (MMSMA) for protein function prediction. First, MMSMA extracts multi-view features from protein sequences, including one-hot encoding features, evolutionary information features, deep semantic features and overlapping property features based on physiochemistry. Second, a specific multi-scale multi-attention deep network model (MSMA) is built for each view to realize the deep feature learning and preliminary classification. In MSMA, both multi-scale local patterns and long-range dependence from protein sequences can be captured. Third, a multi-view adaptive decision mechanism is developed to make a comprehensive decision based on the classification results of all the views. To further improve the prediction performance, an extended version of MMSMA, MMSMAPlus, is proposed to integrate homology-based protein prediction under the framework of multi-view deep neural model. Experimental results show that the MMSMAPlus has promising performance and is significantly superior to the state-of-the-art methods. The source code can be found at https://github.com/wzy-2020/MMSMAPlus.


Asunto(s)
Redes Neurales de la Computación , Proteínas , Secuencia de Aminoácidos , Programas Informáticos , Análisis de Secuencia de Proteína
19.
Brief Bioinform ; 24(2)2023 03 19.
Artículo en Inglés | MEDLINE | ID: mdl-36694944

RESUMEN

Protein arginine methylation is an important posttranslational modification (PTM) associated with protein functional diversity and pathological conditions including cancer. Identification of methylation binding sites facilitates a better understanding of the molecular function of proteins. Recent developments in the field of deep neural networks have led to a proliferation of deep learning-based methylation identification studies because of their fast and accurate prediction. In this paper, we propose DeepGpgs, an advanced deep learning model incorporating Gaussian prior and gated attention mechanism. We introduce a residual network channel to extract the evolutionary information of proteins. Then we combine the adaptive embedding with bidirectional long short-term memory networks to form a context-shared encoder layer. A gated multi-head attention mechanism is followed to obtain the global information about the sequence. A Gaussian prior is injected into the sequence to assist in predicting PTMs. We also propose a weighted joint loss function to alleviate the false negative problem. We empirically show that DeepGpgs improves Matthews correlation coefficient by 6.3% on the arginine methylation independent test set compared with the existing state-of-the-art methylation site prediction methods. Furthermore, DeepGpgs has good robustness in phosphorylation site prediction of SARS-CoV-2, which indicates that DeepGpgs has good transferability and the potential to be extended to other modification sites prediction. The open-source code and data of the DeepGpgs can be obtained from https://github.com/saizhou1/DeepGpgs.


Asunto(s)
COVID-19 , Aprendizaje Profundo , Humanos , Metilación , Arginina/metabolismo , SARS-CoV-2/metabolismo , Proteínas/metabolismo
20.
Brief Bioinform ; 24(2)2023 03 19.
Artículo en Inglés | MEDLINE | ID: mdl-36736352

RESUMEN

Great improvement has been brought to protein tertiary structure prediction through deep learning. It is important but very challenging to accurately rank and score decoy structures predicted by different models. CASP14 results show that existing quality assessment (QA) approaches lag behind the development of protein structure prediction methods, where almost all existing QA models degrade in accuracy when the target is a decoy of high quality. How to give an accurate assessment to high-accuracy decoys is particularly useful with the available of accurate structure prediction methods. Here we propose a fast and effective single-model QA method, QATEN, which can evaluate decoys only by their topological characteristics and atomic types. Our model uses graph neural networks and attention mechanisms to evaluate global and amino acid level scores, and uses specific loss functions to constrain the network to focus more on high-precision decoys and protein domains. On the CASP14 evaluation decoys, QATEN performs better than other QA models under all correlation coefficients when targeting average LDDT. QATEN shows promising performance when considering only high-accuracy decoys. Compared to the embedded evaluation modules of predicted ${C}_{\alpha^{-}} RMSD$ (pRMSD) in RosettaFold and predicted LDDT (pLDDT) in AlphaFold2, QATEN is complementary and capable of achieving better evaluation on some decoy structures generated by AlphaFold2 and RosettaFold. These results suggest that the new QATEN approach can be used as a reliable independent assessment algorithm for high-accuracy protein structure decoys.


Asunto(s)
Redes Neurales de la Computación , Proteínas , Proteínas/química , Algoritmos , Aminoácidos , Dominios Proteicos , Conformación Proteica , Biología Computacional/métodos
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA