Search | BVS - Ministry of Health

RPEMHC: improved prediction of MHC-peptide binding affinity by a deep learning approach based on residue-residue pair encoding.

Wang, Xuejiao; Wu, Tingfang; Jiang, Yelu; Chen, Taoning; Pan, Deng; Jin, Zhi; Xie, Jingxin; Quan, Lijun; Lyu, Qiang.

Bioinformatics ; 40(1), 2024 01 02.

Article in English | MEDLINE | ID: mdl-38175759

ABSTRACT

MOTIVATION: Binding of peptides to major histocompatibility complex (MHC) molecules plays a crucial role in triggering the T cell recognition mechanisms essential for immune response. Accurate prediction of MHC-peptide binding is vital for the development of cancer therapeutic vaccines. While recent deep learning-based methods have achieved strong performance in predicting MHC-peptide binding affinity, most of them encode MHC molecules and peptides separately as inputs, potentially overlooking critical interaction information between the two. RESULTS: In this work, we propose RPEMHC, a new deep learning approach based on residue-residue pair encoding to predict the binding affinity between peptides and MHC, which encodes an MHC molecule and a peptide as a residue-residue pair map. We evaluate the performance of RPEMHC on various MHC-II-related datasets for MHC-peptide binding prediction, demonstrating that RPEMHC achieves better or comparable performance against other state-of-the-art baselines. Moreover, we conduct further experiments on MHC-I-related datasets, and the results demonstrate that our method works on both MHC classes. These extensive validations show that RPEMHC is an effective tool for studying MHC-peptide interactions and can potentially facilitate vaccine development. AVAILABILITY: The source code of the method along with trained models is freely available at https://github.com/lennylv/RPEMHC.
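As a rough illustration of what a residue-residue pair map might look like, the sketch below builds a grid with one cell per (MHC residue, peptide residue) pair and stores an integer id for that amino acid pair. This is a minimal sketch under stated assumptions, not the RPEMHC implementation: the function name `pair_map`, the 400-entry pair vocabulary, and the toy sequences are all hypothetical, and the published method may encode pairs differently.

```python
# Hypothetical illustration of a residue-residue pair map; not the RPEMHC code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def pair_map(mhc_seq, peptide):
    """Return a len(mhc_seq) x len(peptide) grid of pair ids in [0, 400).

    Cell (i, j) identifies the ordered pair (mhc_seq[i], peptide[j]).
    Assumes both sequences contain only the 20 standard residues.
    """
    n = len(AMINO_ACIDS)
    return [[AA_INDEX[m] * n + AA_INDEX[p] for p in peptide] for m in mhc_seq]


# Toy inputs chosen for illustration only.
grid = pair_map("YFAM", "SIINFEKL")
print(len(grid), len(grid[0]))
```

A grid like this could then be passed through an embedding layer and a 2D convolutional network, so that the model sees each MHC-peptide residue pairing jointly rather than as two separately encoded sequences.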

Subjects

Deep Learning , Protein Binding , Peptides/chemistry , Major Histocompatibility Complex , Histocompatibility Antigens Class I/metabolism

DGCddG: Deep Graph Convolution for Predicting Protein-Protein Binding Affinity Changes Upon Mutations.

Jiang, Yelu; Quan, Lijun; Li, Kailong; Li, Yan; Zhou, Yiting; Wu, Tingfang; Lyu, Qiang.

IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 2089-2100, 2023.

Article in English | MEDLINE | ID: mdl-37018301

ABSTRACT

Effectively and accurately predicting how amino acid mutations affect the interactions between proteins is a key issue for understanding the mechanism of protein function and for drug design. In this study, we present a deep graph convolution (DGC) network-based framework, DGCddG, to predict the changes of protein-protein binding affinity after mutation. DGCddG incorporates multi-layer graph convolution to extract a deep, contextualized representation for each residue of the protein complex structure. The channels mined by DGC at the mutation sites are then fitted to the binding affinity with a multi-layer perceptron. Experiments on multiple datasets show that our model can achieve relatively good performance for both single and multi-point mutations. For blind tests on datasets related to angiotensin-converting enzyme 2 (ACE2) binding with the SARS-CoV-2 virus, our method shows better results in predicting ACE2 changes, which may help in finding favorable antibodies. Code and data availability: https://github.com/lennylv/DGCddG.
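To make the idea of graph convolution over residues concrete, here is a minimal sketch of one generic layer in plain Python: each node averages the features of itself and its neighbours, applies a linear map, and passes the result through a ReLU. The function name `gcn_layer`, the mean (row-normalised) aggregation, and the toy graph are assumptions for illustration; the layer actually used in DGCddG may differ.

```python
# Generic one-layer graph convolution sketch; not taken from the DGCddG code.
def gcn_layer(adj, feats, weight):
    """Compute ReLU(D^-1 (A + I) H W) with plain Python lists.

    adj:    n x n adjacency matrix (1 where two residues are in contact)
    feats:  n x d node feature matrix H
    weight: d x c weight matrix W
    """
    n = len(adj)
    d = len(feats[0])
    c = len(weight[0])
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    out = []
    for i in range(n):
        deg = sum(a_hat[i])
        # Mean of the features over the node and its neighbours.
        agg = [sum(a_hat[i][j] * feats[j][k] for j in range(n)) / deg
               for k in range(d)]
        # Linear map followed by ReLU.
        out.append([max(0.0, sum(agg[k] * weight[k][col] for k in range(d)))
                    for col in range(c)])
    return out


# Toy path graph 0 - 1 - 2 with 2-dimensional features and identity weights.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
hidden = gcn_layer(adj, feats, weight)
print(hidden)
```

Stacking several such layers lets each residue's representation absorb information from progressively larger structural neighbourhoods, which is the sense in which the representation becomes "contextualized".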

Subjects

COVID-19 , Humans , Protein Binding/genetics , COVID-19/genetics , SARS-CoV-2/genetics , Mutation/genetics , Point Mutation

Simultaneous Prediction of Interaction Sites on the Protein and Peptide Sides of Complexes through Multilayer Graph Convolutional Networks.

Li, Kailong; Quan, Lijun; Jiang, Yelu; Wu, Hongjie; Wu, Jian; Li, Yan; Zhou, Yiting; Wu, Tingfang; Lyu, Qiang.

J Chem Inf Model ; 63(7): 2251-2262, 2023 04 10.

Article in English | MEDLINE | ID: mdl-36989086

ABSTRACT

Identifying the binding residues of protein-peptide complexes is essential for understanding protein function mechanisms and for drug discovery. Recently, many computational methods have been developed to predict the interaction sites of either the protein or the peptide. However, to our knowledge, no prediction method can simultaneously identify the interaction sites on both the protein and peptide sides. Here, we propose a deep graph convolutional network (GCN)-based method called GraphPPepIS to predict the interaction sites of protein-peptide complexes using protein and peptide structural information. We also propose a companion method, SeqPPepIS, to cope with the lack of structural information and the flexibility of peptides. SeqPPepIS replaces the peptide structural features in GraphPPepIS with features learned from peptide sequences. We performed a comprehensive evaluation on benchmark data sets, and the results show that our two methods outperform state-of-the-art methods in accurately identifying interaction sites on both the protein and peptide sides. We show that our methods can help improve protein-peptide docking. For docking data sets, our methods maintain robust performance in identifying binding sites, thereby enhancing the prediction of peptide binding poses. Finally, we visualized the protein and peptide graph embeddings to demonstrate the learning ability of graph convolution in predicting interaction sites, which was mainly obtained through the shared parameters of the protein graph and the peptide graph.

Subjects

Benchmarking , Peptides , Amino Acid Sequence , Binding Sites , Drug Discovery

ctP²ISP: Protein-Protein Interaction Sites Prediction Using Convolution and Transformer With Data Augmentation.

Li, Kailong; Quan, Lijun; Jiang, Yelu; Li, Yan; Zhou, Yiting; Wu, Tingfang; Lyu, Qiang.

IEEE/ACM Trans Comput Biol Bioinform ; 20(1): 297-306, 2023.

Article in English | MEDLINE | ID: mdl-35213314

ABSTRACT

Protein-protein interactions are the basis of many cellular biological processes, such as cellular organization, signal transduction, and immune response. Identifying protein-protein interaction sites is essential for understanding the mechanisms of various biological processes, disease development, and drug design. However, it remains a challenging task to make accurate predictions, as the small amount of training data and severely imbalanced classification reduce the performance of computational methods. We design a deep learning method named ctP2ISP to improve the prediction of protein-protein interaction sites. ctP2ISP employs Convolution and Transformer to extract information and enhance information perception so that semantic features can be mined to identify protein-protein interaction sites. A weighted loss function with different sample weights is designed to suppress the preference of the model toward multi-category prediction. To efficiently reuse the information in the training set, a data augmentation preprocessing step with an improved sample-oriented sampling strategy is applied. The trained ctP2ISP was evaluated against current state-of-the-art methods on six public datasets. The results show that ctP2ISP outperforms all other competing methods on the balance metrics: F1, MCC, and AUPRC. In particular, our predictions on open tests related to viruses may also be consistent with biological insights. The source code and data can be obtained from https://github.com/lennylv/ctP2ISP.
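One common way to realise a loss "with different sample weights" for imbalanced site prediction is a class-weighted binary cross-entropy, sketched below. This is an assumption about the general technique, not the loss defined in ctP2ISP: the function name `weighted_bce`, the `pos_weight` and `neg_weight` parameters, and the toy values are hypothetical.

```python
# Class-weighted binary cross-entropy sketch; not the ctP2ISP loss itself.
import math


def weighted_bce(y_true, y_prob, pos_weight, neg_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with a separate weight per class.

    y_true: iterable of 0/1 labels (1 = interaction site)
    y_prob: predicted probabilities of the positive class
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        w = pos_weight if y == 1 else neg_weight
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Toy example: the rare positive class is up-weighted by a factor of 3.
loss = weighted_bce([1, 0], [0.5, 0.5], pos_weight=3.0)
print(loss)
```

With `pos_weight` greater than `neg_weight`, a misclassified interaction site costs more than a misclassified non-site, which counteracts the tendency of a model trained on imbalanced data to favour the dominant class.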

Subjects

Computer Neural Networks , Software , Benchmarking

DeepNup: Prediction of Nucleosome Positioning from DNA Sequences Using Deep Neural Network.

Zhou, Yiting; Wu, Tingfang; Jiang, Yelu; Li, Yan; Li, Kailong; Quan, Lijun; Lyu, Qiang.

Genes (Basel) ; 13(11), 2022 10 30.

Article in English | MEDLINE | ID: mdl-36360220

ABSTRACT

Nucleosome positioning plays a vital role in diverse cellular biological processes by regulating the accessibility of DNA sequences to DNA-binding proteins. Previous studies have shown that the intrinsic preference of nucleosomes for DNA sequences may play a dominant role in nucleosome positioning. As a consequence, it is meaningful to develop computational methods based only on DNA sequence information to accurately identify nucleosome positioning, and thereby to verify the contribution of DNA sequences to nucleosome positioning. In this work, we propose a new deep learning-based method, named DeepNup, which enables us to improve the prediction of nucleosome positioning from DNA sequences alone. Specifically, we first use a hybrid feature encoding scheme that combines one-hot encoding and trinucleotide composition encoding to encode raw DNA sequences; afterwards, we employ multiscale convolutional neural network modules that consist of two parallel convolution kernels with different sizes and gated recurrent units to effectively learn the local and global correlation feature representations; lastly, we use a fully connected layer and a sigmoid unit serving as a classifier to integrate these learned high-order feature representations and generate the final prediction outcomes. On the experimental evaluation metrics of two benchmark nucleosome positioning datasets, DeepNup achieves better performance for nucleosome positioning prediction than several state-of-the-art methods. These results demonstrate that DeepNup is a powerful deep learning-based tool that enables one to accurately identify potential nucleosome sequences.
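The two encodings named in the abstract can be sketched in a few lines of plain Python. This is a minimal sketch assuming sequences of at least three bases that contain only A, C, G, and T; the function names are hypothetical, and how DeepNup actually combines the two encodings is not stated in the abstract.

```python
# Sketch of one-hot and trinucleotide composition encodings; not the DeepNup code.
from itertools import product

BASES = "ACGT"
# All 64 possible trinucleotides in a fixed order: AAA, AAC, ..., TTT.
TRINUCLEOTIDES = ["".join(t) for t in product(BASES, repeat=3)]


def one_hot(seq):
    """Encode each base as a 4-dimensional indicator vector."""
    return [[1 if b == base else 0 for base in BASES] for b in seq]


def trinucleotide_composition(seq):
    """Return the frequency of each of the 64 trinucleotides in seq.

    Uses overlapping windows of length 3; assumes len(seq) >= 3.
    """
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    windows = len(seq) - 2
    for i in range(windows):
        counts[seq[i:i + 3]] += 1
    return [counts[t] / windows for t in TRINUCLEOTIDES]


# Toy sequence for illustration only.
seq = "ACGTACGT"
positional = one_hot(seq)                  # 8 x 4 matrix, keeps base order
global_tnc = trinucleotide_composition(seq)  # 64 frequencies, order-free summary
print(len(positional), len(global_tnc))
```

The one-hot matrix preserves the position of every base, while the composition vector summarises short-range sequence content, so the two views are complementary inputs for a downstream network.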

Subjects

Nucleosomes , Saccharomyces cerevisiae , Nucleosomes/genetics , Nucleosomes/metabolism , Base Sequence , Saccharomyces cerevisiae/genetics , Chromatin Assembly and Disassembly , Computer Neural Networks

Identifying modifications on DNA-bound histones with joint deep learning of multiple binding sites in DNA sequence.

Li, Yan; Quan, Lijun; Zhou, Yiting; Jiang, Yelu; Li, Kailong; Wu, Tingfang; Lyu, Qiang.

Bioinformatics ; 38(17): 4070-4077, 2022 09 02.

Article in English | MEDLINE | ID: mdl-35809058

ABSTRACT

MOTIVATION: Histone modifications are epigenetic markers that impact gene expression by altering the chromatin structure or recruiting histone modifiers. Their accurate identification is key to unraveling the mechanisms by which they regulate gene expression. However, the solutions for this task can be improved by exploiting multiple relationships in the dataset and exploring designs of learning models, for example joint learning techniques. RESULTS: This article proposes a deep learning-based multi-objective computational approach, iHMnBS, to identify which of the seven typical histone modifications a DNA sequence may bind, and which parts of the DNA sequence bind to them. iHMnBS employs a customized dataset that allows the marking of modifications contained in histones that may bind to any position in the DNA sequence. iHMnBS mines the information implicit in this richer data by means of deep neural networks. In comprehensive comparisons, iHMnBS outperforms a baseline method, and the probability of binding to modified histones assigned to a representative nucleotide of a DNA sequence can serve as a reference for biological experiments. Since the interaction between transcription factors and histone modifications has an important role in gene expression, we extracted a number of sequence patterns that may bind to transcription factors, and explored their possible impact on disease. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/lennylv/iHMnBS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Deep Learning , Histones , Histones/metabolism , Base Sequence , Binding Sites , DNA/chemistry , Transcription Factors/metabolism

SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model.

Zhang, Yikang; Chu, Xiaomin; Jiang, Yelu; Wu, Hongjie; Quan, Lijun.

Genes (Basel) ; 13(4), 2022 03 23.

Article in English | MEDLINE | ID: mdl-35456374

ABSTRACT

A large number of inorganic and organic compounds are able to bind DNA and form complexes, among which drug-related molecules are important. Chromatin accessibility changes not only directly affect drug-DNA interactions, but can also promote or inhibit the expression of critical genes associated with drug resistance by affecting the DNA binding capacity of transcription factors (TFs) and transcriptional regulators. However, the biological experimental techniques for measuring chromatin accessibility are expensive and time-consuming. In recent years, several kinds of computational methods have been proposed to identify accessible regions of the genome. Existing computational models mostly ignore the contextual information provided by the bases in gene sequences. To address these issues, we proposed a new solution called SemanticCAP. It introduces a gene language model that models the context of gene sequences and is thus able to provide an effective representation of a given site in a gene sequence. We merged the features provided by the gene language model into our chromatin accessibility model. During the process, we designed methods called SFA and SFC to make feature fusion smoother. Compared to DeepSEA, gkm-SVM, and k-mer on public benchmarks, our model showed better performance, with a 1.25% maximum improvement in auROC and a 2.41% maximum improvement in auPRC.

Subjects

Chromatin , Language , Chromatin/genetics , Chromatin Immunoprecipitation , DNA/genetics , Transcription Factors/genetics
