Protein-peptide binding residue prediction based on protein language models and cross-attention mechanism.
Hu, Jun; Chen, Kai-Xin; Rao, Bing; Ni, Jing-Yuan; Thafar, Maha A; Albaradei, Somayah; Arif, Muhammad.
Affiliation
  • Hu J; College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China; Center for AI and Computational Biology, Suzhou Institution of Systems Medicine, Suzhou, 215123, China. Electronic address: hj@ism.cams.cn.
  • Chen KX; College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China.
  • Rao B; School of Information & Electrical Engineering, Hangzhou City University, Hangzhou, 310015, China.
  • Ni JY; NUIST Reading Academy, Nanjing University of Information Science & Technology, Nanjing, 210044, China.
  • Thafar MA; Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, 21944, Saudi Arabia.
  • Albaradei S; Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia.
  • Arif M; College of Science and Engineering, Hamad Bin Khalifa University, Doha, 34110, Qatar. Electronic address: mfarif@hbku.edu.qa.
Anal Biochem ; 694: 115637, 2024 Aug 08.
Article in En | MEDLINE | ID: mdl-39121938
ABSTRACT
Accurate identification of protein-peptide binding residues is essential for understanding protein-peptide interactions and advancing drug discovery. To address this problem, extensive research efforts have been made to design more discriminative feature representations. However, extracting these explicit features usually depends on third-party tools, which lowers computational efficiency and limits predictive performance. In this study, we design an end-to-end deep learning-based method, E2EPep, for protein-peptide binding residue prediction using protein sequence only. E2EPep first employs and fine-tunes two state-of-the-art pre-trained protein language models that can extract two different high-latent feature representations from protein sequences relevant for protein structures and functions. A novel feature fusion module is then designed in E2EPep to fuse and optimize the above two feature representations of binding residues. In addition, we also design E2EPep+, which integrates the E2EPep and PepBCL models, to improve the prediction performance. Experimental results on two independent testing data sets demonstrate that E2EPep and E2EPep+ achieve average AUC values of 0.846 and 0.842, respectively, while achieving an average Matthews correlation coefficient value that is significantly higher than that of most existing sequence-based methods and comparable to that of the state-of-the-art structure-based predictors. Detailed data analysis shows that the primary strength of E2EPep lies in the effectiveness of its feature representation, which uses a cross-attention mechanism to fuse the embeddings generated by two fine-tuned protein language models. The standalone package of E2EPep and E2EPep+ can be obtained at https://github.com/ckx259/E2EPep.git for academic use only.
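The abstract does not give the architecture of the fusion module, so the following is only a minimal NumPy sketch of the general idea of fusing two per-residue embeddings with cross-attention. Every specific here is an assumption made for illustration: the sequence length, the embedding and projection sizes, the random (untrained) projection matrices, the bidirectional attention, and the final concatenation. It is not the E2EPep implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_emb, context_emb, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    Queries come from one embedding, keys and values from the other,
    so each residue in `query_emb` attends over all residues in `context_emb`.
    """
    q = query_emb @ w_q                      # (L, d)
    k = context_emb @ w_k                    # (L, d)
    v = context_emb @ w_v                    # (L, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (L, L)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


def fuse(emb_a, emb_b, params):
    """Hypothetical fusion: attend in both directions, then concatenate."""
    a_to_b, _ = cross_attention(emb_a, emb_b, *params["a"])
    b_to_a, _ = cross_attention(emb_b, emb_a, *params["b"])
    return np.concatenate([a_to_b, b_to_a], axis=-1)


# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
L, d_a, d_b, d = 12, 32, 48, 16

# Stand-ins for per-residue embeddings from two protein language models.
emb_a = rng.normal(size=(L, d_a))
emb_b = rng.normal(size=(L, d_b))

# Random projections; in a trained model these would be learned.
params = {
    "a": (rng.normal(size=(d_a, d)), rng.normal(size=(d_b, d)), rng.normal(size=(d_b, d))),
    "b": (rng.normal(size=(d_b, d)), rng.normal(size=(d_a, d)), rng.normal(size=(d_a, d))),
}

fused = fuse(emb_a, emb_b, params)
print(fused.shape)
```

In a setup like this, the fused per-residue vectors would typically be passed to a classifier that scores each residue as binding or non-binding; that step is omitted because the abstract does not describe it.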
Keywords

Full text: 1 Collections: 01-international Database: MEDLINE Language: En Journal: Anal Biochem Publication year: 2024 Document type: Article
