Improving protein-protein interaction prediction using protein language model and protein network features.

Hu, Jun; Li, Zhe; Rao, Bing; Thafar, Maha A; Arif, Muhammad

Affiliation

Hu J; College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China. Electronic address: junh_cs@126.com.
Li Z; College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China.
Rao B; Engineering Research Center of Integration and Application of Digital Learning Technology, Ministry of Education, Beijing, 100039, China. Electronic address: raob@hzcu.edu.cn.
Thafar MA; Computer Science Department, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia.
Arif M; College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar. Electronic address: mfarif@hbku.edu.qa.

Anal Biochem; 693: 115550, 2024 Oct.

Article in En | MEDLINE | ID: mdl-38679191

ABSTRACT

Interactions between proteins are ubiquitous in a wide variety of biological processes. Accurately identifying protein-protein interactions (PPIs) is important for understanding the mechanisms of protein function and for facilitating drug discovery. Although wet-lab methods are the best way to identify PPIs, they are time-consuming, costly, and labor-intensive. Hence, considerable effort has gone into developing computational methods that improve the performance of PPI prediction. In this study, we propose a novel hybrid computational method, called KSGPPI, that aims to improve PPI prediction by extracting discriminative information from protein sequences and interaction networks. The KSGPPI model comprises two feature extraction modules. In the first module, a large protein language model, ESM-2, is employed to exploit the global complex patterns concealed within protein sequences. Feature representations are then further extracted through CKSAAP (composition of k-spaced amino acid pairs), and a two-dimensional convolutional neural network (CNN) is used to capture local information. In the second module, a protein similar to the query protein is retrieved from the STRING database using the sequence alignment tool NW-align, and the Node2vec algorithm is then used to obtain a graph embedding feature for the query protein from the interaction network of that similar protein. Finally, the features from the two modules are fused and fed into a multilayer perceptron to predict PPIs. Five-fold cross-validation on the benchmark datasets shows that KSGPPI achieves an average prediction accuracy of 88.96%. Additionally, the average Matthews correlation coefficient of KSGPPI (0.781) is significantly higher than those of state-of-the-art PPI prediction methods.
The standalone package of KSGPPI can be freely downloaded at https://github.com/rickleezhe/KSGPPI.
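
To make the CKSAAP step mentioned in the abstract more concrete, the following is a minimal Python sketch of the general CKSAAP (composition of k-spaced amino acid pairs) encoding. It is not taken from the KSGPPI package: the function name `cksaap`, the `max_gap` default, and the choice to skip non-standard residues are all assumptions made for illustration, and the abstract does not state how KSGPPI configures or combines this encoding with ESM-2 and the CNN.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(sequence, max_gap=3):
    """Generic CKSAAP encoding (illustrative, not the KSGPPI implementation).

    For each gap k in 0..max_gap, compute the frequency of every ordered
    residue pair (a, b) where b occurs k + 1 positions after a.
    Returns a flat list of 400 * (max_gap + 1) values.
    max_gap=3 is an assumed default, not a value given in the abstract.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    index = {pair: i for i, pair in enumerate(pairs)}
    features = []
    for gap in range(max_gap + 1):
        counts = [0] * len(pairs)
        total = 0
        for i in range(len(sequence) - gap - 1):
            pair = sequence[i] + sequence[i + gap + 1]
            # Assumption: pairs containing non-standard residues are skipped.
            if pair in index:
                counts[index[pair]] += 1
                total += 1
        # Normalize by the number of valid pairs seen for this gap.
        features.extend(c / total if total else 0.0 for c in counts)
    return features


if __name__ == "__main__":
    vector = cksaap("MKTAYIAKQRQISFVKSHFSRQ", max_gap=3)
    print(len(vector))  # 400 pair frequencies for each of the 4 gap values
```

Each gap value contributes a 400-dimensional block, so the per-gap blocks can be stacked into a two-dimensional array, which is one plausible way such features could be passed to a 2D CNN.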

Key words

PPI network; Protein language model; Protein-protein interactions prediction

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Proteins Language: En Journal: Anal Biochem Year: 2024 Document type: Article
