NeuroPpred-SHE: An interpretable neuropeptides prediction model based on selected features from hand-crafted features and embeddings of T5 model.
Wen, Jian; Ding, Zhijie; Wei, Zhuoyu; Xia, Hongwei; Zhang, Yong; Zhu, Xiaolei.
Affiliation
  • Wen J; School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China.
  • Ding Z; School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China.
  • Wei Z; School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China.
  • Xia H; School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China.
  • Zhang Y; School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China. Electronic address: yongzhang@ahau.edu.cn.
  • Zhu X; School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China. Electronic address: xlzhu@ahau.edu.cn.
Comput Biol Med; 181: 109048, 2024 Aug 24.
Article in En | MEDLINE | ID: mdl-39182368
ABSTRACT
Neuropeptides are the most ubiquitous neurotransmitters in the immune system, regulating various biological processes. They play a significant role in the discovery of new drugs and targets for nervous system disorders. Traditional experimental methods for identifying neuropeptides are time-consuming and costly. Although several computational methods have been developed to predict neuropeptides, their accuracy is still unsatisfactory because of the limited representational power of the extracted features. In this work, we propose an efficient and interpretable model, NeuroPpred-SHE, for predicting neuropeptides by selecting the optimal feature subset from both hand-crafted features and embeddings of a protein language model. Specifically, we first employed a pre-trained T5 protein language model to extract embedding features and twelve other encoding methods to extract hand-crafted features from peptide sequences. Secondly, we fused the embedding features and hand-crafted features to enhance feature representability. Thirdly, we utilized random forest (RF), Max-Relevance and Min-Redundancy (mRMR), and eXtreme Gradient Boosting (XGBoost) methods to select the optimal feature subset from the fused features. Finally, we employed five machine learning methods (GBDT, XGBoost, SVM, MLP, and LightGBM) to build the models. Our results show that the model based on GBDT achieves the best performance. Furthermore, we compared our final model with other state-of-the-art methods on an independent test set; the results indicate that our model achieves an AUROC of 97.8%, which is higher than that of all the other state-of-the-art predictors. Our model is available at https://github.com/wenjean/NeuroPpred-SHE.
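The feature-fusion and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that amino acid composition (AAC) stands in for one hand-crafted encoding (the abstract does not name the twelve encodings), that fusion is plain concatenation, and that the embedding is a short placeholder vector rather than real T5 output. The function names `aac`, `fuse`, and `auroc` are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition: fraction of each standard residue (20 values).

    Assumption: AAC is used here only as an example of a hand-crafted encoding.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence.upper())
    return [counts[residue] / len(sequence) for residue in AMINO_ACIDS]


def fuse(handcrafted, embedding):
    """Fuse two feature vectors by concatenation (assumed fusion strategy)."""
    return list(handcrafted) + list(embedding)


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as 0.5 (pairwise Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    peptide = "ACDA"                      # hypothetical toy peptide
    placeholder_embedding = [0.1, -0.2, 0.3]  # stands in for a T5 embedding
    fused = fuse(aac(peptide), placeholder_embedding)
    print(len(fused))                     # 23 = 20 AAC values + 3 placeholders

    # Toy labels and classifier scores, not results from the paper.
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

In a full pipeline, the fused vectors would pass through a feature-selection step and then into a classifier such as GBDT; the pairwise AUROC above is quadratic in the number of samples and is meant for illustration only.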
Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Comput Biol Med Year: 2024 Document type: Article Affiliation country: China