ToxinPred 3.0: An improved method for predicting the toxicity of peptides.

Rathore, Anand Singh; Choudhury, Shubham; Arora, Akanksha; Tijare, Purva; Raghava, Gajendra P S

Affiliation

Rathore AS; Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India. Electronic address: anandr@iiitd.ac.in.
Choudhury S; Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India. Electronic address: shubhamc@iiitd.ac.in.
Arora A; Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India. Electronic address: akankshaar@iiitd.ac.in.
Tijare P; Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India. Electronic address: purvat@iiitd.ac.in.
Raghava GPS; Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India. Electronic address: raghava@iiitd.ac.in.

Comput Biol Med ; 179: 108926, 2024 Sep.

Article in En | MEDLINE | ID: mdl-39038391

ABSTRACT

Toxicity is a prominent challenge in the design of therapeutic peptides and causes numerous peptides to fail during clinical trials. In 2013, our group developed ToxinPred, a computational method that has been extensively adopted by the scientific community for predicting peptide toxicity. In this paper, we propose a refined variant of ToxinPred with improved reliability and accuracy in predicting peptide toxicity. Initially, we used a similarity/alignment-based approach employing BLAST to predict toxic peptides, which yielded satisfactory accuracy but suffered from inadequate coverage. Subsequently, we employed a motif-based approach using MERCI software to uncover specific patterns or motifs that are observed exclusively in toxic peptides. Searching for these motifs in peptides allowed us to predict toxic peptides with high specificity but poor sensitivity. To overcome the coverage limitations, we developed alignment-free methods using machine/deep learning techniques to balance the sensitivity and specificity of prediction. A deep learning model (ANN - LSTM with fixed sequence length) developed using one-hot encoding achieved a maximum AUROC of 0.93 with an MCC of 0.71 on an independent dataset. A machine learning model (extra tree) developed using compositional features of peptides achieved a maximum AUROC of 0.95 with an MCC of 0.78. We also developed large language models and achieved a maximum AUC of 0.93 using ESM2-t33. Finally, we developed hybrid or ensemble methods combining two or more methods to enhance performance. Our specific hybrid method, which combines a motif-based approach with a machine learning-based model, achieved a maximum AUROC of 0.98 with an MCC of 0.81 on an independent dataset. In this study, all models were trained and tested on 80% of the data using five-fold cross-validation and evaluated on the remaining 20%, referred to as the independent dataset.
Evaluation of all methods on the independent dataset revealed that the method proposed in this study performed better than existing methods. To cater to the needs of the scientific community, we have developed standalone software, a pip package, and a web-based server, ToxinPred3 (https://github.com/raghavagps/toxinpred3 and https://webs.iiitd.edu.in/raghava/toxinpred3/).
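
As an illustration of two terms used in the abstract, the sketch below computes a simple compositional feature vector (amino acid composition) for a peptide and the Matthews correlation coefficient (MCC) from confusion-matrix counts. It is a minimal example under stated assumptions, not the ToxinPred3 implementation: the function names, the choice of plain amino acid composition as the compositional feature, and the example peptide are all hypothetical.

```python
import math

# The 20 standard amino acids, in a fixed order so that feature
# vectors from different peptides are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return the fraction of each standard residue in the peptide.

    Assumption: plain amino acid composition stands in for the
    "compositional features" mentioned in the abstract. Non-standard
    residues count toward the length but have no feature of their own.
    """
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must not be empty")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    features = amino_acid_composition("ACCK")  # hypothetical peptide
    print(dict(zip(AMINO_ACIDS, features)))
    print(mcc(tp=45, tn=45, fp=5, fn=5))
```

A feature vector of this kind could be passed to any standard classifier. MCC is reported alongside AUROC because it summarizes all four confusion-matrix cells at a fixed decision threshold.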

Subjects

Peptides; Software; Peptides/chemistry; Humans; Computational Biology/methods; Deep Learning; Databases, Protein

Keywords

Deep learning; Ensemble/hybrid method; Large language models; Machine learning; Toxic motifs; Virtual screening

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Peptides / Software | Limit: Humans | Language: En | Journal: Comput Biol Med | Publication year: 2024 | Document type: Article
