Post-translational modification prediction via prompt-based fine-tuning of a GPT-2 model.

Shrestha, Palistha; Kandel, Jeevan; Tayara, Hilal; Chong, Kil To

Affiliation

Shrestha P; Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea.
Kandel J; Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea.
Tayara H; School of International Engineering and Science, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea. hilaltayara@jbnu.ac.kr.
Chong KT; Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, Jeollabuk-do, Republic of Korea. kitchong@jbnu.ac.kr.

Nat Commun; 15(1): 6699, 2024 Aug 07.

Article in En | MEDLINE | ID: mdl-39107330

ABSTRACT

Post-translational modifications (PTMs) are pivotal in modulating protein functions and influencing cellular processes like signaling, localization, and degradation. The complexity of these biological interactions necessitates efficient predictive methodologies. In this work, we introduce PTMGPT2, an interpretable protein language model that utilizes prompt-based fine-tuning to improve its accuracy in precisely predicting PTMs. Drawing inspiration from recent advancements in GPT-based architectures, PTMGPT2 adopts unsupervised learning to identify PTMs. It utilizes a custom prompt to guide the model through the subtle linguistic patterns encoded in amino acid sequences, generating tokens indicative of PTM sites. To provide interpretability, we visualize attention profiles from the model's final decoder layer to elucidate sequence motifs essential for molecular recognition and analyze the effects of mutations at or near PTM sites to offer deeper insights into protein functionality. Comparative assessments reveal that PTMGPT2 outperforms existing methods across 19 PTM types, underscoring its potential in identifying disease associations and drug targets.
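As a rough illustration of what a prompt for this kind of fine-tuning could look like, the sketch below cuts a fixed-length window around a candidate residue and wraps it in a text prompt that a GPT-2-style model would be asked to complete with a label token. The abstract does not specify the prompt format, so every name here (`extract_window`, `build_prompt`, the `<startoftext>` and `<endoftext>` markers, the `SEQUENCE:` and `LABEL:` fields, the `POSITIVE` label, the flank size, and the `X` padding character) is a hypothetical choice made for illustration and should not be read as the authors' actual template.

```python
from typing import Optional


def extract_window(sequence: str, site: int, flank: int = 10, pad: str = "X") -> str:
    """Return the residue at `site` (0-based) with `flank` residues on each side.

    Positions that fall outside the protein are filled with `pad`, so the
    window always has length 2 * flank + 1. The flank size and the padding
    character are assumptions, not values taken from the paper.
    """
    if not 0 <= site < len(sequence):
        raise ValueError("site index outside the sequence")
    left = sequence[max(0, site - flank):site]
    right = sequence[site + 1:site + 1 + flank]
    left_pad = pad * (flank - len(left))
    right_pad = pad * (flank - len(right))
    return left_pad + left + sequence[site] + right + right_pad


def build_prompt(window: str, label: Optional[str] = None) -> str:
    """Format a window as a prompt (hypothetical template).

    Without a label the prompt ends at 'LABEL:' so a model can generate the
    next token; with a label the full training example is returned.
    """
    prompt = f"<startoftext>SEQUENCE: {window}\nLABEL:"
    if label is not None:
        prompt += f" {label}<endoftext>"
    return prompt


if __name__ == "__main__":
    protein = "MKTAYIAKQRS"  # toy sequence, not real data
    window = extract_window(protein, site=10, flank=3)
    print(build_prompt(window))              # inference-style prompt
    print(build_prompt(window, "POSITIVE"))  # training-style example
```

In a setup like this, the training examples would be the labeled prompts, and at inference time the model would be given the unlabeled prompt and its generated continuation read off as the prediction.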

Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Protein Processing, Post-Translational
Limits: Humans
Language: En
Journal: Nat Commun
Journal subject: BIOLOGY / SCIENCE
Year: 2024
Document type: Article
Country of publication:
