Phosformer: an explainable transformer model for protein kinase-specific phosphorylation predictions.

Zhou, Zhongliang; Yeung, Wayland; Gravel, Nathan; Salcedo, Mariah; Soleymani, Saber; Li, Sheng; Kannan, Natarajan

Affiliation

Zhou Z; School of Computing, University of Georgia, GA 30602, USA.
Yeung W; Institute of Bioinformatics, University of Georgia, GA 30602, USA.
Gravel N; Institute of Bioinformatics, University of Georgia, GA 30602, USA.
Salcedo M; Department of Biochemistry and Molecular Biology, University of Georgia, GA 30602, USA.
Soleymani S; School of Computing, University of Georgia, GA 30602, USA.
Li S; School of Data Science, University of Virginia, VA 22903, USA.
Kannan N; Institute of Bioinformatics, University of Georgia, GA 30602, USA.

Bioinformatics; 39(2), 2023 02 03.

Article in English | MEDLINE | ID: mdl-36692152

ABSTRACT

MOTIVATION:

The human genome encodes over 500 distinct protein kinases which regulate nearly all cellular processes by the specific phosphorylation of protein substrates. While advances in mass spectrometry and proteomics studies have identified thousands of phosphorylation sites across species, information on the specific kinases that phosphorylate these sites is currently lacking for the vast majority of phosphosites. Recently, there has been a major focus on the development of computational models for predicting kinase-substrate associations. However, most current models only allow predictions on a subset of well-studied kinases. Furthermore, the utilization of hand-curated features and imbalances in training and testing datasets pose unique challenges in the development of accurate predictive models for kinase-specific phosphorylation prediction. Motivated by the recent development of universal protein language models which automatically generate context-aware features from primary sequence information, we sought to develop a unified framework for kinase-specific phosphosite prediction, allowing for greater investigative utility and enabling substrate predictions at the whole kinome level.

RESULTS:

We present a deep learning model for kinase-specific phosphosite prediction, termed Phosformer, which predicts the probability of phosphorylation given an arbitrary pair of unaligned kinase and substrate peptide sequences. We demonstrate that Phosformer implicitly learns evolutionary and functional features during training, removing the need for feature curation and engineering. Further analyses reveal that Phosformer also learns substrate specificity motifs and is able to distinguish between functionally distinct kinase families. Benchmarks indicate that Phosformer exhibits significant improvements compared to the state-of-the-art models, while also presenting a more generalized, unified, and interpretable predictive framework.

AVAILABILITY AND IMPLEMENTATION:

Code and data are available at https://github.com/esbgkannan/phosformer.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

Subject(s)

Protein Kinases; Protein Processing, Post-Translational; Humans; Phosphorylation; Protein Kinases/metabolism; Proteins/metabolism

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Protein Kinases / Protein Processing, Post-Translational | Type of study: Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: En | Journal: Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2023 | Document type: Article | Affiliation country: United States
