Results 1-20 of 111
1.
Biochem J ; 2024 May 16.
Article in English | MEDLINE | ID: mdl-38752473

ABSTRACT

The Ca2+-independent, but diacylglycerol-regulated, novel protein kinase C (PKC) theta (θ) is highly expressed in hematopoietic cells where it participates in immune signaling and platelet function. Mounting evidence suggests that PKCθ may be involved in cancer, particularly blood cancers, breast cancer, and gastrointestinal stromal tumors (GISTs), yet how to target this kinase (as an oncogene or as a tumor suppressor) has not been established. Here, we examine the effect of four cancer-associated mutations, R145H/C in the autoinhibitory pseudosubstrate, E161K in the regulatory C1A domain, and R635W in the regulatory C-terminal tail, on the cellular activity and stability of PKCθ. Live-cell imaging studies using the genetically-encoded FRET-based reporter for PKC activity, C kinase activity reporter 2 (CKAR2), revealed that the pseudosubstrate and C1A domain mutations impaired autoinhibition to increase basal signaling. This impaired autoinhibition resulted in decreased stability of the protein, consistent with the well-characterized behavior of Ca2+-regulated PKC isozymes wherein mutations that impair autoinhibition are paradoxically loss-of-function because the mutant protein is degraded. In marked contrast, the C-terminal tail mutation resulted in enhanced autoinhibition and enhanced stability. Thus, the examined mutations were loss-of-function by different mechanisms: mutations that impaired autoinhibition promoted the degradation of PKC, and those that enhanced autoinhibition stabilized an inactive PKC. Supporting a general loss-of-function of PKCθ in cancer, bioinformatics analysis revealed that protein levels of PKCθ are reduced in diverse cancers, including lung, renal, head and neck, and pancreatic. Our results reveal that PKCθ function is lost in cancer.

2.
bioRxiv ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38586025

ABSTRACT

In eukaryotes, protein kinase signaling is regulated by a diverse array of post-translational modifications (PTMs), including phosphorylation of Ser/Thr residues and oxidation of cysteine (Cys) residues. While regulation by activation segment phosphorylation of Ser/Thr residues is well understood, relatively little is known about how oxidation of cysteine residues modulates catalysis. In this study, we investigate redox regulation of the AMPK-related Brain-selective kinases (BRSK) 1 and 2, and detail how broad catalytic activity is directly regulated through reversible oxidation and reduction of evolutionarily conserved Cys residues within the catalytic domain. We show that redox-dependent control of BRSKs is a dynamic and multilayered process involving oxidative modifications of several Cys residues, including the formation of intramolecular disulfide bonds between a pair of Cys residues near the catalytic HRD motif, and between a highly conserved T-loop Cys and a BRSK-specific Cys within an unusual CPE motif at the end of the activation segment. Consistently, mutation of the CPE-Cys increases catalytic activity in vitro and drives phosphorylation of the BRSK substrate Tau in cells. Molecular modeling and molecular dynamics simulations indicate that oxidation of the CPE-Cys destabilizes a conserved salt bridge network critical for allosteric activation. The occurrence of spatially proximal Cys amino acids in diverse Ser/Thr protein kinase families suggests that disulfide-mediated control of catalytic activity may be a prevalent mechanism for regulation within the broader AMPK family.

3.
Res Sq ; 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38410452

ABSTRACT

Fructosamine-3-kinases (FN3Ks) are a conserved family of repair enzymes that phosphorylate reactive sugars attached to lysine residues in peptides and proteins. Although FN3Ks are present across the tree of life and share detectable sequence similarity to eukaryotic protein kinases, the biological processes regulated by these kinases are largely unknown. To address this knowledge gap, we leveraged an FN3K CRISPR knockout (KO) cell line alongside an integrative multi-omics study combining transcriptomics, metabolomics, and interactomics to place these enzymes in a pathway context. The integrative analyses revealed the enrichment of pathways related to oxidative stress response, lipid biosynthesis (cholesterol and fatty acids), and carbon and co-factor metabolism. Moreover, enrichment of nicotinamide adenine dinucleotide (NAD) binding proteins and localization of human FN3K (HsFN3K) to mitochondria suggest potential links between FN3Ks and NAD-mediated energy metabolism and redox balance. We report specific binding of HsFN3K to NAD compounds in a metal- and concentration-dependent manner and provide insight into their binding mode using modeling and experimental site-directed mutagenesis. By identifying a potential link between FN3Ks, redox regulation, and NAD-dependent metabolic processes, our studies provide a framework for targeting these understudied kinases in diabetic complications and metabolic disorders where redox balance is altered.

4.
Sci Adv ; 10(8): eadl1258, 2024 Feb 23.
Article in English | MEDLINE | ID: mdl-38381834

ABSTRACT

Adrenal Cushing's syndrome is a disease of cortisol hypersecretion often caused by mutations in protein kinase A catalytic subunit (PKAc). Using a personalized medicine screening platform, we discovered a Cushing's driver mutation, PKAc-W196G, in ~20% of patient samples analyzed. Proximity proteomics and photokinetic imaging reveal that PKAcW196G is unexpectedly distinct from other described Cushing's variants, exhibiting retained association with type I regulatory subunits (RI) and their corresponding A kinase anchoring proteins (AKAPs). Molecular dynamics simulations predict that substitution of tryptophan-196 with glycine creates a 653-cubic angstrom cleft between the catalytic core of PKAcW196G and type II regulatory subunits (RII), but only a 395-cubic angstrom cleft with RI. Endocrine measurements show that overexpression of RIα or redistribution of PKAcW196G via AKAP recruitment counteracts stress hormone overproduction. We conclude that a W196G mutation in the kinase catalytic core skews R subunit selectivity and biases AKAP association to drive Cushing's syndrome.


Subject(s)
Cushing Syndrome; Humans; Cushing Syndrome/genetics; A Kinase Anchor Proteins/genetics; A Kinase Anchor Proteins/metabolism; Signal Transduction; Catalytic Domain; Bias
5.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38244571

ABSTRACT

MOTIVATION: Phosphorylation, a post-translational modification regulated by protein kinase enzymes, plays an essential role in almost all cellular processes. Understanding how each of the nearly 500 human protein kinases selectively phosphorylates its substrates is a foundational challenge in bioinformatics and cell signaling. Although deep learning models have been a popular means to predict kinase-substrate relationships, existing models often lack interpretability and are trained on datasets skewed toward a subset of well-studied kinases. RESULTS: Here we leverage recent peptide library datasets generated to determine substrate specificity profiles of 300 serine/threonine kinases to develop an explainable Transformer model for kinase-peptide interaction prediction. The model, trained solely on primary sequences, achieved state-of-the-art performance. Its built-in multitask learning paradigm enables predictions on virtually any kinase-peptide pair, including predictions on 139 kinases not used in peptide library screens. Furthermore, we employed explainable machine learning methods to elucidate the model's inner workings. Through analysis of learned embeddings at different training stages, we demonstrate that the model employs a unique strategy of substrate prediction considering both substrate motif patterns and kinase evolutionary features. SHapley Additive exPlanation (SHAP) analysis reveals key specificity-determining residues in the peptide sequence. Finally, we provide a web interface for predicting kinase-substrate associations for user-defined sequences and a resource for visualizing the learned kinase-substrate associations. AVAILABILITY AND IMPLEMENTATION: All code and data are available at https://github.com/esbgkannan/Phosformer-ST. A web server is available at https://phosformer.netlify.app.


Subject(s)
Peptide Library; Protein Kinases; Humans; Protein Kinases/metabolism; Phosphorylation; Peptides/chemistry; Machine Learning
6.
Drug Discov Today ; 29(3): 103894, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38266979

ABSTRACT

The understudied members of the druggable proteomes offer promising prospects for drug discovery efforts. While large-scale initiatives have generated valuable functional information on understudied members of the druggable gene families, translating this information into actionable knowledge for drug discovery requires specialized informatics tools and resources. Here, we review the unique informatics challenges and advances in annotating understudied members of the druggable proteome. We demonstrate the application of statistical evolutionary inference tools, knowledge graph mining approaches, and protein language models in illuminating understudied protein kinases, pseudokinases, and ion channels.


Subject(s)
Informatics; Proteome
7.
PeerJ ; 11: e16087, 2023.
Article in English | MEDLINE | ID: mdl-38077442

ABSTRACT

The Protein Kinase Ontology (ProKinO) is an integrated knowledge graph that conceptualizes the complex relationships among protein kinase sequence, structure, function, and disease in a human- and machine-readable format. In this study, we have significantly expanded ProKinO by incorporating additional data on expression patterns and drug interactions. Furthermore, we have developed a completely new browser from the ground up to render the knowledge graph visible and interactive on the web. We have enriched ProKinO with new classes and relationships that capture information on kinase ligand binding sites, expression patterns, and functional features. These additions extend ProKinO's capabilities as a discovery tool, enabling it to uncover novel insights about understudied members of the protein kinase family. We next demonstrate the application of ProKinO. Specifically, through graph mining and aggregate SPARQL queries, we identify the p21-activated protein kinase 5 (PAK5) as one of the most frequently mutated dark kinases in human cancers with abnormal expression in multiple cancers, including a previously unappreciated role in acute myeloid leukemia. We have identified recurrent oncogenic mutations in the PAK5 activation loop predicted to alter substrate binding and phosphorylation. Additionally, we have identified common ligand/drug binding residues in PAK family kinases, underscoring ProKinO's potential application in drug discovery. The updated ontology browser and the addition of a web component, ProtVista, which enables interactive mining of kinase sequence annotations in 3D structures and AlphaFold models, provide a valuable resource for the signaling community. The updated ProKinO database is accessible at https://prokino.uga.edu.


Subject(s)
Neoplasms; Protein Kinases; Humans; Protein Kinases/genetics; Ligands; Proteins/genetics; Phosphorylation
8.
Nat Commun ; 14(1): 6548, 2023 10 17.
Article in English | MEDLINE | ID: mdl-37848415

ABSTRACT

Autophosphorylation controls the transition between discrete functional and conformational states in protein kinases, yet the structural and molecular determinants underlying this fundamental process remain unclear. Here we show that C-terminal Tyr 530 is a de facto c-Src autophosphorylation site with slow time-resolution kinetics and a strong intermolecular component. By contrast, activation-loop Tyr 419 undergoes faster kinetics and a cis-to-trans phosphorylation switch that controls C-terminal Tyr 530 autophosphorylation, enzyme specificity, and strikingly, c-Src non-catalytic function as a substrate. In line with this, we visualize by X-ray crystallography a snapshot of Tyr 530 intermolecular autophosphorylation. In an asymmetric arrangement of both catalytic domains, a C-terminal palindromic phospho-motif flanking Tyr 530 on the substrate molecule engages the G-loop of the active kinase, adopting a position ready for entry into the catalytic cleft. Perturbation of the phospho-motif accounts for c-Src dysfunction, as indicated by viral and colorectal cancer (CRC)-associated C-terminal deleted variants. We show that C-terminal residues 531 to 536 are required for c-Src Tyr 530 autophosphorylation, and that deleting them is detrimental because the substrate molecule then allosterically inhibits the active kinase. Our work reveals a crosstalk between the activation and C-terminal segments that controls the allosteric interplay between substrate- and enzyme-acting kinases during autophosphorylation.


Subject(s)
src-Family Kinases; Phosphorylation; CSK Tyrosine-Protein Kinase/metabolism; Catalytic Domain; src-Family Kinases/metabolism
9.
Nat Commun ; 14(1): 6804, 2023 10 26.
Article in English | MEDLINE | ID: mdl-37884510

ABSTRACT

The necroptosis pathway is a lytic, pro-inflammatory mode of cell death that is widely implicated in human disease, including renal, pulmonary, gut and skin inflammatory pathologies. The precise mechanism of the terminal steps in the pathway, where the RIPK3 kinase phosphorylates the terminal pathway effector, MLKL, and triggers its conformational change and oligomerization, is only now emerging. Here, we structurally identify RIPK3-mediated phosphorylation of the human MLKL activation loop as a cue for MLKL pseudokinase domain dimerization. MLKL pseudokinase domain dimerization subsequently drives formation of elongated homotetramers. Negative stain electron microscopy and modelling support nucleation of the MLKL tetramer assembly by a central coiled coil formed by the extended, ~80 Å brace helix that connects the pseudokinase and executioner four-helix bundle domains. Mutational data assert MLKL tetramerization as an essential prerequisite step to enable the release and reorganization of four-helix bundle domains for membrane permeabilization and cell death.


Subject(s)
Protein Kinases; Receptor-Interacting Protein Serine-Threonine Kinases; Humans; Phosphorylation; Necrosis; Protein Kinases/metabolism; Dimerization; Cell Death; Receptor-Interacting Protein Serine-Threonine Kinases/metabolism; Apoptosis
10.
PeerJ ; 11: e15815, 2023.
Article in English | MEDLINE | ID: mdl-37868056

ABSTRACT

The 534 protein kinases encoded in the human genome constitute a large druggable class of proteins that include both well-studied and understudied "dark" members. Accurate prediction of dark kinase functions is a major bioinformatics challenge. Here, we employ a graph mining approach that uses the evolutionary and functional context encoded in knowledge graphs (KGs) to predict protein and pathway associations for understudied kinases. We propose a new scalable graph embedding approach, RegPattern2Vec, which employs regular pattern constrained random walks to flexibly sample diverse aspects of node context within a KG. RegPattern2Vec learns functional representations of kinases, interacting partners, post-translational modifications, pathways, cellular localization, and chemical interactions from a kinase-centric KG that integrates and conceptualizes data from curated heterogeneous data resources. By contextualizing information relevant to prediction, RegPattern2Vec improves accuracy and efficiency in comparison to other random walk-based graph embedding approaches. We show that the predictions produced by our model overlap with pathway enrichment data produced using experimentally validated Protein-Protein Interaction (PPI) data from both publicly available databases and experimental datasets not used in training. Our model also has the advantage of using the collected random walks as biological context to interpret the predicted protein-pathway associations. We provide high-confidence pathway predictions for 34 dark kinases and present three case studies in which analysis of meta-paths associated with the prediction enables biological interpretation. Overall, RegPattern2Vec efficiently samples multiple node types for link prediction on biological knowledge graphs, and the predicted associations between understudied kinases, pseudokinases, and known pathways serve as a conceptual starting point for hypothesis generation and testing.
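The abstract describes random walks constrained by a regular pattern over node types. The following is a minimal sketch of that idea in Python, not the authors' RegPattern2Vec implementation. The graph, node names, node types, and the pattern itself are hypothetical. The pattern is encoded as a small finite automaton over node types, so each step only moves to a neighbor whose type the pattern allows next.

```python
import random
from typing import Dict, List, Set

# Toy kinase-centric knowledge graph (hypothetical nodes and edges).
GRAPH: Dict[str, Set[str]] = {
    "PAK5": {"CDC42", "RAC1"},
    "CDC42": {"PAK5", "Rho_signaling"},
    "RAC1": {"PAK5", "Rho_signaling"},
    "Rho_signaling": {"CDC42", "RAC1"},
}
# Hypothetical node types: K = kinase, P = interacting protein, W = pathway.
NODE_TYPE: Dict[str, str] = {
    "PAK5": "K", "CDC42": "P", "RAC1": "P", "Rho_signaling": "W",
}

# A regular pattern over node types, written as a DFA:
# current state -> {allowed next node type: next state}.
# Every walk's type string is a prefix of the language K P ((K|W) P)*.
PATTERN_DFA: Dict[str, Dict[str, str]] = {
    "after_K": {"P": "after_P"},
    "after_P": {"K": "after_K", "W": "after_W"},
    "after_W": {"P": "after_P"},
}


def constrained_walk(start: str, length: int, rng: random.Random) -> List[str]:
    """Sample a random walk whose node-type sequence stays inside the pattern."""
    if NODE_TYPE[start] != "K":
        raise ValueError("walks in this sketch must start at a kinase node")
    walk = [start]
    state = "after_K"
    while len(walk) < length:
        allowed = PATTERN_DFA[state]
        # Keep only neighbours whose type is a legal next symbol in the pattern.
        candidates = sorted(n for n in GRAPH[walk[-1]] if NODE_TYPE[n] in allowed)
        if not candidates:
            break  # dead end: no neighbour satisfies the pattern
        nxt = rng.choice(candidates)
        state = allowed[NODE_TYPE[nxt]]
        walk.append(nxt)
    return walk


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(" -> ".join(constrained_walk("PAK5", 6, rng)))
```

In a full pipeline, walks like these would then be fed to a skip-gram style embedding model. Using a DFA keeps the constraint check to a constant-time lookup per step, whatever the size of the graph.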


Subject(s)
Pattern Recognition, Automated; Proteins; Humans; Proteins/genetics; Computational Biology; Learning; Knowledge
11.
Elife ; 12, 2023 10 26.
Article in English | MEDLINE | ID: mdl-37883155

ABSTRACT

Catalytic signaling outputs of protein kinases are dynamically regulated by an array of structural mechanisms, including allosteric interactions mediated by intrinsically disordered segments flanking the conserved catalytic domain. The doublecortin-like kinases (DCLKs) are a family of microtubule-associated proteins characterized by a flexible C-terminal autoregulatory 'tail' segment that varies in length across the various human DCLK isoforms. However, the mechanism whereby these isoform-specific variations contribute to unique modes of autoregulation is not well understood. Here, we employ a combination of statistical sequence analysis, molecular dynamics simulations, and in vitro mutational analysis to define hallmarks of DCLK family evolutionary divergence, including analysis of splice variants within the DCLK1 sub-family, which arise through alternative codon usage and serve to 'supercharge' the inhibitory potential of the DCLK1 C-tail. We identify co-conserved motifs that readily distinguish DCLKs from all other calcium calmodulin kinases (CAMKs), and a 'Swiss Army' assembly of distinct motifs that tether the C-terminal tail to conserved ATP and substrate-binding regions of the catalytic domain to generate a scaffold for autoregulation through C-tail dynamics. Consistently, deletions and mutations that alter C-terminal tail length or interfere with co-conserved interactions within the catalytic domain alter intrinsic protein stability, nucleotide/inhibitor binding, and catalytic activity, suggesting isoform-specific regulation of activity through alternative splicing. Our studies provide a detailed framework for investigating kinome-wide regulation of catalytic output through cis-regulatory events mediated by intrinsically disordered segments, opening new avenues for the design of mechanistically divergent DCLK1 modulators, stabilizers, or degraders.


Subject(s)
Biological Evolution; Protein Serine-Threonine Kinases; Humans; Protein Isoforms/genetics; Protein Serine-Threonine Kinases/genetics; Alternative Splicing; Calcium, Dietary; Doublecortin-Like Kinases
12.
bioRxiv ; 2023 Jul 18.
Article in English | MEDLINE | ID: mdl-37034755

ABSTRACT

Catalytic signaling outputs of protein kinases are dynamically regulated by an array of structural mechanisms, including allosteric interactions mediated by intrinsically disordered segments flanking the conserved catalytic domain. The Doublecortin Like Kinases (DCLKs) are a family of microtubule-associated proteins characterized by a flexible C-terminal autoregulatory 'tail' segment that varies in length across the various human DCLK isoforms. However, the mechanism whereby these isoform-specific variations contribute to unique modes of autoregulation is not well understood. Here, we employ a combination of statistical sequence analysis, molecular dynamics simulations and in vitro mutational analysis to define hallmarks of DCLK family evolutionary divergence, including analysis of splice variants within the DCLK1 sub-family, which arise through alternative codon usage and serve to 'supercharge' the inhibitory potential of the DCLK1 C-tail. We identify co-conserved motifs that readily distinguish DCLKs from all other Calcium Calmodulin Kinases (CAMKs), and a 'Swiss-army' assembly of distinct motifs that tether the C-terminal tail to conserved ATP and substrate-binding regions of the catalytic domain to generate a scaffold for auto-regulation through C-tail dynamics. Consistently, deletions and mutations that alter C-terminal tail length or interfere with co-conserved interactions within the catalytic domain alter intrinsic protein stability, nucleotide/inhibitor-binding, and catalytic activity, suggesting isoform-specific regulation of activity through alternative splicing. Our studies provide a detailed framework for investigating kinome-wide regulation of catalytic output through cis-regulatory events mediated by intrinsically disordered segments, opening new avenues for the design of mechanistically-divergent DCLK1 modulators, stabilizers or degraders.

13.
G3 (Bethesda) ; 13(7)2023 07 05.
Article in English | MEDLINE | ID: mdl-37119806

ABSTRACT

The current understanding of farnesyltransferase (FTase) specificity was pioneered through investigations of reporters like Ras and Ras-related proteins that possess a C-terminal CaaX motif consisting of 4 amino acid residues: cysteine-aliphatic1-aliphatic2-variable (X). These studies led to the finding that proteins with the CaaX motif are subject to a 3-step post-translational modification pathway involving farnesylation, proteolysis, and carboxylmethylation. Emerging evidence indicates, however, that FTase can farnesylate sequences outside the CaaX motif and that these sequences do not undergo the canonical 3-step pathway. In this work, we report a comprehensive evaluation of all possible CXXX sequences as FTase targets using the reporter Ydj1, an Hsp40 chaperone that only requires farnesylation for its activity. Our genetic and high-throughput sequencing approach reveals an unprecedented profile of sequences that yeast FTase can recognize in vivo, which effectively expands the potential target space of FTase within the yeast proteome. We also document that yeast FTase specificity is largely determined by restrictive amino acids at the a2 and X positions, rather than by resemblance to the CaaX motif as previously thought. This first complete evaluation of CXXX space expands the complexity of protein isoprenylation and marks a key step forward in understanding the potential scope of targets for this isoprenylation pathway.
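A "complete evaluation of CXXX space" covers every cysteine-initiated tetrapeptide. With 20 standard amino acids at each of the three positions after the cysteine, that is 20 × 20 × 20 = 8,000 sequences. The minimal Python sketch below enumerates this space. The aliphatic residue set used to flag canonical CaaX-like motifs is an assumption for illustration, not a definition taken from the study. The helper names are hypothetical.

```python
from itertools import product
from typing import List

# The 20 standard proteinogenic amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption for illustration only: a narrow "aliphatic" set for the a1/a2 positions.
# Published definitions of CaaX "a" residues vary; this set is not from the study.
ALIPHATIC = set("AILV")


def cxxx_space() -> List[str]:
    """Every tetrapeptide C-X-X-X: cysteine followed by any three residues."""
    return ["C" + "".join(tail) for tail in product(AMINO_ACIDS, repeat=3)]


def is_canonical_caax(motif: str) -> bool:
    """True if positions a1 and a2 are both in the assumed aliphatic set."""
    return motif[1] in ALIPHATIC and motif[2] in ALIPHATIC


if __name__ == "__main__":
    space = cxxx_space()
    caax = [m for m in space if is_canonical_caax(m)]
    print(len(space), len(caax))  # 8000 320
```

Under this narrow assumed definition, only 4 × 4 × 20 = 320 of the 8,000 sequences look like textbook CaaX motifs. This illustrates how much larger the space screened in vivo is than the canonical motif alone.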


Subject(s)
Alkyl and Aryl Transferases; Saccharomyces cerevisiae; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Farnesyltransferase/genetics; Farnesyltransferase/metabolism; Amino Acid Sequence; Alkyl and Aryl Transferases/genetics; Alkyl and Aryl Transferases/metabolism; Protein Prenylation; Proteins/genetics; Substrate Specificity
14.
Microbiol Spectr ; 11(3): e0325222, 2023 06 15.
Article in English | MEDLINE | ID: mdl-36995217

ABSTRACT

Pneumococcal pneumonia remains a WHO high-priority disease despite multivalent conjugate vaccines administered in clinical practice worldwide. A protein-based, serotype-independent vaccine has long promised comprehensive coverage of most clinical isolates of the pneumococcus. Along with numerous pneumococcal surface protein immunogens, the pneumococcal serine-rich repeat protein (PsrP) has been investigated as a potential vaccine target due to its surface exposure and functions toward bacterial virulence and lung infection. Three critical criteria for its vaccine potential (the clinical prevalence, serotype distribution, and sequence homology of PsrP) have yet to be well characterized. Here, we used genomes of 13,454 clinically isolated pneumococci from the Global Pneumococcal Sequencing project to investigate PsrP presence among isolates and its distribution among serotypes, and to interrogate its homology as a protein across species. These isolates represent all age groups, countries worldwide, and types of pneumococcal infection. We found PsrP present in at least 50% of all isolates across all determined serotypes and nontypeable (NT) clinical isolates. Using a combination of peptide matching and HMM profiles built on full-length and individual PsrP domains, we identified novel variants that expand PsrP diversity and prevalence. We also observed sequence variability in its basic region (BR) between isolates and serotypes. PsrP has strong vaccine potential due to its breadth of coverage, especially in nonvaccine serotypes (NVTs), when its regions of conservation are exploited in vaccine design. IMPORTANCE: An updated outlook on PsrP prevalence and serotype distribution sheds new light on the comprehensiveness of a PsrP-based protein vaccine. The protein is present in all vaccine serotypes and is highly prevalent in the next wave of potentially disease-causing serotypes not included in the current multivalent conjugate vaccines. Furthermore, PsrP is strongly correlated with clinical isolates from pneumococcal disease, as opposed to pneumococcal carriage. PsrP is also highly prevalent in strains and serotypes from Africa, where the need for a protein-based vaccine is greatest, giving new reason to pursue PsrP as a protein vaccine.


Subject(s)
Pneumococcal Infections; Streptococcus pneumoniae; Humans; Vaccines, Conjugate; Prevalence; Pneumococcal Infections/epidemiology; Pneumococcal Infections/prevention & control; Pneumococcal Infections/microbiology; Pneumococcal Vaccines
16.
Nat Plants ; 9(3): 486-500, 2023 03.
Article in English | MEDLINE | ID: mdl-36849618

ABSTRACT

Rhamnogalacturonan I (RGI) is a structurally complex pectic polysaccharide with a backbone of alternating rhamnose and galacturonic acid residues substituted with arabinan and galactan side chains. Galactan synthase 1 (GalS1) transfers galactose and arabinose to either extend or cap the β-1,4-galactan side chains of RGI, respectively. Here we report the structure of GalS1 from Populus trichocarpa, showing a modular protein consisting of an N-terminal domain that represents the founding member of a new family of carbohydrate-binding module, CBM95, and a C-terminal glycosyltransferase family 92 (GT92) catalytic domain that adopts a GT-A fold. GalS1 exists as a dimer in vitro, with stem domains interacting across the chains in a 'handshake' orientation that is essential for maintaining stability and activity. In addition to understanding the enzymatic mechanism of GalS1, we gained insight into the donor and acceptor substrate binding sites using deep evolutionary analysis, molecular simulations and biochemical studies. Combining all the results, a mechanism for GalS1 catalysis and a new model for pectic galactan side-chain addition are proposed.


Subject(s)
Galactans; Glycosyltransferases; Galactans/metabolism; Glycosyltransferases/metabolism
17.
Bioinformatics ; 39(2)2023 02 03.
Article in English | MEDLINE | ID: mdl-36692152

ABSTRACT

MOTIVATION: The human genome encodes over 500 distinct protein kinases which regulate nearly all cellular processes by the specific phosphorylation of protein substrates. While advances in mass spectrometry and proteomics studies have identified thousands of phosphorylation sites across species, information on the specific kinases that phosphorylate these sites is currently lacking for the vast majority of phosphosites. Recently, there has been a major focus on the development of computational models for predicting kinase-substrate associations. However, most current models only allow predictions on a subset of well-studied kinases. Furthermore, the utilization of hand-curated features and imbalances in training and testing datasets pose unique challenges in the development of accurate predictive models for kinase-specific phosphorylation prediction. Motivated by the recent development of universal protein language models which automatically generate context-aware features from primary sequence information, we sought to develop a unified framework for kinase-specific phosphosite prediction, allowing for greater investigative utility and enabling substrate predictions at the whole kinome level. RESULTS: We present a deep learning model for kinase-specific phosphosite prediction, termed Phosformer, which predicts the probability of phosphorylation given an arbitrary pair of unaligned kinase and substrate peptide sequences. We demonstrate that Phosformer implicitly learns evolutionary and functional features during training, removing the need for feature curation and engineering. Further analyses reveal that Phosformer also learns substrate specificity motifs and is able to distinguish between functionally distinct kinase families. Benchmarks indicate that Phosformer exhibits significant improvements compared to the state-of-the-art models, while also presenting a more generalized, unified, and interpretable predictive framework. 
AVAILABILITY AND IMPLEMENTATION: Code and data are available at https://github.com/esbgkannan/phosformer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Protein Kinases; Protein Processing, Post-Translational; Humans; Phosphorylation; Protein Kinases/metabolism; Proteins/metabolism
18.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36642409

ABSTRACT

Protein language models, trained on millions of biologically observed sequences, generate feature-rich numerical representations of protein sequences. These representations, called sequence embeddings, can capture structural and functional properties, even though protein language models are trained on primary sequence alone. While sequence embeddings have been applied toward tasks such as structure and function prediction, applications toward alignment-free sequence classification have been hindered by the lack of studies to derive, quantify and evaluate relationships between protein sequence embeddings. Here, we develop workflows and visualization methods for the classification of protein families using sequence embeddings derived from protein language models. A benchmark of manifold visualization methods reveals that Neighbor Joining (NJ) embedding trees are highly effective in capturing global structure while achieving similar performance in capturing local structure compared with popular dimensionality reduction techniques such as t-SNE and UMAP. The statistical significance of hierarchical clusters on a tree is evaluated by resampling embeddings using a variational autoencoder (VAE). We demonstrate the application of our methods in the classification of two well-studied enzyme superfamilies, phosphatases and protein kinases. Our embedding-based classifications remain consistent with and extend upon previously published sequence alignment-based classifications. We also propose a new hierarchical classification for the S-Adenosyl-L-Methionine (SAM) enzyme superfamily, which has been difficult to classify using traditional alignment-based approaches. Beyond applications in sequence classification, our results further suggest NJ trees are a promising general method for visualizing high-dimensional data sets.
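The NJ embedding trees described above are built by neighbor joining on pairwise distances between sequence embeddings. Below is a minimal sketch of standard neighbor joining in Python, not the authors' workflow. It assumes Euclidean distance between embedding vectors (the abstract does not state the metric) and omits the VAE-based resampling used to assess cluster significance. The helper names are hypothetical, and at least two sequences are required.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def distance_matrix(embeddings: Dict[str, Sequence[float]]) -> Tuple[List[str], List[List[float]]]:
    """Pairwise Euclidean distances between per-protein embedding vectors (assumed metric)."""
    labels = list(embeddings)
    dist = [[math.dist(embeddings[a], embeddings[b]) for b in labels] for a in labels]
    return labels, dist


def neighbor_joining(labels: List[str], dist: List[List[float]]) -> str:
    """Standard neighbor joining; returns the tree as a Newick string."""
    nodes = list(labels)             # Newick string for each current cluster
    d = [row[:] for row in dist]     # working copy of the distance matrix
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]  # row sums (net divergence of each cluster)
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(combinations(range(n), 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to the new internal node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        merged = f"({nodes[i]}:{li:.3f},{nodes[j]}:{lj:.3f})"
        # Distance from u to every remaining cluster k.
        du = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in range(n)]
        keep = [k for k in range(n) if k not in (i, j)]
        d = [[d[a][b] for b in keep] + [du[a]] for a in keep] + [[du[k] for k in keep] + [0.0]]
        nodes = [nodes[k] for k in keep] + [merged]
    # Connect the last two clusters with a single edge (placed on the first side).
    return f"({nodes[0]}:{d[0][1]:.3f},{nodes[1]}:0.000);"


if __name__ == "__main__":
    # Textbook five-taxon distance matrix, standing in for embedding distances.
    taxa = ["a", "b", "c", "d", "e"]
    D = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
         [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
    print(neighbor_joining(taxa, D))
    # ((c:4.000,(a:2.000,b:3.000):3.000):2.000,(d:2.000,e:1.000):0.000);
```

For real embeddings, `distance_matrix` would be applied first and its output passed to `neighbor_joining`. The pure-Python loop is O(n³) and is meant only for clarity, not for large protein families.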


Subject(s)
Amino Acid Sequence; Proteins; Cluster Analysis; Proteins/chemistry; Sequence Alignment
19.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36631405

ABSTRACT

Protein language modeling is a fast-emerging deep learning method in bioinformatics with diverse applications such as structure prediction and protein design. However, application toward estimating sequence conservation for functional site prediction has not been systematically explored. Here, we present a method for the alignment-free estimation of sequence conservation using sequence embeddings generated from protein language models. Comprehensive benchmarks across publicly available protein language models reveal that ESM2 models provide the best performance-to-computational-cost ratio for conservation estimation. Applying our method to full-length protein sequences, we demonstrate that embedding-based methods are not sensitive to the order of conserved elements: conservation scores can be calculated for multidomain proteins in a single run, without the need to separate individual domains. Our method can also identify conserved functional sites within fast-evolving sequence regions (such as domain inserts), which we demonstrate through the identification of conserved phosphorylation motifs in variable insert segments in protein kinases. Overall, embedding-based conservation analysis is a broadly applicable method for identifying potential functional sites in any full-length protein sequence and estimating conservation in an alignment-free manner. To run this on your protein sequence of interest, try our scripts at https://github.com/esbgkannan/kibby.


Subject(s)
Computational Biology; Proteins; Amino Acid Sequence; Proteins/genetics; Proteins/chemistry; Computational Biology/methods; Conserved Sequence
20.
Biochem J ; 480(2): 141-160, 2023 Jan 31.
Article in English | MEDLINE | ID: mdl-36520605

ABSTRACT

Pseudokinases, so named because they lack one or more conserved canonical amino acids that define their catalytically active relatives, have evolved a variety of biological functions in both prokaryotic and eukaryotic organisms. Human PSKH2 is closely related to the canonical kinase PSKH1, which maps to the CAMK family of protein kinases. Primates encode PSKH2 in the form of a pseudokinase, which is predicted to be catalytically inactive due to loss of the invariant catalytic Asp residue. Although the biological role(s) of vertebrate PSKH2 proteins remain unclear, we previously identified species-level adaptations in PSKH2 that have led to the appearance of kinase or pseudokinase variants in vertebrate genomes alongside a canonical PSKH1 paralog. In this paper we confirm that, as predicted, PSKH2 lacks detectable protein phosphotransferase activity, and exploit structural informatics, biochemistry and cellular proteomics to begin to characterise vertebrate PSKH2 orthologues. AlphaFold 2-based structural analysis predicts functional roles for both the PSKH2 N- and C-regions that flank the pseudokinase domain core, and cellular truncation analysis confirms that the N-terminal domain, which contains a conserved myristoylation site, is required for both stable human PSKH2 expression and localisation to a membrane-rich subcellular fraction containing mitochondrial proteins. Using mass spectrometry-based proteomics, we confirm that human PSKH2 is part of a cellular mitochondrial protein network, and that its expression is regulated through client status within the HSP90/Cdc37 molecular chaperone system. HSP90 interactions are mediated through binding to the PSKH2 C-terminal tail, leading us to predict that this region might act as both a cis and trans regulatory element, driving outputs linked to the PSKH2 pseudokinase domain that are important for functional signalling.


Subject(s)
Protein Kinases; Signal Transduction; Animals; Humans; Protein Kinases/metabolism; Phosphorylation; Molecular Chaperones/metabolism; Biological Evolution; HSP90 Heat-Shock Proteins/metabolism