ABSTRACT
Reversible phosphorylation is a key mechanism for regulating protein function. It is therefore of great interest to know which kinases phosphorylate which proteins. Comprehensive information about phosphorylation sites in Arabidopsis proteins is hosted within the PhosPhAt database (http://phosphat.mpimp-golm.mpg.de). However, our knowledge of the kinases that phosphorylate those sites is dispersed throughout the literature and very difficult to access, particularly for investigators seeking to interpret large-scale and high-throughput experiments. Therefore, we aimed to compile information on kinase-substrate interactions and kinase-specific regulatory information and make this available via a new functionality embedded in PhosPhAt. Our approach involved systematically surveying the literature for regulatory information on the members of the major kinase families in Arabidopsis thaliana, such as CDPKs, MPK(KK)s, AGC kinases and SnRKs, as well as individual kinases from other families. To date, we have researched more than 4450 kinase-related publications, which collectively contain information on about 289 kinases. Users can now query the PhosPhAt database not only for experimental and predicted phosphorylation sites of individual proteins, but also for known substrates of a given kinase or kinase family. Further developments include the addition of new phosphorylation sites and the visualization of clustered phosphorylation events, known as phosphorylation hotspots.
Subject(s)
Arabidopsis Proteins/metabolism , Arabidopsis/enzymology , Databases, Protein , Protein Kinases/metabolism , Arabidopsis Proteins/chemistry , Internet , Phosphorylation , Signal Transduction
ABSTRACT
BACKGROUND: Protein kinases constitute a particularly large protein family in Arabidopsis with important functions in cellular signal transduction networks. At the same time, Arabidopsis is a model plant with a high frequency of gene duplications. Here, we have conducted a systematic analysis of the Arabidopsis kinase complement, the kinome, with particular focus on gene duplication events. We matched Arabidopsis proteins to a hidden Markov model of eukaryotic kinases, computed a phylogeny of 942 Arabidopsis protein kinase domains, and mapped their origin by gene duplication. RESULTS: The phylogeny showed two major clades of receptor kinases and soluble kinases, each of which was divided into functional subclades. Based on this phylogeny, as yet uncharacterized kinases could be assigned to families, which extended the functional annotation of unknowns. Classification of gene duplications within these protein kinases revealed that representatives of cytosolic subfamilies showed a tendency to maintain segmentally duplicated genes, while some subfamilies of the receptor kinases were enriched for tandem duplicates. Although functional diversification is observed throughout most subfamilies, some instances of functional conservation among genes transposed from the same ancestor were observed. In general, a significant enrichment of essential genes was found among genes encoding protein kinases. CONCLUSIONS: The inferred phylogeny allowed classification and annotation of as yet uncharacterized kinases. The prediction and analysis of syntenic blocks and duplication events within gene families of interest can be used to link functional biology to insights from an evolutionary viewpoint. The approach undertaken here can be applied to any gene family in any organism with an annotated genome.
Subject(s)
Arabidopsis/genetics , Genome, Plant , Protein Interaction Maps/genetics , Arabidopsis Proteins/classification , Arabidopsis Proteins/genetics , Evolution, Molecular , Gene Duplication , Multigene Family , Phylogeny , Protein Kinases/classification , Protein Kinases/genetics
ABSTRACT
The regulation of protein function by modulating the surface charge status via sequence-locally enriched phosphorylation sites (P-sites) in so-called phosphorylation "hotspots" has gained increased attention in recent years. We set out to identify P-hotspots in the model plant Arabidopsis thaliana. We analyzed the spacing of experimentally detected P-sites within peptide-covered regions along Arabidopsis protein sequences as available from the PhosPhAt database. Confirming earlier reports (Schweiger and Linial, 2010), we found that P-sites indeed tend to cluster and that the distributions of distances from serine and threonine P-sites to their respective closest neighboring P-site differ significantly from those for tyrosine P-sites. The ability to predict P-hotspots by applying available computational P-site prediction programs that focus on identifying single P-sites was observed to be severely compromised by the inevitable interference of nearby P-sites. We devised a new approach, named HotSPotter, for the prediction of phosphorylation hotspots. HotSPotter is based primarily on local amino acid compositional preferences rather than sequence position-specific motifs and uses support vector machines as the underlying classification engine. HotSPotter correctly identified experimentally determined phosphorylation hotspots in A. thaliana with high accuracy. Applied to the Arabidopsis proteome, HotSPotter predicted 13,677 candidate P-hotspots in 9,599 proteins corresponding to 7,847 unique genes. Hotspot-containing proteins are involved predominantly in signaling processes, confirming the surmised modulating role of hotspots in signaling and interaction events. Our study provides new bioinformatics means to identify phosphorylation hotspots and lays the basis for further investigating novel candidate P-hotspots. All phosphorylation hotspot annotations and predictions have been made available as part of the PhosPhAt database at http://phosphat.mpimp-golm.mpg.de.
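
The spacing analysis described in the abstract (the distance from each P-site to its closest neighboring P-site, and the grouping of nearby sites into clusters) can be illustrated with a minimal Python sketch. This is not HotSPotter, which according to the abstract relies on local amino acid composition and support vector machines; it is a simple gap-based clustering shown only to make the notion of a "hotspot" concrete. The function names and the thresholds `max_gap` and `min_sites` are hypothetical and are not taken from the publication.

```python
from typing import List, Tuple


def nearest_neighbor_distances(sites: List[int]) -> List[int]:
    """Return, for each P-site (in sorted order), the distance in residues
    to its closest neighboring P-site on the same protein."""
    ordered = sorted(set(sites))
    if len(ordered) < 2:
        return []  # a single site has no neighbor
    distances = []
    for i, pos in enumerate(ordered):
        candidates = []
        if i > 0:
            candidates.append(pos - ordered[i - 1])   # left neighbor
        if i < len(ordered) - 1:
            candidates.append(ordered[i + 1] - pos)   # right neighbor
        distances.append(min(candidates))
    return distances


def find_hotspots(
    sites: List[int],
    max_gap: int = 4,    # hypothetical threshold, not from the paper
    min_sites: int = 3,  # hypothetical threshold, not from the paper
) -> List[Tuple[int, int, int]]:
    """Group P-sites into clusters in which consecutive sites are at most
    `max_gap` residues apart; report clusters with at least `min_sites`
    sites as (start, end, number_of_sites)."""
    hotspots: List[Tuple[int, int, int]] = []
    cluster: List[int] = []
    for pos in sorted(set(sites)):
        if cluster and pos - cluster[-1] > max_gap:
            # Gap too large: close the current cluster and start a new one.
            if len(cluster) >= min_sites:
                hotspots.append((cluster[0], cluster[-1], len(cluster)))
            cluster = []
        cluster.append(pos)
    if len(cluster) >= min_sites:
        hotspots.append((cluster[0], cluster[-1], len(cluster)))
    return hotspots


if __name__ == "__main__":
    # Made-up residue positions for illustration only.
    example_sites = [10, 12, 15, 40, 90, 92, 93, 95]
    print(nearest_neighbor_distances(example_sites))
    print(find_hotspots(example_sites))
```

With the made-up positions above, the isolated site at residue 40 is left out and two clusters are reported. Changing `max_gap` or `min_sites` changes which regions qualify, which is why any real analysis would need thresholds justified by the observed distance distributions.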