1.
Bioinformatics ; 32(2): 165-72, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26411868

ABSTRACT

UNLABELLED: S-sulfenylation (S-sulphenylation, or sulfenic acid), the covalent attachment of S-hydroxyl (-SOH) to cysteine thiol, plays a significant role in redox regulation of protein functions. Although sulfenic acid is transient and labile, most of its physiological activities occur under control of S-hydroxylation. Therefore, discriminating the substrate site of S-sulfenylated proteins is an essential task in computational biology for the furtherance of protein structures and functions. Research into S-sulfenylated protein is currently very limited, and no dedicated tools are available for the computational identification of SOH sites. Given a total of 1096 experimentally verified S-sulfenylated proteins from humans, this study carries out a bioinformatics investigation on SOH sites based on amino acid composition and solvent-accessible surface area. A TwoSampleLogo indicates that the positively and negatively charged amino acids flanking the SOH sites may impact the formulation of S-sulfenylation in closed three-dimensional environments. In addition, the substrate motifs of SOH sites are studied using the maximal dependence decomposition (MDD). Based on the concept of binary classification between SOH and non-SOH sites, Support vector machine (SVM) is applied to learn the predictive model from MDD-identified substrate motifs. According to the evaluation results of 5-fold cross-validation, the integrated SVM model learned from substrate motifs yields an average accuracy of 0.87, significantly improving the prediction of SOH sites. Furthermore, the integrated SVM model also effectively improves the predictive performance in an independent testing set. Finally, the integrated SVM model is applied to implement an effective web resource, named MDD-SOH, to identify SOH sites with their corresponding substrate motifs. AVAILABILITY AND IMPLEMENTATION: The MDD-SOH is now freely available to all interested users at http://csb.cse.yzu.edu.tw/MDDSOH/. 
All data sets used in this work are also available for download from the website. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: francis@saturn.yzu.edu.tw.
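
The abstract above describes characterizing SOH sites by the amino acid composition of the residues flanking each cysteine. A minimal sketch of that kind of feature extraction follows; the flank size, the `-` padding character, and the function names are illustrative assumptions, not details taken from MDD-SOH.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cysteine_windows(sequence, flank=10, pad="-"):
    """Yield (1-based position, window) for every cysteine in the sequence.

    Windows are 2 * flank + 1 residues long; sequence ends are padded so
    that cysteines near a terminus still produce a full-length window.
    The flank size is an assumption made for this sketch.
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue == "C":
            yield i + 1, padded[i : i + 2 * flank + 1]


def amino_acid_composition(window):
    """Return a 20-dimensional vector of residue frequencies in the window.

    Padding and non-standard characters are ignored.
    """
    counts = Counter(r for r in window if r in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]
```

Each window's composition vector could then serve as one input row for a binary classifier that separates SOH from non-SOH cysteines.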


Subjects
Protein Processing, Post-Translational , Sequence Analysis, Protein/methods , Software , Sulfur Acids/metabolism , Support Vector Machine , Amino Acid Motifs , Amino Acids/chemistry , Cysteine/metabolism , Humans , Proteins/chemistry , Proteins/metabolism
2.
Nucleic Acids Res ; 43(Database issue): D503-11, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25399423

ABSTRACT

Given the increasing number of proteins reported to be regulated by S-nitrosylation (SNO), it is considered to act, in a manner analogous to phosphorylation, as a pleiotropic regulator that elicits dual effects to regulate diverse pathophysiological processes by altering protein function, stability, and conformation change in various cancers and human disorders. Due to its importance in regulating protein functions and cell signaling, dbSNO (http://dbSNO.mbc.nctu.edu.tw) is extended as a resource for exploring the structural environment of SNO substrate sites and the regulatory networks of S-nitrosylated proteins. An increasing interest in the structural environment of PTM substrate sites motivated us to map all manually curated SNO peptides (4165 SNO sites within 2277 proteins) to PDB protein entries by sequence identity, which provides information on spatial amino acid composition, solvent-accessible surface area, spatially neighboring amino acids, and side chain orientation for 298 substrate cysteine residues. Additionally, the annotations of protein molecular functions, biological processes, functional domains and human diseases are integrated to explore the functional and disease associations for the S-nitrosoproteome. In this update, users can search for a group of proteins/genes of interest, and the system reconstructs the SNO regulatory network based on the information of metabolic pathways and protein-protein interactions. Most importantly, an endogenous yet pathophysiological S-nitrosoproteomic dataset from colorectal cancer patients was adopted to demonstrate that dbSNO could discover potential SNO proteins involved in the regulation of NO signaling for cancer pathways.


Subjects
Databases, Protein , Nitric Oxide/metabolism , Protein Processing, Post-Translational , Amino Acids/chemistry , Animals , Disease , Humans , Internet , Metabolic Networks and Pathways , Mice , Protein Interaction Mapping , Proteins/chemistry , Proteins/metabolism , Rats , Signal Transduction
3.
BMC Genomics ; 17 Suppl 1: 9, 2016 Jan 11.
Article in English | MEDLINE | ID: mdl-26819243

ABSTRACT

BACKGROUND: Protein S-sulfenylation is a type of post-translational modification (PTM) involving the covalent binding of a hydroxyl group to the thiol of a cysteine amino acid. Recent evidence has shown the importance of S-sulfenylation in various biological processes, including transcriptional regulation, apoptosis and cytokine signaling. Determining the specific sites of S-sulfenylation is fundamental to understanding the structures and functions of S-sulfenylated proteins. However, the current lack of reliable tools often limits researchers to use expensive and time-consuming laboratory techniques for the identification of S-sulfenylation sites. Thus, we were motivated to develop a bioinformatics method for investigating S-sulfenylation sites based on amino acid compositions and physicochemical properties. RESULTS: In this work, physicochemical properties were utilized not only to identify S-sulfenylation sites from 1,096 experimentally verified S-sulfenylated proteins, but also to compare the effectiveness of prediction with other characteristics such as amino acid composition (AAC), amino acid pair composition (AAPC), solvent-accessible surface area (ASA), amino acid substitution matrix (BLOSUM62), position-specific scoring matrix (PSSM), and positional weighted matrix (PWM). Various prediction models were built using support vector machine (SVM) and evaluated by five-fold cross-validation. The model constructed from hybrid features, including PSSM and physicochemical properties, yielded the best performance with sensitivity, specificity, accuracy and MCC measurements of 0.746, 0.737, 0.738 and 0.337, respectively. The selected model also provided a promising accuracy (0.693) on an independent testing dataset. Additionally, we employed TwoSampleLogo to help discover the difference of amino acid composition among S-sulfenylation, S-glutathionylation and S-nitrosylation sites. 
CONCLUSION: This work proposed a computational method to explore informative features and functions for protein S-sulfenylation. Evaluation by five-fold cross validation indicated that the selected features were effective in the identification of S-sulfenylation sites. Moreover, the independent testing results demonstrated that the proposed method could provide a feasible means for conducting preliminary analyses of protein S-sulfenylation. We also anticipate that the uncovered differences in amino acid composition may facilitate future studies of the extensive crosstalk among S-sulfenylation, S-glutathionylation and S-nitrosylation.
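
The abstract above reports sensitivity, specificity, accuracy and MCC. As a reminder of how these four measures relate to a confusion matrix, here is a small sketch using the standard definitions; the function name and the example counts are illustrative and are not data from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, accuracy and MCC from raw counts.

    tp/fn count true sites predicted as positive/negative;
    tn/fp count non-sites predicted as negative/positive.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Illustrative counts only (not taken from the paper).
print(binary_metrics(tp=8, fp=2, tn=8, fn=2))
```

Because accuracy is a class-size-weighted average of sensitivity and specificity, while MCC also penalizes false positives relative to the number of true sites, MCC can be modest even when the other three measures are similar to one another, which is typical when non-sites greatly outnumber sites.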


Subjects
Computational Biology/methods , Proteins/metabolism , Amino Acid Motifs , Position-Specific Scoring Matrices , Protein Processing, Post-Translational , Support Vector Machine
4.
Nucleic Acids Res ; 42(Database issue): D537-45, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24302577

ABSTRACT

Transmembrane (TM) proteins have crucial roles in various cellular processes. The location of post-translational modifications (PTMs) on TM proteins is associated with their functional roles in various cellular processes. Given the importance of PTMs in the functioning of TM proteins, this study developed topPTM (available online at http://topPTM.cse.yzu.edu.tw), a new dbPTM module that provides a public resource for identifying the functional PTM sites on TM proteins with structural topology. Experimentally verified TM topology data were integrated from TMPad, TOPDB, PDBTM and OPM. In addition to the PTMs obtained from dbPTM, experimentally verified PTM sites were manually extracted from research articles by text mining. In an attempt to provide a full investigation of PTM sites on TM proteins, all UniProtKB protein entries containing annotations related to membrane localization and TM topology were considered potential TM proteins. Two effective tools were then used to annotate the structural topology of the potential TM proteins. The TM topology of TM proteins is represented by graphical visualization, as well as by the PTM sites. To delineate the structural correlation between the PTM sites and TM topologies, the tertiary structure of PTM sites on TM proteins was visualized by Jmol program. Given the support of research articles by manual curation and the investigation of domain-domain interactions in Protein Data Bank, 1347 PTM substrate sites are associated with protein-protein interactions for 773 TM proteins. The database content is regularly updated on publication of new data by continuous surveys of research articles and available resources.


Subjects
Databases, Protein , Membrane Proteins/metabolism , Protein Processing, Post-Translational , Internet , Membrane Proteins/chemistry , Protein Structure, Tertiary
5.
BMC Bioinformatics ; 16 Suppl 18: S10, 2015.
Article in English | MEDLINE | ID: mdl-26680539

ABSTRACT

Protein O-GlcNAcylation, involving the β-attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Molecular-level investigation of the basis for OGT's substrate specificity should aid in understanding how O-GlcNAc contributes to diverse cellular processes. Due to an increasing number of O-GlcNAcylated peptides with site-specific information identified by mass spectrometry (MS)-based proteomics, we were motivated to characterize substrate site motifs of O-GlcNAc transferases. In this investigation, a non-redundant dataset of 410 experimentally verified O-GlcNAcylation sites was manually extracted from dbOGAP, OGlycBase and UniProtKB. After detection of conserved motifs by using maximal dependence decomposition, a profile hidden Markov model (profile HMM) was adopted to learn a first-layered model for each identified OGT substrate motif. A support vector machine (SVM) was then used to generate a second-layered model learned from the output values of the profile HMMs in the first layer. The two-layered predictive model was evaluated using a five-fold cross validation, which yielded a sensitivity of 85.4%, a specificity of 84.1%, and an accuracy of 84.7%. Additionally, an independent testing set from PhosphoSitePlus, which was truly non-homologous to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (84.05%) and outperform other O-GlcNAcylation site prediction tools. A case study indicated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation, and it has been implemented as a web-based system, OGTSite, which is now freely available at http://csb.cse.yzu.edu.tw/OGTSite/.


Subjects
Machine Learning , N-Acetylglucosaminyltransferases/metabolism , Proteins/chemistry , Acetylglucosamine/metabolism , Algorithms , Amino Acid Motifs , Glycosylation , Internet , Mass Spectrometry , Peptides/analysis , Peptides/metabolism , Proteins/metabolism , Substrate Specificity , Support Vector Machine , User-Computer Interface
6.
Bioinformatics ; 30(16): 2386-8, 2014 Aug 15.
Article in English | MEDLINE | ID: mdl-24790154

ABSTRACT

UNLABELLED: S-glutathionylation, the reversible protein posttranslational modification (PTM) that generates a mixed disulfide bond between glutathione and cysteine residue, critically regulates protein activity, stability and redox regulation. Due to its importance in regulating oxidative/nitrosative stress and balance in cellular response, a number of methods have been rapidly developed to study S-glutathionylation, thus expanding the dataset of experimentally determined glutathionylation sites. However, there is currently no database dedicated to the integration of all experimentally verified S-glutathionylation sites along with their characteristics or structural or functional information. Thus, the dbGSH database has been created to integrate all available datasets and to provide the relevant structural analysis. As of January 31, 2014, dbGSH has manually collected >2200 experimentally verified S-glutathionylated peptides from 169 research articles using a text-mining approach. To solve the problem of heterogeneity of the data collected from different sources, the sequence identity of the reported S-glutathionylated peptides is mapped to UniProtKB protein entries. To delineate the structural correlations and consensus motifs of these S-glutathionylation sites, the dbGSH database also provides structural and functional analyses, including the motifs of substrate sites, solvent accessibility, protein secondary and tertiary structures, protein domains and gene ontology. AVAILABILITY AND IMPLEMENTATION: dbGSH is now freely accessible at http://csb.cse.yzu.edu.tw/dbGSH/. The database content is regularly updated with new data collected by the continuous survey of research articles.


Subjects
Cysteine/metabolism , Databases, Protein , Glutathione/metabolism , Protein Processing, Post-Translational , Amino Acid Motifs , Peptides/chemistry , Peptides/metabolism , Protein Structure, Tertiary , Proteins/chemistry , Proteins/metabolism
7.
Nucleic Acids Res ; 41(Database issue): D295-305, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193290

ABSTRACT

Protein modification is an extremely important post-translational regulation that adjusts the physical and chemical properties, conformation, stability and activity of a protein; thus altering protein function. Due to the high throughput of mass spectrometry (MS)-based methods in identifying site-specific post-translational modifications (PTMs), dbPTM (http://dbPTM.mbc.nctu.edu.tw/) is updated to integrate experimental PTMs obtained from public resources as well as manually curated MS/MS peptides associated with PTMs from research articles. Version 3.0 of dbPTM aims to be an informative resource for investigating the substrate specificity of PTM sites and functional association of PTMs between substrates and their interacting proteins. In order to investigate the substrate specificity for modification sites, a newly developed statistical method has been applied to identify the significant substrate motifs for each type of PTMs containing sufficient experimental data. According to the data statistics in dbPTM, >60% of PTM sites are located in the functional domains of proteins. It is known that most PTMs can create binding sites for specific protein-interaction domains that work together for cellular function. Thus, this update integrates protein-protein interaction and domain-domain interaction to determine the functional association of PTM sites located in protein-interacting domains. Additionally, the information of structural topologies on transmembrane (TM) proteins is integrated in dbPTM in order to delineate the structural correlation between the reported PTM sites and TM topologies. To facilitate the investigation of PTMs on TM proteins, the PTM substrate sites and the structural topology are graphically represented. Also, literature information related to PTMs, orthologous conservations and substrate motifs of PTMs are also provided in the resource. Finally, this version features an improved web interface to facilitate convenient access to the resource.


Subjects
Databases, Protein , Protein Modification, Translational , Internet , Membrane Proteins/chemistry , Membrane Proteins/metabolism , Protein Structure, Tertiary , Substrate Specificity , User-Computer Interface
8.
BMC Bioinformatics ; 15 Suppl 16: S1, 2014.
Article in English | MEDLINE | ID: mdl-25521204

ABSTRACT

BACKGROUND: Protein O-GlcNAcylation involves the attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues. Elucidation of O-GlcNAcylation sites on proteins is required in order to decipher its crucial roles in regulating cellular processes and to aid in drug design. With an increasing number of O-GlcNAcylation sites identified by mass spectrometry (MS)-based proteomics, several methods have been proposed for the computational identification of O-GlcNAcylation sites. However, no existing method focuses on the investigation of O-GlcNAcylated substrate motifs. Thus, we were motivated to design a new method for the identification of protein O-GlcNAcylation sites with the consideration of substrate site specificity. RESULTS: In this study, 375 experimentally verified O-GlcNAcylation sites were collected from dbOGAP, which is an integrated resource for protein O-GlcNAcylation. Due to the difficulty in characterizing the substrate motifs by conventional sequence logo analysis, a recursive statistical method has been applied to obtain significant conserved motifs. To construct the predictive models learned from the identified substrate motifs, we adopted Support Vector Machines (SVMs). A five-fold cross validation was used to evaluate the predictive model, achieving sensitivity, specificity, and accuracy of 0.76, 0.80, and 0.78, respectively. Additionally, an independent testing set, which was truly blind to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (0.94) and outperform three other O-GlcNAcylation site prediction tools. CONCLUSION: This work proposed a computational method to identify informative substrate motifs for O-GlcNAcylation sites. The evaluation of cross validation and independent testing indicated that the identified motifs were effective in the identification of O-GlcNAcylation sites. 
A case study demonstrated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. We also anticipated that the revealed substrate motif may facilitate the study of extensive crosstalk between O-GlcNAcylation and phosphorylation. This method may help unravel their mechanisms and roles in signaling, transcription, chronic disease, and cancer.


Subjects
Acetylglucosamine/chemistry , Acetylglucosamine/metabolism , Computational Biology/methods , Protein Processing, Post-Translational , Proteins/chemistry , Proteins/metabolism , Amino Acid Motifs , Glycosylation , Humans , Mass Spectrometry , Models, Molecular , Phosphorylation , Proteomics , Signal Transduction , Substrate Specificity , Support Vector Machine
9.
J Proteome Res ; 13(11): 4942-58, 2014 Nov 07.
Article in English | MEDLINE | ID: mdl-25040305

ABSTRACT

The abnormal S-nitrosylation induced by the overexpression and activation of inducible nitric oxide synthase (iNOS) modulates many human diseases, such as inflammation and cancer. To delineate the pathophysiological S-nitrosoproteome in cancer patients, we report an individualized S-nitrosoproteomic strategy with a label-free method for the site-specific quantification of S-nitrosylation in paired tumor and adjacent normal tissues from 11 patients with colorectal cancer (CRC). This study provides not only the first endogenous human S-nitrosoproteomic atlas but also the first individualized human tissue analysis, identifying 174 S-nitrosylation sites in 94 proteins. Fourteen novel S-nitrosylation sites with a high frequency of elevated levels in 11 individual patients were identified. An individualized S-nitrosylation quantitation analysis revealed that the detected changes in S-nitrosylation were regulated by both the expression level and the more dramatic post-translational S-nitrosylation of the targeted proteins, such as thioredoxin, annexin A4, and peroxiredoxin-4. These endogenous S-nitrosylated proteins illustrate the network of inflammation/cancer-related and redox reactions mediated by various S-nitrosylation sources, including iNOS, transnitrosylase, or iron-sulfur centers. Given the demonstrated sensitivity of individualized tissue analysis, this label-free approach may facilitate the study of the vastly under-represented S-nitrosoproteome and enable a better understanding of the effect of endogenous S-nitrosylation in cancer.


Subjects
Colorectal Neoplasms/metabolism , Proteins/analysis , Proteins/metabolism , Proteomics/methods , Amino Acid Motifs , Amino Acid Sequence , Blotting, Western , Colorectal Neoplasms/surgery , Cysteine/metabolism , Humans , Molecular Sequence Data , Nitric Oxide Synthase Type II/metabolism , Precision Medicine , Proteins/chemistry , Reference Values , Reproducibility of Results , Serum Albumin, Bovine/analysis , Tandem Mass Spectrometry/methods , Thioredoxins/metabolism , Up-Regulation
10.
BMC Bioinformatics ; 14 Suppl 16: S10, 2013.
Article in English | MEDLINE | ID: mdl-24564381

ABSTRACT

BACKGROUND: The phosphorylation of virus proteins by host kinases is linked to viral replication. This leads to an inhibition of normal host-cell functions. Further elucidation of phosphorylation in virus proteins is required in order to aid in drug design and treatment. However, only a few studies have investigated substrate motifs in identifying virus phosphorylation sites. Additionally, existing bioinformatics tools do not consider potential host kinases that may initiate the phosphorylation of a virus protein. RESULTS: 329 experimentally verified phosphorylation fragments on 111 virus proteins were collected from virPTM. These were clustered into subgroups of significantly conserved motifs using a recursive statistical method. Two-layered Support Vector Machines (SVMs) were then applied to train a predictive model for the identified substrate motifs. The SVM models were evaluated using a five-fold cross validation, which yields an average accuracy of 0.86 for serine and 0.81 for threonine. Furthermore, the proposed method is shown to perform on par with three other phosphorylation site prediction tools: PPSP, KinasePhos 2.0 and GPS 2.1. CONCLUSION: In this study, we propose a computational method, ViralPhos, which aims to investigate virus substrate site motifs and identify potential phosphorylation sites on virus proteins. We identified informative substrate motifs that matched with several well-studied kinase groups as potential catalytic kinases for virus protein substrates. The identified substrate motifs were further exploited to identify potential virus phosphorylation sites. The proposed method is shown to be capable of predicting virus phosphorylation sites and has been implemented as a web server at http://csb.cse.yzu.edu.tw/ViralPhos/.


Subjects
Computational Biology/methods , Phosphorylation , Viral Proteins/chemistry , Databases, Protein , Internet , Models, Statistical , Phosphotransferases/chemistry , Protein Structure, Tertiary , Reproducibility of Results , Support Vector Machine
11.
Bioinformatics ; 28(17): 2293-5, 2012 Sep 01.
Article in English | MEDLINE | ID: mdl-22782549

ABSTRACT

UNLABELLED: S-nitrosylation (SNO), a selective and reversible protein post-translational modification that involves the covalent attachment of nitric oxide (NO) to the sulfur atom of cysteine, critically regulates protein activity, localization and stability. Due to its importance in regulating protein functions and cell signaling, a mass spectrometry-based proteomics method rapidly evolved to increase the dataset of experimentally determined SNO sites. However, there is currently no database dedicated to the integration of all experimentally verified S-nitrosylation sites with their structural or functional information. Thus, the dbSNO database is created to integrate all available datasets and to provide their structural analysis. As of April 15, 2012, dbSNO has manually accumulated >3000 experimentally verified S-nitrosylated peptides from 219 research articles using a text mining approach. To resolve the heterogeneity among the data collected from different sources, the sequence identity of these reported S-nitrosylated peptides is mapped to the UniProtKB protein entries. To delineate the structural correlation and consensus motif of these SNO sites, the dbSNO database also provides structural and functional analyses, including the motifs of substrate sites, solvent accessibility, protein secondary and tertiary structures, protein domains and gene ontology. AVAILABILITY: dbSNO is now freely accessible via http://dbSNO.mbc.nctu.edu.tw. The database content is regularly updated with new data obtained from continuously surveying research articles.


Subjects
Cysteine/metabolism , Proteins/metabolism , Proteomics/methods , Cysteine/chemistry , Mass Spectrometry , Nitric Oxide/metabolism , Peptides/metabolism , Protein Processing, Post-Translational , Protein Structure, Tertiary , Proteins/chemistry
12.
Bioinformatics ; 27(13): 1780-7, 2011 Jul 01.
Article in English | MEDLINE | ID: mdl-21551145

ABSTRACT

UNLABELLED: Bioinformatics research often requires conservation analyses of a group of sequences associated with a specific biological function (e.g. transcription factor binding sites, micro RNA target sites or protein post-translational modification sites). Due to the difficulty in exploring conserved motifs in large-scale sequence data involving various signals, a new method, MDDLogo, is developed. MDDLogo applies maximal dependence decomposition (MDD) to cluster a group of aligned signal sequences into subgroups containing statistically significant motifs. In order to extract motifs that contain a conserved biochemical property of amino acids in protein sequences, the set of 20 amino acids is further categorized according to their physicochemical properties, e.g. hydrophobicity, charge or molecular size. MDDLogo has been demonstrated to accurately identify the kinase-specific substrate motifs in 1221 human phosphorylation sites associated with seven well-known kinase families from Phospho.ELM. Moreover, in a set of plant phosphorylation data lacking kinase information, MDDLogo has been applied to help in the investigation of substrate motifs of potential kinases and in the improvement of the identification of plant phosphorylation sites with various substrate specificities. In this study, MDDLogo is comparable with another well-known motif discovery tool, Motif-X. CONTACT: francis@saturn.yzu.edu.tw
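
The abstract above says MDD clusters aligned sequences into subgroups with statistically significant motifs. The following is a deliberately simplified sketch of a single MDD-style split, not MDDLogo's implementation: it scores each column by the summed Pearson chi-square statistic against every other column, then splits the set on the most frequent residue at the highest-scoring column. The omission of a significance threshold, of recursion, and of grouping amino acids by physicochemical property are all simplifying assumptions, and the function names are hypothetical.

```python
from collections import Counter


def chi_square(seqs, i, j):
    """Pearson chi-square statistic for the residues at columns i and j."""
    n = len(seqs)
    row = Counter(s[i] for s in seqs)
    col = Counter(s[j] for s in seqs)
    joint = Counter((s[i], s[j]) for s in seqs)
    stat = 0.0
    for a, row_count in row.items():
        for b, col_count in col.items():
            expected = row_count * col_count / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def mdd_split(seqs):
    """Perform one simplified MDD-style split of equal-length sequences.

    Returns (column index, residue, sequences with that residue at the
    column, sequences without it).
    """
    length = len(seqs[0])
    # Dependence score of a column: summed chi-square against all others.
    scores = [
        sum(chi_square(seqs, i, j) for j in range(length) if j != i)
        for i in range(length)
    ]
    best = max(range(length), key=scores.__getitem__)
    residue = Counter(s[best] for s in seqs).most_common(1)[0][0]
    with_residue = [s for s in seqs if s[best] == residue]
    without_residue = [s for s in seqs if s[best] != residue]
    return best, residue, with_residue, without_residue
```

A fuller version would apply the split recursively to each subgroup and stop when no column shows significant dependence or a subgroup falls below a minimum size.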


Subjects
Amino Acid Motifs , Cluster Analysis , Protein Kinases/chemistry , Protein Kinases/metabolism , Protein Processing, Post-Translational , Humans , Hydrophobic and Hydrophilic Interactions , Phosphorylation , Plant Proteins/chemistry , Plant Proteins/metabolism , Plants/chemistry , Plants/metabolism , Protein Sorting Signals , Substrate Specificity
13.
BMC Bioinformatics ; 12: 261, 2011 Jun 26.
Article in English | MEDLINE | ID: mdl-21703007

ABSTRACT

BACKGROUND: Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in intracellular signal transduction. Due to the difficulty in performing high-throughput mass spectrometry-based experiments, there is a desire to predict phosphorylation sites using computational methods. However, previous studies regarding in silico prediction of plant phosphorylation sites lack the consideration of kinase-specific phosphorylation data. Thus, we are motivated to propose a new method that investigates different substrate specificities in plant phosphorylation sites. RESULTS: Experimentally verified phosphorylation data were extracted from TAIR9, a protein database containing 3006 phosphorylation data from the plant species Arabidopsis thaliana. In an attempt to investigate the various substrate motifs in plant phosphorylation, maximal dependence decomposition (MDD) is employed to cluster a large set of phosphorylation data into subgroups containing significantly conserved motifs. A profile hidden Markov model (HMM) is then applied to learn a predictive model for each subgroup. Cross-validation evaluation on the MDD-clustered HMMs yields an average accuracy of 82.4% for serine, 78.6% for threonine, and 89.0% for tyrosine models. Moreover, independent test results using Arabidopsis thaliana phosphorylation data from UniProtKB/Swiss-Prot show that the proposed models are able to correctly predict 81.4% of phosphoserine, 77.1% of phosphothreonine, and 83.7% of phosphotyrosine sites. Interestingly, several MDD-clustered subgroups are observed to have similar amino acid conservation with the substrate motifs of well-known kinases from Phospho.ELM, a database containing kinase-specific phosphorylation data from multiple organisms. CONCLUSIONS: This work presents a novel method for identifying plant phosphorylation sites with various substrate motifs. 
Based on cross-validation and independent testing, results show that the MDD-clustered models outperform models trained without using MDD. The proposed method has been implemented as a web-based plant phosphorylation prediction tool, PlantPhos http://csb.cse.yzu.edu.tw/PlantPhos/. Additionally, two case studies have been demonstrated to further evaluate the effectiveness of PlantPhos.


Subjects
Arabidopsis Proteins/analysis , Arabidopsis/metabolism , Databases, Protein , Amino Acid Motifs , Markov Chains , Phosphorylation , Phosphotransferases/chemistry , Protein Structure, Tertiary , Substrate Specificity
14.
BMC Bioinformatics ; 12 Suppl 13: S10, 2011.
Article in English | MEDLINE | ID: mdl-22372765

ABSTRACT

BACKGROUND: Carboxylation is a post-translational modification of glutamate (Glu) residues that is catalyzed by γ-glutamyl carboxylase in the lumen of the endoplasmic reticulum. Vitamin K is a critical cofactor in the conversion of Glu residues to γ-carboxyglutamate (Gla) residues. Carboxylation has been shown to be involved in the blood clotting cascade, bone growth, and extraosseous calcification. However, studies in this field have been limited by the difficulty of experimentally studying substrate site specificity in γ-glutamyl carboxylation. In silico investigations have the potential to characterize carboxylated sites before experiments are carried out. RESULTS: Because of the importance of γ-glutamyl carboxylation in biological mechanisms, this study investigates the substrate site specificity of carboxylation sites. It considers not only the composition of amino acids that surround carboxylation sites, but also the structural characteristics of these sites, including secondary structure and solvent-accessible surface area (ASA). A support vector machine (SVM) is trained on the explored features to establish a predictive model for differentiating between carboxylation sites and non-carboxylation sites. A five-fold cross-validation evaluation reveals that the SVM model trained with the combined features of positional weighted matrix (PWM), amino acid composition (AAC), and ASA yields the highest accuracy (0.892). Furthermore, an independent testing set is constructed to evaluate whether the predictive model is over-fitted to the training set. CONCLUSIONS: Independent testing data that did not undergo the cross-validation process show that the proposed model can differentiate between carboxylation sites and non-carboxylation sites. This investigation is the first to study carboxylation sites and to develop a system for identifying them. The proposed method is a practical means of preliminary analysis and greatly reduces the number of potential carboxylation sites requiring further experimental confirmation.
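The amino acid composition (AAC) feature named in this abstract can be illustrated with a short sketch. The abstract does not give the window length or any implementation details, so the function name, the 9-residue example window, and the handling of padding characters below are assumptions made purely for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(window: str) -> list[float]:
    """Return the 20 residue frequencies of a sequence window.

    Characters outside the 20 standard residues (for example '-' padding
    at a sequence terminus) are ignored. This is an assumed convention,
    not one stated in the abstract.
    """
    counts = Counter(ch for ch in window.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical 9-residue window centred on a candidate Glu (E) site.
features = amino_acid_composition("KRAEEECLA")
```

A vector like `features` could be concatenated with other descriptors (such as PWM scores or ASA values) before being passed to a classifier; how the original study combined them is not described here.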


Subjects
1-Carboxyglutamic Acid/analysis , Protein Processing, Post-Translational , Proteins/metabolism , Support Vector Machine , 1-Carboxyglutamic Acid/metabolism , Humans , Substrate Specificity , Vitamin K/chemistry
15.
J Comput Aided Mol Des ; 25(10): 987-95, 2011 Oct.
Article in English | MEDLINE | ID: mdl-22038416

ABSTRACT

In proteins, glutamate (Glu) residues are transformed into γ-carboxyglutamate (Gla) residues in a process called carboxylation. Protein carboxylation, catalyzed by γ-glutamyl carboxylase, is considered important because of its involvement in biological processes such as the blood clotting cascade and bone growth. There is increasing interest within the scientific community in identifying protein carboxylation sites. However, experimental identification of carboxylation sites via mass spectrometry-based methods is expensive, time-consuming, and labor-intensive. Thus, we were motivated to design a computational method for identifying protein carboxylation sites. This work investigates protein carboxylation by considering the composition of amino acids that surround modification sites. Because a modified residue tends to be accessible on the surface of a protein, the solvent-accessible surface area (ASA) around carboxylation sites is also investigated. A radial basis function network is then employed to build a predictive model using various features for identifying carboxylation sites. Based on a five-fold cross-validation evaluation, a predictive model trained using the combined features of amino acid sequence (AA20D), amino acid composition, and ASA yields the highest accuracy at 0.874. Furthermore, an independent test involving data not included in the cross-validation process indicates that in silico identification is a feasible means of preliminary analysis. Additionally, the predictive method presented in this work is implemented as Carboxylator ( http://csb.cse.yzu.edu.tw/Carboxylator/ ), a web-based tool for identifying carboxylated proteins and their modification sites, to help users investigate γ-glutamyl carboxylation.
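The abstract does not define the "AA20D" sequence feature. One plausible reading, assumed here and not confirmed by the text, is a 20-dimensional orthogonal binary (one-hot) vector per residue position. The sketch below shows that encoding under this assumption; the function name and example window are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_window(window: str) -> list[int]:
    """Encode each position of a window as a 20-bit orthogonal vector.

    Assumption: non-standard characters (for example '-' padding) map to
    an all-zero block of 20 bits.
    """
    vector: list[int] = []
    for ch in window.upper():
        bits = [0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(ch)
        if idx >= 0:
            bits[idx] = 1
        vector.extend(bits)
    return vector


# Hypothetical 3-residue window: 3 positions x 20 bits = 60 features.
encoded = one_hot_window("AEC")
```

With this layout, a window of length n always yields 20 × n features, so windows must share one fixed length before they are given to a model.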


Subjects
1-Carboxyglutamic Acid/chemistry , Carbon-Carbon Ligases/chemistry , Protein Processing, Post-Translational , Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Amino Acid Motifs , Binding Sites , Computer Simulation , Databases, Protein
16.
PLoS One ; 10(4): e0118752, 2015.
Article in English | MEDLINE | ID: mdl-25849935

ABSTRACT

S-glutathionylation, the covalent attachment of a glutathione (GSH) to the sulfur atom of cysteine, is a selective and reversible protein post-translational modification (PTM) that regulates protein activity, localization, and stability. Despite its implication in the regulation of protein functions and cell signaling, the substrate specificity of cysteine S-glutathionylation remains unknown. Based on a total of 1783 experimentally identified S-glutathionylation sites from mouse macrophages, this work presents an informatics investigation of S-glutathionylation sites, including structural factors such as the flanking amino acid composition and the accessible surface area (ASA). A TwoSampleLogo indicates that positively charged amino acids flanking the S-glutathionylated cysteine may influence the formation of S-glutathionylation in a closed three-dimensional environment. A statistical method is further applied to iteratively detect conserved substrate motifs with statistical significance. A support vector machine (SVM) is then applied to generate a predictive model that considers the substrate motifs. According to five-fold cross-validation, the SVMs trained with substrate motifs achieve enhanced sensitivity, specificity, and accuracy, and provide promising performance on an independent test set. The effectiveness of the proposed method is demonstrated by the correct identification of previously reported S-glutathionylation sites of mouse thioredoxin (TXN) and human protein tyrosine phosphatase 1b (PTP1B). Finally, the constructed models are adopted to implement an effective web-based tool, named GSHSite (http://csb.cse.yzu.edu.tw/GSHSite/), for identifying uncharacterized GSH substrate sites on protein sequences.


Subjects
Computational Biology/methods , Databases, Protein , Glutathione/metabolism , Protein Processing, Post-Translational , Proteins/metabolism , Amino Acid Motifs , Amino Acid Sequence , Animals , Cysteine/metabolism , Humans , Mice , Molecular Sequence Data , Proteins/chemistry , Sequence Homology, Amino Acid , Substrate Specificity , Support Vector Machine , Thioredoxins/chemistry , Thioredoxins/metabolism
17.
Biomed Res Int ; 2014: 528650, 2014.
Article in English | MEDLINE | ID: mdl-25147802

ABSTRACT

Lysine acetylation is an important and ubiquitous posttranslational modification conserved in prokaryotes and eukaryotes. This process, which is dynamically and temporally regulated by histone acetyltransferases and deacetylases, is crucial for numerous essential biological processes such as transcriptional regulation, cellular signaling, and stress response. Since the experimental identification of lysine acetylation sites within proteins is time-consuming and laboratory-intensive, several computational approaches have been developed to identify candidates for experimental validation. In this work, acetylated protein data collected from UniProtKB were categorized into histone or nonhistone proteins. Support vector machines (SVMs) were applied to build predictive models, using amino acid pair composition (AAPC) as the feature in the histone model and a combination of BLOSUM62 and AAPC features in the nonhistone model. Furthermore, maximal dependence decomposition (MDD) clustering enhances the performance of the model, which in a fivefold cross-validation evaluation yields a sensitivity of 0.863, specificity of 0.885, accuracy of 0.880, and MCC of 0.706. Additionally, the proposed method is evaluated using independent test sets, resulting in a predictive accuracy of 74%. This indicates that the performance of our method is comparable with that of other acetylation prediction methods.
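The four measures reported in this abstract (sensitivity, specificity, accuracy, and Matthews correlation coefficient, MCC) are all derived from a binary confusion matrix. The sketch below uses their standard definitions; the counts in the example are invented for illustration and are not taken from the study.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute sensitivity, specificity, accuracy, and MCC from counts.

    tp/fn count positive (modified) sites predicted correctly/incorrectly;
    tn/fp count negative sites predicted correctly/incorrectly.
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Invented counts, for illustration only.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it more informative when positive and negative sites are unevenly represented.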


Subjects
Computational Biology/methods , Histones/genetics , Lysine/genetics , Proteins/genetics , Acetylation , Amino Acids/genetics , Databases, Protein , Support Vector Machine
18.
Database (Oxford) ; 2014(0): bau034, 2014.
Article in English | MEDLINE | ID: mdl-24771658

ABSTRACT

Protein phosphorylation catalyzed by kinases plays crucial roles in regulating a variety of intracellular processes. Owing to the increasing number of in vivo phosphorylation sites identified by mass spectrometry (MS)-based proteomics, RegPhos, available online at http://csb.cse.yzu.edu.tw/RegPhos2/, was developed to explore protein phosphorylation networks in humans. In this update, we not only enhance the human data content but also investigate kinase-substrate phosphorylation networks in mouse and rat. The experimentally validated phosphorylation sites as well as their catalytic kinases were extracted from public resources, and MS/MS phosphopeptides were manually curated from research articles. RegPhos 2.0 aims to provide a more comprehensive view of intracellular signaling networks by integrating the information of metabolic pathways and protein-protein interactions. A case study shows that, by analyzing the phosphoproteome profile of time-dependent cell activation obtained from liquid chromatography-mass spectrometry (LC-MS/MS) analysis, RegPhos deciphered not only the consistent scheme in the B cell receptor (BCR) signaling pathway but also novel regulatory molecules that may be involved in it. To help users efficiently identify candidate biomarkers in cancers, 30 microarray experiments, including 39 cancerous versus normal cells, were analyzed to detect cancer-specific expressed genes coding for kinases and their substrates. Furthermore, this update features an improved web interface that facilitates the exploration of phosphorylation networks for a group of genes/proteins. Database URL: http://csb.cse.yzu.edu.tw/RegPhos2/


Subjects
Databases, Protein , Phosphoproteins , Protein Interaction Mapping/methods , Protein Kinases , Signal Transduction , Gene Expression Profiling , Humans , Phosphoproteins/chemistry , Phosphoproteins/metabolism , Phosphorylation , Protein Kinases/chemistry , Protein Kinases/metabolism
19.
PLoS One ; 7(7): e40694, 2012.
Article in English | MEDLINE | ID: mdl-22844408

ABSTRACT

Viruses infect humans and progress inside the body, leading to various diseases and complications. The phosphorylation of viral proteins catalyzed by host kinases plays crucial regulatory roles in enhancing replication and inhibiting normal host-cell functions. Because of its biological importance, there is a desire to identify the protein phosphorylation sites on human viruses. However, mass spectrometry-based experiments have proven to be expensive and labor-intensive. Furthermore, previous studies that identified phosphorylation sites in human viruses did not investigate the responsible kinases. Thus, we are motivated to propose a new method to identify protein phosphorylation sites, together with their kinase substrate specificities, on human viruses. The experimentally verified phosphorylation data were extracted from virPTM, a database containing 301 experimentally verified phosphorylation data on 104 human kinase-phosphorylated virus proteins. To investigate kinase substrate specificities in viral protein phosphorylation sites, maximal dependence decomposition (MDD) is employed to cluster a large set of phosphorylation data into subgroups containing significantly conserved motifs. The experimental human phosphorylation sites are collected from Phospho.ELM, grouped according to their kinase annotation, and compared with the virus MDD clusters. This investigation identifies human kinases such as CK2, PKB, CDK, and MAPK as potential kinases for catalyzing virus protein substrates, as confirmed by published literature. A profile hidden Markov model (HMM) is then applied to learn a predictive model for each subgroup. A five-fold cross-validation evaluation of the MDD-clustered HMMs yields an average accuracy of 84.93% for serine and 78.05% for threonine. Furthermore, an independent testing data set collected from UniProtKB and Phospho.ELM is used to compare predictive performance with three popular kinase-specific phosphorylation site prediction tools. In the independent testing, the high sensitivity and specificity of the proposed method demonstrate the predictive effectiveness of the identified substrate motifs and the importance of investigating potential kinases for viral protein phosphorylation sites.
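The five-fold cross-validation used in this evaluation can be sketched as an index-partitioning step: the data are shuffled, split into five folds, and each fold serves once as the test set while the other four are used for training. The abstract does not describe how the folds were drawn, so the shuffling, the seed, and the function name below are assumptions; the model-fitting step (here, a profile HMM per subgroup) is deliberately left out.

```python
import random
from typing import Iterator


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold validation.

    Indices are shuffled once with a fixed seed, then dealt round-robin
    into k folds so that fold sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical data set of 23 sites split into five folds.
splits = list(k_fold_indices(23, k=5))
```

An average accuracy such as the ones reported would then be the mean of the per-fold accuracies obtained by training on each `train` list and scoring on the matching `test` list.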


Subjects
Computational Biology , Protein Kinases/metabolism , Viral Proteins/chemistry , Viral Proteins/metabolism , Amino Acid Motifs , Binding Sites , Humans , Phosphorylation , Reproducibility of Results , Substrate Specificity