Pesquisa | BVS Violência e Saúde

Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs.

Chen, Yong-Zi; Tang, Yu-Rong; Sheng, Zhi-Ya; Zhang, Ziding.

BMC Bioinformatics ; 9: 101, 2008 Feb 18.

Artigo em Inglês | MEDLINE | ID: mdl-18282281

RESUMO

BACKGROUND: As one of the most common protein post-translational modifications, glycosylation is involved in a variety of important biological processes. Computational identification of glycosylation sites in protein sequences becomes increasingly important in the post-genomic era. A new encoding scheme was employed to improve the prediction of mucin-type O-glycosylation sites in mammalian proteins. RESULTS: A new protein bioinformatics tool, CKSAAP_OGlySite, was developed to predict mucin-type O-glycosylation serine/threonine (S/T) sites in mammalian proteins. Using the composition of k-spaced amino acid pairs (CKSAAP) based encoding scheme, the proposed method was trained and tested in a new and stringent O-glycosylation dataset with the assistance of Support Vector Machine (SVM). When the ratio of O-glycosylation to non-glycosylation sites in training datasets was set as 1:1, 10-fold cross-validation tests showed that the proposed method yielded a high accuracy of 83.1% and 81.4% in predicting O-glycosylated S and T sites, respectively. Based on the same datasets, CKSAAP_OGlySite resulted in a higher accuracy than the conventional binary encoding based method (about +5.0%). When trained and tested in 1:5 datasets, the CKSAAP encoding showed a more significant improvement than the binary encoding. We also merged the training datasets of S and T sites and integrated the prediction of S and T sites into one single predictor (i.e. S+T predictor). Either in 1:1 or 1:5 datasets, the performance of this S+T predictor was always slightly better than those predictors where S and T sites were independently predicted, suggesting that the molecular recognition of O-glycosylated S/T sites seems to be similar and the increase of the S+T predictor's accuracy may be a result of expanded training datasets. Moreover, CKSAAP_OGlySite was also shown to have better performance when benchmarked against two existing predictors. CONCLUSION: Because of CKSAAP encoding's ability of reflecting characteristics of the sequences surrounding mucin-type O-glycosylation sites, CKSAAP_ OGlySite has been proved more powerful than the conventional binary encoding based method. This suggests that it can be used as a competitive mucin-type O-glycosylation site predictor to the biological community. CKSAAP_OGlySite is now available at http://bioinformatics.cau.edu.cn/zzd_lab/CKSAAP_OGlySite/.

Assuntos

Glicosilação , Mucinas/química , Mapeamento de Interação de Proteínas/métodos , Proteínas/química , Análise de Sequência de Proteína/métodos , Motivos de Aminoácidos , Sequência de Aminoácidos , Animais , Inteligência Artificial , Sítios de Ligação , Humanos , Mamíferos , Dados de Sequência Molecular , Reconhecimento Automatizado de Padrão , Ligação Proteica

An improved prediction of catalytic residues in enzyme structures.

Tang, Yu-Rong; Sheng, Zhi-Ya; Chen, Yong-Zi; Zhang, Ziding.

Protein Eng Des Sel ; 21(5): 295-302, 2008 May.

Artigo em Inglês | MEDLINE | ID: mdl-18287176

RESUMO

The protein databases contain a huge number of function unknown proteins, including many proteins with newly determined 3D structures resulted from the Structural Genomics Projects. To accelerate experiment-based assignment of function, de novo prediction of protein functional sites, like active sites in enzymes, becomes increasingly important. Here, we attempted to improve the prediction of catalytic residues in enzyme structures by seeking and refining different encodings (i.e. residue properties) as well as employing new machine learning algorithms. In particular, considering that catalytic residues can often reveal specific network centrality when representing enzyme structure as a residue contact network, the corresponding measurement (i.e. closeness centrality) was used as one of the most important encodings in our new predictor. Meanwhile, a genetic algorithm integrated neural network (GANN) was also employed. Thanks to the above strategies, our GANN predictor demonstrated a high accuracy of 91.2% in the prediction of catalytic residues based on balanced datasets (i.e. the 1:1 ratio of catalytic to non-catalytic residues). When the GANN method was optimally applied to real enzyme structures, 73.9% of the tested structures had the active site correctly located. Compared with two existing methods, the proposed GANN method also demonstrated a better performance.

Assuntos

Enzimas/química , Algoritmos , Sítios de Ligação , Catálise , Biologia Computacional/métodos , Bases de Dados de Proteínas , Humanos , Ligação de Hidrogênio , Modelos Genéticos , Modelos Moleculares , Modelos Estatísticos , Conformação Molecular , Redes Neurais de Computação , Conformação Proteica , Análise de Sequência de Proteína

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA