Search | BVS España Search Portal

Combining features in a graphical model to predict protein binding sites.

Wierschin, Torsten; Wang, Keyu; Welter, Marlon; Waack, Stephan; Stanke, Mario.

Proteins ; 83(5): 844-52, 2015 May.

Article in English | MEDLINE | ID: mdl-25663045

ABSTRACT

Large efforts have been made in classifying residues as binding sites in proteins using machine learning methods. The prediction task can be translated into the computational challenge of assigning each residue the label binding site or non-binding site. Observational data comes from various possibly highly correlated sources. It includes the structure of the protein but not the structure of the complex. The model class of conditional random fields (CRFs) has previously successfully been used for protein binding site prediction. Here, a new CRF-approach is presented that models the dependencies of residues using a general graphical structure defined as a neighborhood graph and thus our model makes fewer independence assumptions on the labels than sequential labeling approaches. A novel node feature "change in free energy" is introduced into the model, which is then denoted by ΔF-CRF. Parameters are trained with an online large-margin algorithm. Using the standard feature class relative accessible surface area alone, the general graph-structure CRF already achieves higher prediction accuracy than the linear chain CRF of Li et al. ΔF-CRF performs significantly better on a large range of false positive rates than the support-vector-machine-based program PresCont of Zellner et al. on a homodimer set containing 128 chains. ΔF-CRF has a broader scope than PresCont since it is not constrained to protein subgroups and requires no multiple sequence alignment. The improvement is attributed to the advantageous combination of the novel node feature with the standard feature and to the adopted parameter training method.
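The difference between a neighborhood graph and a linear chain can be illustrated with a minimal sketch. It is not the authors' implementation: the distance cutoff, the function names `neighborhood_graph` and `labeling_score`, and the single `coupling` weight are all assumptions made for illustration. Residues are reduced to one 3D point each, and the score is the unnormalized log-score of a pairwise model with binary labels (1 = binding site, 0 = non-binding site).

```python
import math


def neighborhood_graph(coords, cutoff):
    """Return edges (i, j) with i < j for residues at most `cutoff` apart.

    Hypothetical helper: `coords` holds one 3D point per residue.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


def labeling_score(labels, node_scores, edges, coupling):
    """Unnormalized log-score of a binary labeling in a pairwise model.

    node_scores[i][l] is the (assumed) node feature score of residue i
    taking label l; `coupling` rewards equal labels on each edge.
    """
    score = sum(node_scores[i][labels[i]] for i in range(len(labels)))
    score += sum(coupling for i, j in edges if labels[i] == labels[j])
    return score


# Four residues on a square: residues 0 and 3 are not adjacent in
# sequence, yet they are spatial neighbors and receive an edge.
coords = [(0, 0, 0), (3, 0, 0), (3, 3, 0), (0, 3, 0)]
edges = neighborhood_graph(coords, cutoff=3.5)
print(edges)
```

A linear chain over the same four residues would contain only the edges (0, 1), (1, 2) and (2, 3); the spatial graph adds (0, 3), which is the kind of dependency a sequential labeling approach cannot represent.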

Subject(s)

Software , Binding Sites , Computer Simulation , Models, Molecular , Protein Interaction Domains and Motifs , Proteins/chemistry , ROC Curve , Support Vector Machine , Thermodynamics

CRF-based models of protein surfaces improve protein-protein interaction site predictions.

Dong, Zhijie; Wang, Keyu; Dang, Truong Khanh Linh; Gültas, Mehmet; Welter, Marlon; Wierschin, Torsten; Stanke, Mario; Waack, Stephan.

BMC Bioinformatics ; 15: 277, 2014 Aug 13.

Article in English | MEDLINE | ID: mdl-25124108

ABSTRACT

BACKGROUND: The identification of protein-protein interaction sites is a computationally challenging task and important for understanding the biology of protein complexes. There is a rich literature in this field. A broad class of approaches assign to each candidate residue a real-valued score that measures how likely it is that the residue belongs to the interface. The prediction is obtained by thresholding this score. Some probabilistic models classify the residues on the basis of the posterior probabilities. In this paper, we introduce pairwise conditional random fields (pCRFs) in which edges are not restricted to the backbone as in the case of linear-chain CRFs utilized by Li et al. (2007). In fact, any 3D-neighborhood relation can be modeled. On grounds of a generalized Viterbi inference algorithm and a piecewise training process for pCRFs, we demonstrate how to utilize pCRFs to enhance a given residue-wise score-based protein-protein interface predictor on the surface of the protein under study. The features of the pCRF are solely based on the interface prediction scores of the predictor the performance of which shall be improved. RESULTS: We performed three sets of experiments with synthetic scores assigned to the surface residues of proteins taken from the data set PlaneDimers compiled by Zellner et al. (2011), from the list published by Keskin et al. (2004) and from the very recent data set due to Cukuroglu et al. (2014). That way we demonstrated that our pCRF-based enhancer is effective given the interface residue score distribution and the non-interface residue score are unimodal. Moreover, the pCRF-based enhancer is also successfully applicable, if the distributions are only unimodal over a certain sub-domain. The improvement is then restricted to that domain. Thus we were able to improve the prediction of the PresCont server devised by Zellner et al. (2011) on PlaneDimers.
CONCLUSIONS: Our results strongly suggest that pCRFs form a methodological framework to improve residue-wise score-based protein-protein interface predictors given the scores are appropriately distributed. A prototypical implementation of our method is accessible at http://ppicrf.informatik.uni-goettingen.de/index.html.
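How pairwise terms can revise a thresholded prediction is shown in the toy sketch below. It is not the paper's method: the abstract describes a generalized Viterbi algorithm and piecewise training, whereas this sketch swaps in a brute-force search over all labelings (feasible only for a handful of residues) with hand-picked parameters. The node term `score - threshold`, the `coupling` weight, and both function names are assumptions for illustration.

```python
from itertools import product


def threshold_labels(scores, threshold):
    """Baseline: label a residue as interface (1) if its score passes the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def map_labels(scores, edges, threshold, coupling):
    """Exhaustive search for the best labeling of a tiny pairwise model.

    Assumed energy: each residue labeled 1 contributes (score - threshold),
    and each edge whose endpoints share a label contributes `coupling`.
    """
    best, best_val = None, float("-inf")
    for labels in product((0, 1), repeat=len(scores)):
        val = sum(s - threshold for s, l in zip(scores, labels) if l == 1)
        val += sum(coupling for i, j in edges if labels[i] == labels[j])
        if val > best_val:
            best, best_val = list(labels), val
    return best


# Residue 1 scores just below the threshold but sits between two
# confidently predicted interface residues; residue 3 is isolated.
scores = [0.9, 0.45, 0.8, 0.1]
edges = [(0, 1), (1, 2)]
print(threshold_labels(scores, 0.5))       # [1, 0, 1, 0]
print(map_labels(scores, edges, 0.5, 0.2))  # [1, 1, 1, 0]
```

With these numbers the pairwise model relabels residue 1 as interface because agreeing with both neighbors outweighs its slightly sub-threshold score, while the isolated residue 3 keeps its thresholded label.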

Subject(s)

Computational Biology/methods , Models, Statistical , Proteins/chemistry , Proteins/metabolism , Algorithms , Binding Sites , Protein Binding , Software , Surface Properties
