Protein fold recognition using the gradient boost algorithm.
Jiao, Feng; Xu, Jinbo; Yu, Libo; Schuurmans, Dale.
Affiliation
  • Jiao F; Alberta Ingenuity Centre for Machine Learning, University of Alberta, Alberta, Canada. fjiao@cs.uwaterloo.ca
Article in En | MEDLINE | ID: mdl-17369624
Protein structure prediction is one of the most important and difficult problems in computational molecular biology, and protein threading is one of the most promising techniques for it. A critical step in protein threading, called fold recognition, is choosing the best-fit template for the query protein whose structure is to be predicted. The standard method for template selection ranks candidates by the z-score of the sequence-template alignment. However, the z-score calculation is time-consuming, which greatly hinders structure prediction at genome scale. In this paper, we present a machine learning approach that treats fold recognition as a regression task and uses a least-squares boosting algorithm (LS_Boost) to solve it efficiently. We test our method on Lindahl's benchmark and compare it with other methods. Our experimental results support four conclusions: (1) Machine learning techniques offer an effective way to solve the fold recognition problem. (2) Formulating protein fold recognition as a regression problem rather than a classification problem leads to a more effective outcome. (3) Importantly, the LS_Boost algorithm does not require the z-score as an input, and therefore obtains significant computational savings over standard approaches. (4) The LS_Boost algorithm obtains superior accuracy, with less computation for both training and testing, than alternative machine learning approaches such as SVMs and neural networks, which likewise do not need the z-score. Finally, the LS_Boost algorithm makes it possible to identify important features in the fold recognition protocol, something that cannot be done using a straightforward SVM approach.
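The abstract does not spell out the learner, so the following is only a minimal sketch of generic least-squares boosting for template ranking, not the authors' implementation. It assumes regression stumps as weak learners, a fixed shrinkage factor, and a toy data set whose three features (alignment score, sequence identity, an uninformative column) and quality targets are hypothetical. Each round fits a stump to the current residuals, and the squared-error reduction credited to each feature serves as a rough importance measure.

```python
from typing import List, Optional, Tuple

# (feature index, threshold, value if x <= threshold, value if x > threshold)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], r: List[float]) -> Tuple[Optional[Stump], float]:
    """Find the stump that best fits residuals r in the least-squares sense.

    Returns the stump and its gain: the squared-error reduction it would
    achieve if added without shrinkage.
    """
    n, total = len(X), sum(r)
    best, best_gain = None, -1.0
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(n - 1):
            i, nxt = order[k], order[k + 1]
            left_sum += r[i]
            if X[i][j] == X[nxt][j]:
                continue  # cannot split between equal feature values
            n_left, right_sum = k + 1, total - left_sum
            gain = left_sum ** 2 / n_left + right_sum ** 2 / (n - n_left)
            if gain > best_gain:
                thr = (X[i][j] + X[nxt][j]) / 2
                best = (j, thr, left_sum / n_left, right_sum / (n - n_left))
                best_gain = gain
    return best, best_gain


def ls_boost(X, y, rounds: int = 50, shrinkage: float = 0.5):
    """Least-squares boosting: repeatedly fit a stump to the residuals."""
    f0 = sum(y) / len(y)              # initial model: the mean target
    resid = [v - f0 for v in y]
    stumps: List[Stump] = []
    importance = [0.0] * len(X[0])    # squared-error reduction per feature
    for _ in range(rounds):
        stump, gain = fit_stump(X, resid)
        if stump is None:             # no valid split exists
            break
        j, thr, lv, rv = stump
        stumps.append((j, thr, shrinkage * lv, shrinkage * rv))
        # Actual error reduction of a shrunken least-squares step.
        importance[j] += gain * (2 * shrinkage - shrinkage ** 2)
        for i, row in enumerate(X):
            resid[i] -= shrinkage * (lv if row[j] <= thr else rv)
    return f0, stumps, importance


def predict(model, row: List[float]) -> float:
    f0, stumps, _ = model
    return f0 + sum(lv if row[j] <= thr else rv for j, thr, lv, rv in stumps)


def rank_templates(model, candidates: List[List[float]]) -> List[int]:
    """Return candidate indices ordered from best to worst predicted quality."""
    scores = [predict(model, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: -scores[i])


# Hypothetical toy data: [alignment score, sequence identity, uninformative]
TOY_X = [[10, 0.10, 3], [20, 0.15, 1], [30, 0.12, 4], [40, 0.30, 1],
         [50, 0.25, 5], [60, 0.40, 9], [70, 0.35, 2], [80, 0.50, 6]]
TOY_Y = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # made-up alignment quality

model = ls_boost(TOY_X, TOY_Y)
print("ranking (best first):", rank_templates(model, TOY_X))
print("feature importance:", model[2])
```

Because the regressor scores each query-template pair directly from alignment features, ranking a set of candidates needs one prediction per candidate and no z-score computation, which is the source of the savings the abstract describes.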
Collection: 01-internacional Database: MEDLINE Main subject: Proteins / Protein Folding / Computational Biology / Proteomics Type of study: Diagnostic_studies / Prognostic_studies Language: En Journal: Comput Syst Bioinformatics Conf Journal subject: INFORMATICA MEDICA Year: 2006 Document type: Article Affiliation country: Canada Country of publication: United States