Using random forest algorithm to predict β-hairpin motifs.
Protein Pept Lett; 18(6): 609-17, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21309739
ABSTRACT
A novel method is presented for predicting β-hairpin motifs in protein sequences. It applies the Random Forest algorithm to multi-characteristic parameters: amino acid composition by position, hydropathy composition by position, predicted secondary structure information, and values of the auto-correlation function. First, the method is trained and tested on a set of 8,291 β-hairpin motifs and 6,865 non-β-hairpin motifs. The overall accuracy and Matthews correlation coefficient reach 82.2% and 0.64 with 5-fold cross-validation, and 81.7% and 0.63 on the independent test. Second, the method is tested on a set of 4,884 β-hairpin motifs and 4,310 non-β-hairpin motifs used in previous studies. Here the overall accuracy and Matthews correlation coefficient reach 80.9% and 0.61 with 5-fold cross-validation, and 80.6% and 0.60 on the independent test, which is better than the previously reported results. Third, with the 4,884 β-hairpin motifs and 4,310 non-β-hairpin motifs as the training set and the 8,291 β-hairpin motifs and 6,865 non-β-hairpin motifs as the independent test set, the overall accuracy and Matthews correlation coefficient reach 81.5% and 0.63.
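The two figures of merit reported above, overall accuracy and the Matthews correlation coefficient, together with a 5-fold split, can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`confusion_counts`, `accuracy`, `matthews_cc`, `k_fold_indices`) and the toy labels are assumptions introduced here, the fold split is contiguous and unshuffled for simplicity, and the Random Forest classifier itself is not reproduced.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels, 1 = hairpin, 0 = non-hairpin."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Toy labels (hypothetical, not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts), matthews_cc(*counts))
```

In a real evaluation, the classifier would be trained on each `train_idx` and scored on the matching `test_idx`, and the samples would normally be shuffled (and ideally stratified by class) before splitting.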
Collections:
01-internacional
Database:
MEDLINE
Main subject:
Algorithms / Proteins / Computational Biology / Amino Acid Motifs
Study type:
Clinical_trials / Prognostic_studies / Risk_factors_studies
Limits:
Humans
Language:
English
Journal:
Protein Pept Lett
Journal subject:
Biochemistry
Publication year:
2011
Document type:
Article