Predicting DNA binding proteins using support vector machine with hybrid fractal features.
J Theor Biol; 343: 186-92, 2014 Feb 21.
Article
in English
| MEDLINE
| ID: mdl-24189096
DNA-binding proteins play a vital role in many biological processes. Predicting DNA-binding proteins from amino acid sequence is an important problem that has not yet been satisfactorily resolved. Chaos game representation (CGR) exposes patterns hidden in protein sequences and visually reveals previously unknown structure. Fractal dimensions (FD) are useful tools for measuring the size of complex, highly irregular geometric objects. To extract features of protein sequences that correlate intrinsically with the DNA-binding property, this paper applies the CGR algorithm, fractal dimension, and amino acid composition to formulate numerical features of protein samples. Seven groups of features are extracted, all of which can be computed directly from the primary sequence, and each group is evaluated by the 10-fold cross-validation test and the jackknife test. Across the numerical experiments, the group combining amino acid composition and fractal dimension (a 21-dimensional vector) gives the best result, with an average accuracy of 81.82% and an average Matthews correlation coefficient (MCC) of 0.6017. The resulting predictor is also compared with the existing method DNA-Prot and shows better performance.
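As a rough illustration of two quantities named in the abstract, the sketch below computes a 20-dimensional amino acid composition vector and the Matthews correlation coefficient from confusion-matrix counts. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the residue ordering, and the handling of non-standard residues are all hypothetical choices, and the fractal-dimension component of the 21-dimensional vector is not reproduced because the abstract does not specify how it is computed.

```python
from collections import Counter
from math import sqrt

# Assumption: the 20 standard residues in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the relative frequency of each standard residue (20 values).

    Assumption: characters outside the 20 standard residues are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Appending one fractal-dimension value to the composition vector would give a 21-dimensional feature of the kind the abstract describes, which could then be passed to a support vector machine classifier.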
Keywords
Full text: 1
Collections: 01-internacional
Database: MEDLINE
Main subject: Fractals / Computational Biology / DNA-Binding Proteins / Support Vector Machine
Type of study: Diagnostic_studies / Prognostic_studies / Risk_factors_studies
Language: En
Journal: J Theor Biol
Publication year: 2014
Document type: Article