Search | Cancer Prevention and Control

Machine learning study of classifiers trained with biophysiochemical properties of amino acids to predict fibril forming Peptide motifs.

Kumaran Nair, Smitha Sunil; Subba Reddy, N V; Hareesha, K S.

Protein Pept Lett ; 19(9): 917-23, 2012 Sep.

Article in English | MEDLINE | ID: mdl-22486618

ABSTRACT

Predicting the short protein fragments capable of forming amyloid-like fibril motifs is important for understanding the cause of amyloid illnesses and aids the discovery of sequence-targeted anti-aggregation drugs. Because molecular techniques for identifying such fragments are limited, computational tools that provide affordable in silico predictions are highly desirable. In this research article, we study, from a machine learning perspective, the performance of several classifiers that use heterogeneous features based on the biochemical and biophysical properties of amino acids to discriminate between amyloidogenic and non-amyloidogenic regions in peptides. Four conventional machine learning classifiers, namely Support Vector Machine (SVM), Neural Network, Decision Tree and Random Forest, were trained and tested to find the classifier that best fits the problem domain. Prior to classification, novel implementations of two biologically inspired feature optimization techniques, one based on evolutionary algorithms and one on methodologies that mimic social life, together with a multivariate method based on projection, are used to remove unimportant and uninformative features. Among the dimensionality reduction algorithms considered in the study, the prediction results show that those based on evolutionary computation are the most effective. Among the classifiers considered, SVM best suits the problem domain. The best classifier is also compared with an online predictor to show that the proposed classifier maintains a balance between true positive and false positive rates. This exploratory study suggests that these methods are promising for amyloidogenicity prediction and may be further extended to large-scale proteomic studies.
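The first abstract reports that evolutionary feature selection was the most effective way to prune features before classification. As a rough illustration only, the Python sketch below runs a genetic algorithm over binary feature masks and refines every child with bit-flip hill climbing; a GA combined with local search in this way is the general idea behind the Memetic Algorithm named in the second abstract. It is not the authors' implementation: `toy_fitness` and the `INFORMATIVE` index set are hypothetical stand-ins for a real objective such as cross-validated SVM accuracy, and the population size, generation count and mutation rate are arbitrary assumptions.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def local_search(mask: Mask, fitness: Callable[[Mask], float]) -> Mask:
    """Bit-flip hill climbing: accept any single-feature flip that raises fitness."""
    best = mask[:]
    best_fit = fitness(best)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            cand = best[:]
            cand[i] ^= 1
            f = fitness(cand)
            if f > best_fit:
                best, best_fit, improved = cand, f, True
    return best


def memetic_select(n_features: int, fitness: Callable[[Mask], float],
                   pop_size: int = 20, generations: int = 30,
                   mutation_rate: float = 0.05, seed: int = 0) -> Mask:
    """GA over feature masks with local-search refinement (needs n_features >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_features)
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            children.append(local_search(child, fitness))  # memetic step
        # Elitism: keep the fittest pop_size individuals of parents + children.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)


# Hypothetical toy objective: reward "informative" features, penalise mask size.
INFORMATIVE = {1, 4, 7}


def toy_fitness(mask: Mask) -> float:
    gain = sum(1.0 for i in INFORMATIVE if mask[i])
    return gain - 0.1 * sum(mask)


best = memetic_select(10, toy_fitness)
print([i for i, b in enumerate(best) if b])  # [1, 4, 7]
```

Because every child is hill-climbed, the toy run converges almost immediately. With a real classifier inside the fitness function, local search on every child becomes the expensive step, so one common compromise is to refine only a fraction of the offspring.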

Subjects

Amino Acids/chemistry; Amyloid/chemistry; Artificial Intelligence; Peptides/chemistry; Amino Acid Sequence; Databases, Protein; Decision Trees; Humans; Neural Networks, Computer; Principal Component Analysis; Protein Structure, Tertiary; Support Vector Machine

Exploiting heterogeneous features to improve in silico prediction of peptide status - amyloidogenic or non-amyloidogenic.

Nair, Smitha Sunil Kumaran; Subba Reddy, N V; Hareesha, K S.

BMC Bioinformatics ; 12 Suppl 13: S21, 2011.

Article in English | MEDLINE | ID: mdl-22373069

ABSTRACT

BACKGROUND: Predicting the short stretches of protein sequences capable of forming amyloid-like fibrils is important for understanding the underlying cause of amyloid illnesses and thereby aids the discovery of sequence-targeted anti-aggregation pharmaceuticals. Because experimental molecular techniques are limited in identifying such motif segments, computational methods that provide better and affordable in silico predictions are highly desirable. RESULTS: Accurate in silico prediction of amyloidogenic peptide regions relies on the cooperation between informative features and classifier design. In this research article, we propose one such efficient fibril prediction implementation that exploits heterogeneous features: bio-physio-chemical (BPC) properties, the auto-correlation function of carefully selected amino acid indices, and the atomic composition of a protein fragment within a window of amino acids. To obtain an optimal number of BPC features, we use an evolutionary Support Vector Machine (SVM) that integrates SVM with a novel implementation of a hybrid Genetic Algorithm termed a Memetic Algorithm. Five prediction modules designed with Artificial Neural Network (ANN) models are trained with independent and integrated features to validate the fibril-forming motifs. The results provide evidence that incorporating a new feature, the auto-correlation function, in addition to BPC properties strengthens the sequence interaction effect captured in the feature vector and thereby yields better prediction quality in terms of sensitivity, specificity, Matthews Correlation Coefficient and area under the Receiver Operating Characteristic curve.
CONCLUSION: A significant improvement in performance is observed when features such as the auto-correlation function, which preserves the sequence-order effect, are introduced in addition to conventional BPC properties selected through a novel optimization strategy to predict peptide status (amyloidogenic or non-amyloidogenic). The proposed approach achieves acceptable results, comparable to most online predictors. In addition, it fills a gap in existing amyloid fibril prediction tools by maintaining a balance between sensitivity and specificity.
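The auto-correlation feature described in this abstract measures how an amino acid property co-varies along the sequence at a given lag, which is how it encodes sequence-order information that plain composition misses. The Python sketch below assumes the common normalized form AC(d) = (1/(L - d)) Σ P_i P_{i+d}, computed over standardized index values. The abstract does not say which indices were selected, so the Kyte-Doolittle hydropathy scale is used purely as an example, and the window length of 6 and maximum lag of 3 are hypothetical parameters, not the authors' settings.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Kyte-Doolittle hydropathy values; an example index only, not the paper's choice.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(index: Dict[str, float]) -> Dict[str, float]:
    """Z-score an amino acid index over the 20 residues (mean 0, std 1)."""
    values = list(index.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in index.items()}


def autocorrelation(seq: str, index: Dict[str, float], max_lag: int) -> List[float]:
    """AC(d) = sum(P_i * P_{i+d}) / (L - d) for lags d = 1..max_lag."""
    if not 1 <= max_lag < len(seq):
        raise ValueError("max_lag must be between 1 and len(seq) - 1")
    p = [index[aa] for aa in seq]
    n = len(p)
    return [sum(p[i] * p[i + d] for i in range(n - d)) / (n - d)
            for d in range(1, max_lag + 1)]


def window_features(seq: str, index: Dict[str, float],
                    window: int = 6, max_lag: int = 3) -> List[List[float]]:
    """One autocorrelation vector per sliding window of `window` residues."""
    return [autocorrelation(seq[s:s + window], index, max_lag)
            for s in range(len(seq) - window + 1)]


z_kd = standardize(KYTE_DOOLITTLE)
for vec in window_features("KLVFFAE", z_kd):  # example peptide input
    print([round(x, 3) for x in vec])
```

Standardizing the index first keeps properties with different units on a comparable scale, which matters once several indices are concatenated into one feature vector.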

Subjects

Amyloid/chemistry; Neural Networks, Computer; Peptides/chemistry; Sequence Analysis, Protein/methods; Support Vector Machine; Algorithms; Databases, Protein; Protein Folding; Protein Structure, Tertiary; ROC Curve; Sensitivity and Specificity
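Both studies stress a balance between true and false positive rates, and the second reports sensitivity, specificity, the Matthews Correlation Coefficient (MCC) and the area under the ROC curve. The short Python sketch below computes these standard metrics from binary labels and classifier scores; AUC is obtained from the pairwise (Mann-Whitney) formulation, counting ties as one half. The labels, scores and 0.5 threshold in the usage lines are made-up illustration values, not data from either study.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = amyloidogenic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)  # true positive rate


def specificity(tn: int, fp: int) -> float:
    return tn / (tn + fp)  # 1 - false positive rate


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tp, tn, fp, fn = confusion(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), mcc(tp, tn, fp, fn), auc(y_true, scores))
# 0.75 0.75 0.5 0.9375
```

Sensitivity and specificity each describe one class, whereas MCC and AUC summarize both, which is why a predictor with high sensitivity but poor specificity can still score low on MCC.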
