Results 1 - 4 of 4
1.
J Theor Biol ; 343: 186-92, 2014 Feb 21.
Article in English | MEDLINE | ID: mdl-24189096

ABSTRACT

DNA-binding proteins play a vitally important role in many biological processes. Predicting DNA-binding proteins from amino acid sequence is a significant but not yet fully resolved scientific problem. Chaos game representation (CGR) investigates the patterns hidden in protein sequences and visually reveals previously unknown structure. Fractal dimensions (FD) are good tools for measuring the sizes of complex, highly irregular geometric objects. To extract the intrinsic correlation with the DNA-binding property from protein sequences, the CGR algorithm, fractal dimension, and amino acid composition are applied in this paper to formulate numerical features of protein samples. Seven groups of features, all computable directly from the primary sequence, are extracted, and each group is evaluated by the 10-fold cross-validation test and the jackknife test. In the numerical experiments, the group combining amino acid composition and fractal dimension (a 21-dimensional vector) gives the best result: the average accuracy is 81.82% and the average Matthews correlation coefficient (MCC) is 0.6017. The resulting predictor is also compared with the existing method DNA-Prot and shows better performance.


Subjects
Computational Biology/methods , DNA-Binding Proteins/metabolism , Fractals , Support Vector Machine , DNA-Binding Proteins/chemistry , Protein Databases , Molecular Models , Nonlinear Dynamics , Tertiary Protein Structure , Regression Analysis , Reproducibility of Results
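
The 21-dimensional feature group named in the abstract above can be illustrated with a minimal Python sketch. It assumes, since the abstract does not spell this out, that the vector is the 20 standard amino acid frequencies followed by a single fractal dimension value. The function names and the residue ordering are hypothetical, and the fractal dimension is taken as a precomputed input rather than derived here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (ordering is an assumption)


def amino_acid_composition(sequence):
    """Return the 20 relative residue frequencies of a protein sequence."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:  # skip non-standard symbols such as X or B
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def build_feature_vector(sequence, fractal_dimension):
    """Append one precomputed fractal dimension to the 20 composition values."""
    return amino_acid_composition(sequence) + [float(fractal_dimension)]
```

Calling `build_feature_vector(seq, fd)` yields a list of length 21 that could be passed to any classifier; the paper's actual feature construction may differ.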
2.
J Theor Biol ; 293: 74-81, 2012 Jan 21.
Article in English | MEDLINE | ID: mdl-22001320

ABSTRACT

Knowledge of the thermophilic mechanisms of organisms whose optimum growth temperature (OGT) ranges from 50 to 80 °C plays a major role in helping to design stable proteins. How to predict whether a DNA sequence is thermophilic is a long-standing but not yet fully resolved problem. Chaos game representation (CGR) can investigate the patterns hidden in DNA sequences and can visually reveal previously unknown structure. Fractal dimensions are good tools for measuring the sizes of complex, highly irregular geometric objects. In this paper, we convert every DNA sequence into a high-dimensional vector using the CGR algorithm and fractal dimension, and then predict the thermostability of the DNA sequence from these fractal features with a support vector machine (SVM). We conducted experiments on three groups: a 17-dimensional vector, a 65-dimensional vector, and a 257-dimensional vector. Each group is evaluated by the 10-fold cross-validation test. The 257-dimensional group gives the best results: the average accuracy is 0.9456 and the average MCC is 0.8878. The results are also compared with previous work that used CGR features alone. The comparison shows the high effectiveness of the new hybrid fractal algorithm.


Subjects
Algorithms , Bacterial DNA/genetics , Hot Temperature , DNA Sequence Analysis/methods , Base Sequence , Nucleic Acid Databases , Fractals , Genetic Models , Nonlinear Dynamics , Support Vector Machine
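
The conversion of a DNA sequence into CGR and fractal features, as described in the abstract above, might be sketched as follows. The dimensions 17, 65, and 257 equal 4^k + 1 for k = 2, 3, 4, so this sketch assumes (the abstract does not confirm it) that each vector is the cell frequencies of a 2^k x 2^k CGR grid plus one box-counting dimension. The corner assignment, the function names, and the box-counting estimator are all illustrative choices, not the authors' stated method.

```python
import math

# Corner assignment is one common convention, assumed here for illustration.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(dna):
    """Chaos game: move halfway from the current point to the corner of each base."""
    x, y = 0.5, 0.5
    points = []
    for base in dna.upper():
        if base not in CORNERS:  # skip ambiguous symbols such as N
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def grid_frequencies(points, k):
    """Fraction of points in each cell of a 2^k x 2^k grid, flattened row by row."""
    n = 2 ** k
    counts = [0] * (n * n)
    for x, y in points:
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        counts[row * n + col] += 1
    total = len(points)
    return [c / total for c in counts] if total else [0.0] * (n * n)


def box_counting_dimension(points, max_k=6):
    """Least-squares slope of log(occupied boxes) against log(boxes per side)."""
    xs, ys = [], []
    for k in range(1, max_k + 1):
        n = 2 ** k
        occupied = {(min(int(x * n), n - 1), min(int(y * n), n - 1)) for x, y in points}
        if occupied:
            xs.append(math.log(n))
            ys.append(math.log(len(occupied)))
    if len(xs) < 2:
        return 0.0
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    return sxy / sxx


def fractal_feature_vector(dna, k=2):
    """Grid frequencies (4^k values) followed by one box-counting dimension."""
    pts = cgr_points(dna)
    return grid_frequencies(pts, k) + [box_counting_dimension(pts)]
```

With `k=2`, `k=3`, and `k=4` the returned vectors have 17, 65, and 257 entries respectively, matching the sizes reported in the abstract under the stated assumption.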
3.
Protein Pept Lett ; 19(9): 940-8, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22486614

ABSTRACT

Obtaining soluble proteins in sufficient concentrations is a major obstacle in various experimental studies. How to predict the solubility propensity of targets in large-scale proteomics projects is a significant but not yet fully resolved scientific problem. Chaos game representation (CGR) can investigate the patterns hidden in protein sequences and can visually reveal previously unknown structure. Fractal dimensions are good tools for measuring the sizes of complex, highly irregular geometric objects. In this paper, we convert each protein sequence into a high-dimensional vector using the CGR algorithm and fractal dimension, and then predict protein solubility from these fractal features together with Chou's pseudo amino acid composition features, using a support vector machine (SVM). We extract and study six groups of features computed directly from the primary sequence, and each group is evaluated by the 10-fold cross-validation test. In the comparisons, the 445-dimensional group gives the best results: the average accuracy is 0.8741 and the average MCC is 0.7358. The resulting predictor is also compared with existing methods and shows significant improvement.


Subjects
Fractals , Nonlinear Dynamics , Proteins/chemistry , Amino Acids/chemistry , Chemical Models , Solubility , Support Vector Machine
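
The two scores reported throughout these abstracts, accuracy and the Matthews correlation coefficient, can be computed from the four confusion-matrix counts of a binary classifier. The sketch below uses the standard definitions; the function names are hypothetical, and returning 0.0 when the MCC denominator is zero is a common convention assumed here.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when it is undefined (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

MCC ranges from -1 (every prediction wrong) to 1 (every prediction right), which makes it more informative than accuracy when the two classes are unbalanced.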
4.
Protein Pept Lett ; 18(12): 1244-50, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21787282

ABSTRACT

Comprehensive knowledge of the thermophilic mechanisms of organisms whose optimum growth temperature (OGT) ranges from 50 to 80 °C plays a major role in helping to design stable proteins. How to predict whether proteins of unknown function are thermophilic is a long-standing but not yet fully resolved problem. Chaos game representation (CGR) can investigate hidden patterns in protein sequences and can also visually reveal their previously unknown structures. In this paper, using the general form of pseudo amino acid composition to represent protein samples, we propose a novel method for mapping a protein sequence to a CGR picture using the CGR algorithm. A 24-dimensional vector extracted from these CGR segments and the first two PCA features are used to classify thermophilic and mesophilic proteins with a support vector machine (SVM). Our method is evaluated by the jackknife test. For the 24-dimensional vector, the accuracy is 0.8792 and the Matthews correlation coefficient (MCC) is 0.7587. The 26-dimensional vector obtained by hybridizing with the PCA components performs highly satisfactorily, with the accuracy reaching 0.9944 and the MCC reaching 0.9888. The results show the effectiveness of the new hybrid method.


Subjects
Amino Acids/chemistry , Principal Component Analysis/methods , Proteins/chemistry , Algorithms , Support Vector Machine
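
The jackknife test mentioned in the abstract above is leave-one-out validation: each sample is held out once, the classifier is trained on the rest, and the held-out sample is scored. The sketch below illustrates only the evaluation loop. The nearest-centroid classifier is a deliberately simple stand-in chosen to keep the example dependency-free; it is not the SVM used in the paper, and all names are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier: return the label of the closest class mean (not an SVM)."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        # squared Euclidean distance from the sample to this class centroid
        dist = sum((s / counts[label] - v) ** 2 for s, v in zip(acc, sample))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(xs, ys, predict=nearest_centroid_predict):
    """Leave each sample out once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs) if xs else 0.0
```

Any function with the same `(train_x, train_y, sample)` signature can be passed as `predict`, so an SVM wrapper could be substituted without changing the loop.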