Search | BVS - MINISTÉRIO DA SAÚDE

A novel sequence and context based method for promoter recognition.

P, Umesh; Dubey, Jitendra Kumar; Rv, Karthika; Cherian, Betsy Sheena; Gopalakrishnan, Gopakumar; Nair, Achuthsankar Sukumaran.

Bioinformation ; 10(4): 175-9, 2014.

Article in English | MEDLINE | ID: mdl-24966516

ABSTRACT

Computational identification of promoters in DNA sequences is a significant research area because of its direct association with transcription regulation. A wide range of algorithms is available for promoter prediction. Most of them are polymerase dependent and cannot handle eukaryotes and prokaryotes alike. This study proposes a polymerase-independent algorithm that predicts whether a given DNA fragment is a promoter, based on sequence features and statistical elements. The algorithm considers all possible pentamers formed from the nucleotides A, C, G, and T, along with CpG islands, the TATA box, initiator elements, and downstream promoter elements. Its highlight is that it is not polymerase specific and can predict for both eukaryotes and prokaryotes in the same computational manner, even though the underlying biological mechanisms of promoter recognition differ greatly. The proposed method, Promoter Prediction System (PPS-CBM), achieved sensitivity, specificity, and accuracy of 75.08%, 83.58%, and 79.33% on an E. coli data set and 86.67%, 88.41%, and 87.58% on a human data set. We have developed a tool based on PPS-CBM with which multiple sequences of varying lengths can be tested simultaneously; the results are reported in a comprehensive tabular format. The tool also reports the strength of the prediction. AVAILABILITY: The tool and source code of PPS-CBM are available at http://keralabs.org.
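The pentamer features mentioned in the abstract can be illustrated with a minimal sketch. This is not the PPS-CBM implementation: the function name `pentamer_frequencies`, the use of overlapping windows, and the normalization to relative frequencies are all assumptions made for illustration. It only shows how the 4^5 = 1024 possible pentamers over A, C, G, and T could be turned into a fixed-length feature vector.

```python
from itertools import product

# All 4**5 = 1024 pentamers over the DNA alphabet, in a fixed order.
PENTAMERS = ["".join(p) for p in product("ACGT", repeat=5)]
INDEX = {p: i for i, p in enumerate(PENTAMERS)}


def pentamer_frequencies(sequence):
    """Return relative frequencies of overlapping pentamers as a 1024-item list.

    Hypothetical helper: the abstract does not say how PPS-CBM counts or
    normalizes pentamers, so overlapping windows are an assumption.
    """
    seq = sequence.upper()
    counts = [0] * len(PENTAMERS)
    total = 0
    for i in range(len(seq) - 4):
        idx = INDEX.get(seq[i:i + 5])
        if idx is not None:  # skip windows containing N or other symbols
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PENTAMERS)
    return [c / total for c in counts]


# Example fragment (made up for illustration, not taken from any data set).
freqs = pentamer_frequencies("TATAAAGGCGCGTATAAA")
```

A vector like `freqs` could then be passed to any classifier alongside other signals such as CpG content or a TATA box match; how PPS-CBM combines them is not described in the abstract.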

Protein location prediction using atomic composition and global features of the amino acid sequence.

Cherian, Betsy Sheena; Nair, Achuthsankar S.

Biochem Biophys Res Commun ; 391(4): 1670-4, 2010 Jan 22.

Article in English | MEDLINE | ID: mdl-20036215

ABSTRACT

The subcellular location of a protein is useful information for determining its function, screening for drug candidates, designing vaccines, annotating gene products, and selecting relevant proteins for further study. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all relevant biological features that contribute to subcellular localization. In this work, we extracted the biological features from the full-length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is used together with multiple physicochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity to predict the subcellular location of the protein. Support Vector Machines are designed for four modules, and the prediction is made by a weighted voting system. Our system makes predictions with accuracies of 100, 82.47, and 88.81 for the self-consistency test, the jackknife test, and the independent data test, respectively. Our results provide evidence that prediction based on biological features derived from the full-length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminal alone. Treating the features as a distribution over the entire sequence brings out the underlying property distribution in greater detail and enhances prediction accuracy.
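The weighted voting step described in the abstract can be sketched as follows. This is an illustrative assumption, not the authors' system: the module names, the weights, the location labels, and the tie-breaking rule are all hypothetical, and each module's output is represented here simply as a predicted label rather than as a trained Support Vector Machine.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-module labels into one label by summing module weights.

    predictions: dict mapping module name -> predicted location label
    weights:     dict mapping module name -> non-negative vote weight
    """
    scores = defaultdict(float)
    for module, label in predictions.items():
        scores[label] += weights[module]
    # The label with the highest total weight wins; ties are broken
    # alphabetically so that the result is deterministic (an assumption).
    return max(sorted(scores), key=scores.get)


# Hypothetical module names, weights and labels, for illustration only.
weights = {
    "atomic_composition": 0.3,
    "physicochemical": 0.2,
    "amino_acid_composition": 0.2,
    "sequence_similarity": 0.3,
}
predictions = {
    "atomic_composition": "nucleus",
    "physicochemical": "cytoplasm",
    "amino_acid_composition": "cytoplasm",
    "sequence_similarity": "nucleus",
}
winner = weighted_vote(predictions, weights)
```

In this made-up example the two modules voting "nucleus" carry more total weight than the two voting "cytoplasm", so `winner` is "nucleus". In practice the weights might be derived from each module's validation accuracy, but the abstract does not say how they were chosen.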

Subjects

Intracellular Space/metabolism, Proteins/chemistry, Proteins/metabolism, Sequence Analysis, Protein/methods, Amino Acid Sequence, Computational Biology, Proteins/genetics
