Search | Portal Regional da BVS

Prediction of protein secondary structures with a novel kernel density estimation based classifier.

Chang, Darby Tien-Hao; Ou, Yu-Yen; Hung, Hao-Geng; Yang, Meng-Han; Chen, Chien-Yu; Oyang, Yen-Jen.

BMC Res Notes ; 1: 51, 2008 Jul 23.

Article in English | MEDLINE | ID: mdl-18710504

ABSTRACT

BACKGROUND: Though prediction of protein secondary structures has been an active research topic in bioinformatics for quite a few years and many approaches have been proposed, a new challenge emerges as contemporary protein structure databases continue to grow rapidly. The challenge is how to effectively exploit all the information implicitly deposited in these databases and deliver ever-improving prediction accuracy as they expand.

FINDINGS: This article addresses the challenge by proposing a predictor designed with a novel kernel density estimation algorithm. One main distinctive feature of the kernel density estimation based approach is that the average execution time of the training process is O(n log n), where n is the number of instances in the training dataset. In the experiments reported in this article, the proposed predictor delivered an average Q3 (three-state prediction accuracy) score of 80.3% and an average SOV (segment overlap) score of 76.9% for a set of 27 benchmark protein chains extracted from the EVA server that are longer than 100 residues.

CONCLUSION: The experimental results reported in this article reveal that we can continue to achieve higher prediction accuracy of protein secondary structures by effectively exploiting the structural information deposited in fast-growing protein structure databases. In this respect, the kernel density estimation based approach enjoys a distinctive advantage with its low time complexity for carrying out the training process.
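The Q3 score quoted above is the percentage of residues whose three-state label is predicted correctly. The sketch below shows one way to compute it for a single chain. The function name `q3_score`, the H/E/C label alphabet, and the two example strings are assumptions made for illustration and are not taken from the article.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose three-state label matches.

    Labels are assumed to be one character per residue, for example
    H (helix), E (strand), C (coil). This is an illustrative sketch,
    not the evaluation code used in the article.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical 17-residue chain, invented for this example.
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
print(f"Q3 = {q3_score(predicted, observed):.1f}%")
```

An average Q3 over a benchmark set, as reported in the abstract, would be the mean of this per-chain value; whether the authors averaged per chain or per residue is not stated here.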

Discovering gapped binding sites of yeast transcription factors.

Chen, Chien-Yu; Tsai, Huai-Kuang; Hsu, Chen-Ming; May Chen, Mei-Ju; Hung, Hao-Geng; Huang, Grace Tzu-Wei; Li, Wen-Hsiung.

Proc Natl Acad Sci U S A ; 105(7): 2527-32, 2008 Feb 19.

Article in English | MEDLINE | ID: mdl-18272477

ABSTRACT

A gapped transcription factor-binding site (TFBS) contains one or more highly degenerate positions. Discovering gapped motifs is difficult, because allowing highly degenerate positions in a motif greatly enlarges the search space and complicates the discovery process. Here, we propose a method for discovering TFBSs, especially gapped motifs. We use ChIP-chip data to judge the binding strength of a TF to a putative target promoter and use orthologous sequences from related species to judge the degree of evolutionary conservation of a predicted TFBS. Candidate motifs are constructed by growing compact motif blocks and by concatenating two candidate blocks, allowing 0-15 degenerate positions in between. The resultant patterns are statistically evaluated for their ability to distinguish between target and nontarget genes. Then, a position-based ranking procedure is proposed to enhance the signals of true motifs by collecting position concurrences. Empirical tests on 32 known yeast TFBSs show that the method is highly accurate in identifying gapped motifs, outperforming current methods, and it also works well on ungapped motifs. Predictions for an additional 54 TFs successfully discover 11 gapped and 38 ungapped motifs supported by the literature. Our method achieves high sensitivity and specificity for predicting experimentally verified TFBSs.
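To make the idea of concatenating two blocks with 0-15 degenerate positions concrete, the sketch below scans a DNA string for a left block followed, after a gap of any permitted length, by a right block. It covers only the pattern-matching step, not the statistical evaluation or ranking described in the abstract. The function name `find_gapped_sites`, the blocks `CGG` and `CCG`, and the example sequence are assumptions chosen for illustration.

```python
def find_gapped_sites(seq, left, right, min_gap=0, max_gap=15):
    """Return (start, gap) pairs where `left` is followed by `right`
    after `gap` unconstrained positions, with min_gap <= gap <= max_gap.

    Illustrative sketch only: exact string matching on both blocks,
    with the gap treated as fully degenerate.
    """
    hits = []
    for start in range(len(seq) - len(left) + 1):
        if not seq.startswith(left, start):
            continue
        # Try every permitted gap length between the two blocks.
        for gap in range(min_gap, max_gap + 1):
            right_start = start + len(left) + gap
            if seq.startswith(right, right_start):
                hits.append((start, gap))
    return hits


# Hypothetical promoter fragment: CGG, then 11 arbitrary bases, then CCG.
seq = "AT" + "CGG" + "ACTGACTGACT" + "CCG" + "TA"
print(find_gapped_sites(seq, "CGG", "CCG"))
```

Treating each gap length as a separate candidate pattern is one reason the search space grows quickly once degenerate positions are allowed, which is the difficulty the abstract points to.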

Subjects

Transcription Factors/metabolism , Yeasts/genetics , Yeasts/metabolism , Binding Sites , Phylogeny , Transcription Factors/genetics
