Search | Portal Regional da BVS

Hit integration for identifying optimal spaced seeds.

Chung, Won-Hyoung; Park, Seong-Bae.

BMC Bioinformatics ; 11 Suppl 1: S37, 2010 Jan 18.

Article in English | MEDLINE | ID: mdl-20122210

ABSTRACT

BACKGROUND: The introduction of spaced seeds made it possible to improve sensitivity in homology search without loss of search speed. Since then, efforts to find the optimal seed, the one that maximizes sensitivity, have continued. The sensitivity of a seed is generally computed as its hit probability. However, hit probability measures sensitivity only at a single, specific similarity level, whereas homologous regions are usually distributed over various similarity levels. As a result, the optimal seed found by hit probability is not actually optimal across various similarity levels. Therefore, a new measure of seed sensitivity is required to recommend seeds that are robust to various similarity levels. RESULTS: We propose a new probability model of sensitivity, hit integration, which covers a range of similarity levels of homologous regions. We also propose a novel algorithm for computing hit integration, based on integrating hit probabilities over a range of similarity levels. We prove that hit integration is computable by expressing the integral part of hit integration as a recursive formula that can be easily solved by dynamic programming. Experimental results on biological data show that hit integration reveals seeds that are closer to optimal than those of PatternHunter. CONCLUSION: The presented model estimates sensitivity more generally than hit probability because it relaxes the fixed similarity level. The proposed algorithm directly computes sensitivity over a range of similarity levels.
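To make the distinction between hit probability and a range-based measure concrete, the following Python sketch computes the hit probability of a spaced seed by dynamic programming and then averages it numerically over a similarity interval. It is only an illustration under stated assumptions: matches are taken to be independent with probability p, the function names `hit_probability` and `hit_integration` are hypothetical, and the averaging uses a plain trapezoidal rule rather than the recursive formula the abstract describes.

```python
def hit_probability(seed: str, n: int, p: float) -> float:
    """Probability that `seed` (e.g. "1101", '1' = must match) hits a
    length-n region at least once, assuming i.i.d. matches with probability p."""
    span = len(seed)
    if n < span:
        return 0.0
    care = [i for i, c in enumerate(seed) if c == "1"]
    # State: the last (span - 1) match/mismatch bits of regions with no hit so far.
    states = {(): 1.0}
    for _ in range(n):
        nxt = {}
        for suffix, prob in states.items():
            for bit, weight in ((1, p), (0, 1.0 - p)):
                window = suffix + (bit,)
                if len(window) == span and all(window[i] for i in care):
                    continue  # a hit occurred: remove this mass from "no hit"
                key = window[-(span - 1):] if span > 1 else ()
                nxt[key] = nxt.get(key, 0.0) + prob * weight
        states = nxt
    return 1.0 - sum(states.values())


def hit_integration(seed: str, n: int, lo: float, hi: float, steps: int = 50) -> float:
    """Average hit probability over similarity levels in [lo, hi]
    (trapezoidal approximation; an assumption, not the paper's closed form)."""
    h = (hi - lo) / steps
    vals = [hit_probability(seed, n, lo + k * h) for k in range(steps + 1)]
    area = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return area / (hi - lo)
```

The state space grows as 2^(span - 1), so this sketch is practical only for short seeds; ranking two seeds by `hit_integration` over, say, [0.5, 0.9] instead of by `hit_probability` at a single p is the kind of comparison the abstract motivates.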

Subjects

Statistical Models, Automated Pattern Recognition/methods, Algorithms, Probability

An empirical study of choosing efficient discriminative seeds for oligonucleotide design.

Chung, Won-Hyoung; Park, Seong-Bae.

BMC Genomics ; 10 Suppl 3: S3, 2009 Dec 03.

Article in English | MEDLINE | ID: mdl-19958494

ABSTRACT

BACKGROUND: Oligonucleotide design is known to be a time-consuming task in bioinformatics. To accelerate the oligonucleotide design process and make it more efficient, one widely used approach is to prescreen unreliable regions using a hashing (or seeding) algorithm. Because seeding algorithms were originally proposed to increase sensitivity in local alignment, specificity must be considered along with sensitivity for the oligonucleotide design problem. However, no measure has yet been proposed for evaluating how adequate and efficient seeds are for oligo design. Here, we propose novel measures for evaluating seeding algorithms based on discriminability and efficiency. RESULTS: To evaluate the proposed measures, we examined five seeding algorithms in oligonucleotide design and carried out a series of experiments to compare them. The spaced seed proved to be the most efficient discriminative seed for oligo design. The performance of the transition-constrained seed was slightly lower than that of the spaced seed. Because the BLAT seeding algorithm and the Vector seeding algorithm gave poor scores in specificity and efficiency, we conclude that these algorithms are not adequate for designing oligos. Consequently, we recommend spaced seeds or transition-constrained seeds of weight 15 to 18 for designing oligos of length 50 mer. Empirical experiments on real biological data show that the recommended seeds consistently perform well. We also present a software package that enables users to obtain adequate seeds under their own experimental conditions. CONCLUSION: Our study is valuable in two respects. First, it can be applied to oligo design programs to improve performance by suggesting experiment-specific seeds. Second, it is useful for improving the performance of mapping assembly in the field of Next-Generation Sequencing. Our proposed measures were originally designed for oligo design, but we expect that our study will be helpful for other genomic tasks.
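The prescreening idea in this abstract, hashing sequences by seed and flagging candidate oligos that share a seed key with non-target sequences, can be sketched as follows. This is a minimal illustration, not the authors' software package: the function names (`seed_keys`, `build_index`, `cross_hits`), the toy sequences, and the short seed patterns are all hypothetical.

```python
from collections import defaultdict


def seed_keys(seq: str, seed: str):
    """Yield (start, key) pairs, where key keeps only the bases at the
    seed's '1' positions (e.g. seed "1101" ignores the third base)."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(seq) - len(seed) + 1):
        yield start, "".join(seq[start + i] for i in care)


def build_index(sequences: dict, seed: str) -> dict:
    """Map each seed key to the names of the sequences that contain it."""
    index = defaultdict(set)
    for name, seq in sequences.items():
        for _, key in seed_keys(seq, seed):
            index[key].add(name)
    return index


def cross_hits(candidate: str, own_name: str, index: dict, seed: str) -> set:
    """Names of non-target sequences sharing at least one seed key with
    the candidate oligo; a non-empty result marks it as unreliable."""
    hits = set()
    for _, key in seed_keys(candidate, seed):
        hits |= index.get(key, set()) - {own_name}
    return hits


# Toy data: the candidate "ACGT" comes from "a"; "b" contains "ACCT",
# which differs from it by one base at the seed's don't-care position.
sequences = {"a": "ACGTACGT", "b": "ACCTTTTT"}
spaced = cross_hits("ACGT", "a", build_index(sequences, "1101"), "1101")
contiguous = cross_hits("ACGT", "a", build_index(sequences, "1111"), "1111")
```

In this toy case the spaced seed flags the near-match in "b" while the contiguous seed of the same span does not, which is the sensitivity and specificity trade-off the proposed measures are meant to quantify.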

Subjects

Empirical Research, Oligonucleotides/analysis, DNA Sequence Analysis/methods, Algorithms, Discriminant Analysis, Genomics, Humans, Software
