Efficient motif finding algorithms for large-alphabet inputs.
BMC Bioinformatics
; 11 Suppl 8: S1, 2010 Oct 26.
Article
em En
| MEDLINE
| ID: mdl-21034426
ABSTRACT
BACKGROUND:
We consider the problem of identifying motifs, recurring or conserved patterns, in the biological sequence data sets. To solve this task, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings.RESULTS:
The proposed algorithm (1) improves search efficiency compared to existing algorithms, and (2) scales well with the size of alphabet. On a synthetic planted DNA motif finding problem our algorithm is over 10× more efficient than MITRA, PMSPrune, and RISOTTO for long motifs. Improvements are orders of magnitude higher in the same setting with large alphabets. On benchmark TF-binding site problems (FNP, CRP, LexA) we observed reduction in running time of over 12×, with high detection accuracy. The algorithm was also successful in rapidly identifying protein motifs in Lipocalin, Zinc metallopeptidase, and supersecondary structure motifs for Cadherin and Immunoglobin families.CONCLUSIONS:
Our algorithm reduces computational complexity of the current motif finding algorithms and demonstrate strong running time improvements over existing exact algorithms, especially in important and difficult cases of large-alphabet sequences.
Texto completo:
1
Base de dados:
MEDLINE
Assunto principal:
Sítios de Ligação
/
Algoritmos
/
Reconhecimento Automatizado de Padrão
/
Análise de Sequência de DNA
/
Biologia Computacional
/
Análise de Sequência de Proteína
Tipo de estudo:
Diagnostic_studies
/
Prognostic_studies
Idioma:
En
Ano de publicação:
2010
Tipo de documento:
Article