Bayesian multiple-instance motif discovery with BAMBI: inference of recombinase and transcription factor binding sites.
Nucleic Acids Res; 39(21): e146, 2011 Nov.
Article | English | MEDLINE | ID: mdl-21948794
Finding conserved motifs in genomic sequences is one of the essential problems in bioinformatics. However, achieving high discovery performance without imposing substantial auxiliary constraints on possible motif features remains a key algorithmic challenge. This work describes BAMBI, a sequential Monte Carlo motif-identification algorithm based on a position weight matrix model that requires no additional constraints and can estimate motif properties such as length, logo, number of instances and their locations solely from primary nucleotide sequence data. Furthermore, when biologically meaningful information about motif attributes is available, BAMBI takes advantage of this knowledge to further refine the discovery results. In practical applications, we show that the proposed approach can find the binding sites of DNA-binding molecules as diverse as the cAMP receptor protein (CRP) and Din-family site-specific serine recombinases. In these and other settings, BAMBI achieves better statistical performance, as measured by the nucleotide-level correlation coefficient, than any of four widely used profile-based motif discovery methods: MEME, BioProspector with BioOptimizer, SeSiMCMC and Motif Sampler. Additionally, in the case of Din-family recombinase target site discovery, the BAMBI-inferred motif is the only one that is functionally accurate with respect to the underlying biochemical mechanism. C++ and Matlab code is available at http://www.ee.columbia.edu/~guido/BAMBI or http://genomics.lbl.gov/BAMBI/.
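The abstract names two concrete ingredients: a position weight matrix (PWM) motif model and the nucleotide-level correlation coefficient (nCC) used to compare methods. The Python sketch below illustrates only these two pieces. It is not BAMBI's sequential Monte Carlo sampler and does not reproduce the released C++/Matlab code. The helper names (`build_pwm`, `best_hit`, `nucleotide_cc`), the pseudocount of 0.5 and the uniform background are illustrative assumptions. nCC is computed in its commonly used form, a Matthews-style correlation over nucleotide positions labeled as motif or non-motif.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Hypothetical helper: log2-odds PWM from equal-length aligned sites.

    Assumes a uniform background unless one is supplied.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        total = len(column) + 4 * pseudocount
        # Smoothed base frequency at position i, relative to background.
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds for one window of motif length."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm, seq):
    """Start index of the highest-scoring window in seq (hypothetical helper)."""
    w = len(pwm)
    return max(range(len(seq) - w + 1), key=lambda j: score(pwm, seq[j:j + w]))


def nucleotide_cc(predicted, actual, seq_len):
    """Nucleotide-level correlation coefficient.

    predicted, actual: sets of nucleotide positions labeled as motif.
    """
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = seq_len - tp - fp - fn
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return (tp * tn - fn * fp) / denom if denom else 0.0


if __name__ == "__main__":
    pwm = build_pwm(["TGTGA", "TGTGA", "TGTGC"])
    start = best_hit(pwm, "AAAATGTGAAAAA")
    hit = set(range(start, start + len(pwm)))
    print(start, nucleotide_cc(hit, set(range(4, 9)), 13))
```

An nCC of 1 means the predicted motif positions coincide exactly with the annotated sites. Values near 0 or below indicate little or no agreement.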
Database: MEDLINE
Main subject: Transcription Factors / Algorithms / Sequence Analysis, DNA / Recombinases / Nucleotide Motifs
Study type: Evaluation studies / Prognostic studies
Language: English
Journal: Nucleic Acids Res
Publication year: 2011
Document type: Article
Country of affiliation: United States