ABSTRACT
BACKGROUND: In prokaryotes, sigma factors are essential for directing the transcription machinery towards promoters. Various sigma factors have been described that recognize and bind to specific DNA sequence motifs in promoter sequences. The canonical sigma factor σ⁷⁰ is commonly involved in transcription of the cell's housekeeping genes, which is mediated by the conserved σ⁷⁰ promoter sequence motifs. In this study, the σ⁷⁰ promoter sequences in Lactobacillus plantarum WCFS1 were predicted using a genome-wide analysis. The accuracy of the transcriptionally active part of this promoter prediction was subsequently evaluated by correlating the locations of predicted promoters with transcription start sites inferred from the 5'-ends of transcripts detected in high-resolution tiling array transcriptome datasets.
RESULTS: To identify σ⁷⁰-related promoter sequences, we performed a genome-wide sequence motif scan of the L. plantarum WCFS1 genome, focussing on the regions upstream of protein-encoding genes. We obtained several highly conserved motifs, including those resembling the conserved σ⁷⁰ promoter consensus. Position weight matrix-based models of the recovered σ⁷⁰ promoter sequence motif were employed to identify 3874 motifs with significant similarity (p-value < 10⁻⁴) to the model motif in the L. plantarum genome. Genome-wide transcript information deduced from whole-genome tiling array transcriptome datasets was used to infer transcription start sites (TSSs) from the 5'-ends of transcripts. By this procedure, 1167 putative TSSs were identified and used to corroborate the transcriptionally active fraction of the predicted promoters. In total, 568 predicted promoters were found in proximity (≤ 40 nucleotides) of the putative TSSs, showing a highly significant co-occurrence of predicted promoters and TSSs (p-value < 10⁻²⁶³).
CONCLUSIONS: High-resolution tiling arrays provide a suitable source to infer TSSs at a genome-wide level, and allow experimental verification of in silico predicted promoter sequence motifs.
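The two computational steps described in RESULTS, scanning with a position weight matrix and matching predicted promoters to nearby TSSs, can be illustrated with a minimal Python sketch. This is not the pipeline used in the study. The function names, the pseudocount, the uniform background frequency, the raw score threshold (standing in for the p-value cutoff), and the symmetric, strand-agnostic 40-nucleotide window are all assumptions made for illustration.

```python
import bisect
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from aligned, equal-length sites.

    Assumes a uniform background frequency for A, C, G and T.
    """
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold.

    A raw score threshold is used here in place of a p-value cutoff.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def promoters_near_tss(promoter_positions, tss_positions, max_distance=40):
    """Return the promoters lying within max_distance nucleotides of any TSS.

    Distance is symmetric and strand is ignored (illustrative simplification).
    """
    tss_sorted = sorted(tss_positions)
    near = []
    for position in promoter_positions:
        # First TSS at or beyond the left edge of the window around the promoter.
        idx = bisect.bisect_left(tss_sorted, position - max_distance)
        if idx < len(tss_sorted) and tss_sorted[idx] <= position + max_distance:
            near.append(position)
    return near


# Toy example with hypothetical -10 box sites and coordinates.
toy_pwm = build_pwm(["TATAAT", "TATAAT", "TACAAT", "TATACT"])
toy_hits = scan("GGGGTATAATGGGG", toy_pwm, threshold=5.0)
toy_near = promoters_near_tss([100, 500, 1000], [130, 1050])
```

In this toy run, `toy_hits` holds the single window that matches the consensus, and `toy_near` holds the one promoter that has a TSS within 40 nucleotides. Testing whether the observed overlap exceeds chance, as reported in the abstract, would require a separate statistical model that is not sketched here.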