ABSTRACT
Very few methods address the problem of predicting beta-barrel membrane proteins directly from sequence. One reason is that only very few high-resolution structures for transmembrane beta-barrel (TMB) proteins have been determined thus far. Here we introduced the design, statistics and results of a novel profile-based hidden Markov model for the prediction and discrimination of TMBs. The method carefully attempts to avoid over-fitting the sparse experimental data. While our model training and scoring procedures were very similar to a recently published work, the architecture and structure-based labelling were significantly different. In particular, we introduced a new definition of beta-hairpin motifs, explicit state modelling of transmembrane strands, and a log-odds whole-protein discrimination score. The resulting method reached an overall four-state (up-, down-strand, periplasmic-, outer-loop) accuracy as high as 86%. Furthermore, it accurately discriminated TMB from non-TMB proteins (45% coverage at 100% accuracy). This high precision enabled the application to 72 entirely sequenced Gram-negative bacteria. We found over 164 previously uncharacterized TMB proteins at high confidence. Database searches did not link any of these proteins to membranes. We contend that the vast majority of our 164 predictions will eventually be verified experimentally. All proteome predictions and the PROFtmb prediction method are available at http://www.rostlab.org/services/PROFtmb/.
Subject(s)
Membrane Proteins/chemistry , Proteome/chemistry , Proteomics/methods , Protein Sequence Analysis/methods , Markov Chains , Membrane Proteins/physiology , Secondary Protein Structure , Reproducibility of Results , Sequence Alignment
ABSTRACT
BACKGROUND: All known genomes code for a large number of transcription factors. It is important to develop methods that will reveal how these transcription factors act on a genome-wide level, that is, through what target genes they exert their function. RESULTS: We describe here a program pipeline, CisOrtho, aimed at identifying transcription factor target genes in whole genomes. Starting from a consensus binding site, represented as a weight matrix, potential sites in a pre-filtered genome are identified and then further filtered by assessing conservation of the putative site in the genome of a related species, a process called phylogenetic footprinting. CisOrtho has been successfully used to identify targets for two homeodomain transcription factors in the genomes of the nematodes Caenorhabditis elegans and Caenorhabditis briggsae. CONCLUSIONS: CisOrtho will identify targets of other nematode transcription factors whose DNA binding specificity is known and can be easily adapted to search other genomes for transcription factor targets.
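The two filtering steps named in the RESULTS (weight-matrix scanning followed by a conservation check in a related species) can be illustrated with a minimal sketch. This is not the CisOrtho implementation: the function names, the toy count matrix, the uniform background, the pseudocount, the score threshold and the gene identifiers are all assumptions made for illustration, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores (assumed uniform background)."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (offset, score) for each forward-strand window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(matrix[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits

def conserved_targets(upstream_a, upstream_b, orthologs, matrix, threshold):
    """Keep genes for which both the gene's and its ortholog's upstream region contain a hit."""
    targets = []
    for gene_a, gene_b in orthologs.items():
        hit_a = scan(upstream_a.get(gene_a, ""), matrix, threshold)
        hit_b = scan(upstream_b.get(gene_b, ""), matrix, threshold)
        if hit_a and hit_b:
            targets.append(gene_a)
    return targets

# Toy count matrix for a 4-bp site (hypothetical values, not taken from the paper).
COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
MATRIX = log_odds_matrix(COUNTS)

# Hypothetical upstream regions in two species and a hypothetical ortholog map.
UPSTREAM_A = {"geneA1": "GGTAATCC", "geneA2": "CCTAATGG"}
UPSTREAM_B = {"geneB1": "ACTAATGA", "geneB2": "ACGGGTGA"}
ORTHOLOGS = {"geneA1": "geneB1", "geneA2": "geneB2"}

TARGETS = conserved_targets(UPSTREAM_A, UPSTREAM_B, ORTHOLOGS, MATRIX, threshold=6.0)
```

In this toy setup both genes of the first species carry a matching site, but only `geneA1` survives the conservation filter, because the upstream region of the ortholog of `geneA2` has no site above the threshold. A fuller version would presumably also scan the reverse complement and take the ortholog assignments from an external source.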