Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes.
Lomsadze, Alexandre; Gemayel, Karl; Tang, Shiyuyun; Borodovsky, Mark.
Affiliation
  • Lomsadze A; Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech, Atlanta, Georgia 30332, USA.
  • Gemayel K; Gene Probe, Incorporated, Atlanta, Georgia 30324, USA.
  • Tang S; School of Computational Science and Engineering, Georgia Tech, Atlanta, Georgia 30332, USA.
  • Borodovsky M; School of Biological Sciences, Georgia Tech, Atlanta, Georgia 30332, USA.
Genome Res; 28(7): 1079-1089, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29773659
ABSTRACT
In a conventional view of the prokaryotic genome organization, promoters precede operons and ribosome binding sites (RBSs) with Shine-Dalgarno consensus precede genes. However, recent experimental research suggesting a more diverse view motivated us to develop an algorithm with improved gene-finding accuracy. We describe GeneMarkS-2, an ab initio algorithm that uses a model derived by self-training for finding species-specific (native) genes, along with an array of precomputed "heuristic" models designed to identify harder-to-detect genes (likely horizontally transferred). Importantly, we designed GeneMarkS-2 to identify several types of distinct sequence patterns (signals) involved in gene expression control, among them the patterns characteristic for leaderless transcription as well as noncanonical RBS patterns. To assess the accuracy of GeneMarkS-2, we used genes validated by COG (Clusters of Orthologous Groups) annotation, proteomics experiments, and N-terminal protein sequencing. We observed that GeneMarkS-2 performed better on average in all accuracy measures when compared with the current state-of-the-art gene prediction tools. Furthermore, the screening of ∼5000 representative prokaryotic genomes made by GeneMarkS-2 predicted frequent leaderless transcription in both archaea and bacteria. We also observed that the RBS sites in some species with leadered transcription did not necessarily exhibit the Shine-Dalgarno consensus. The modeling of different types of sequence motifs regulating gene expression prompted a division of prokaryotic genomes into five categories with distinct sequence patterns around the gene starts.
Subjects

Full text: 1
Databases: MEDLINE
Main subject: Prokaryotic Cells / Bacteria / Transcription, Genetic / Archaea / Genes, Bacterial
Type of study: Prognostic_studies / Risk_factors_studies
Language: En
Journal: Genome Res
Journal subject: MOLECULAR BIOLOGY / GENETICS
Year of publication: 2018
Document type: Article
Affiliation country: United States
