CDBProm: the Comprehensive Directory of Bacterial Promoters.
Martinez, Gustavo Sganzerla; Perez-Rueda, Ernesto; Kumar, Anuj; Dutt, Mansi; Maya, Cinthia Rodríguez; Ledesma-Dominguez, Leonardo; Casa, Pedro Lenz; Kumar, Aditya; de Avila E Silva, Scheila; Kelvin, David J.
Affiliation
  • Martinez GS; Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada.
  • Perez-Rueda E; Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada.
  • Kumar A; BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada.
  • Dutt M; Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autonóma de México, Unidad Académica del Estado de Yucatán, Mérida 97302, Yucatán, Mexico.
  • Maya CR; Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada.
  • Ledesma-Dominguez L; Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada.
  • Casa PL; BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada.
  • Kumar A; Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada.
  • de Avila E Silva S; Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada.
  • Kelvin DJ; BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada.
NAR Genom Bioinform ; 6(1): lqae018, 2024 Mar.
Article in En | MEDLINE | ID: mdl-38385146
ABSTRACT
The decreasing cost of whole genome sequencing has produced high volumes of genomic information that require annotation. The experimental identification of promoter sequences, pivotal for regulating gene expression, is a laborious and cost-prohibitive task. To expedite this, we introduce the Comprehensive Directory of Bacterial Promoters (CDBProm), a directory of in-silico predicted bacterial promoter sequences. We first identified that an Extreme Gradient Boosting (XGBoost) algorithm would distinguish promoters from random downstream regions with an accuracy of 87%. To capture distinctive promoter signals, we generated a second XGBoost classifier trained on the instances misclassified in our first classifier. The predictor of CDBProm is then fed with over 55 million upstream regions from more than 6000 bacterial genomes. Upon finding potential promoter sequences in upstream regions, each promoter is mapped to the genomic data of the organism, linking the predicted promoter with its coding DNA sequence, and identifying the function of the gene regulated by the promoter. The collection of bacterial promoters available in CDBProm enables the quantitative analysis of a plethora of bacterial promoters. Our collection with over 24 million promoters is publicly available at https://aw.iimas.unam.mx/cdbprom/.
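The step of taking upstream regions and linking each one back to its coding sequence can be illustrated with a minimal sketch. It is not the CDBProm pipeline: the `Gene` record, the function names, the 0-based coordinates, the window length `UPSTREAM_LEN`, and the treatment of the genome as a linear string are all assumptions made for illustration, since the abstract does not specify them. The classifier itself is omitted; the sketch only produces the candidate regions that a predictor would score.

```python
from dataclasses import dataclass

# Hypothetical window length; the abstract does not state the size CDBProm uses.
UPSTREAM_LEN = 4

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Gene:
    """Hypothetical annotation record for one coding sequence."""
    locus_tag: str
    start: int    # 0-based, inclusive, on the forward strand
    end: int      # 0-based, exclusive, on the forward strand
    strand: str   # "+" or "-"
    product: str  # annotated function of the gene


def upstream_region(genome: str, gene: Gene, length: int = UPSTREAM_LEN) -> str:
    """Extract the region immediately upstream of a gene, read 5'->3'
    on the gene's own strand. The genome is treated as linear (an
    assumption), so regions near the ends may be shorter than `length`."""
    if gene.strand == "+":
        lo = max(0, gene.start - length)
        return genome[lo:gene.start]
    # On the minus strand, upstream lies after the gene's forward-strand end.
    hi = min(len(genome), gene.end + length)
    return reverse_complement(genome[gene.end:hi])


def link_candidates(genome: str, genes: list[Gene], length: int = UPSTREAM_LEN) -> list[dict]:
    """Pair each upstream region with the gene it precedes, so that a
    predicted promoter stays linked to its coding sequence and function."""
    return [
        {
            "locus_tag": g.locus_tag,
            "product": g.product,
            "upstream": upstream_region(genome, g, length),
        }
        for g in genes
    ]
```

Each record returned by `link_candidates` keeps the gene identifier and annotated product next to the candidate sequence, so a positive prediction can be reported together with the function of the downstream gene.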

Full text: 1 Collections: 01-international Database: MEDLINE Language: En Journal: NAR Genom Bioinform Publication year: 2024 Document type: Article