Speeding genomic island discovery through systematic design of reference database composition.
Yu, Steven L; Mageeney, Catherine M; Shormin, Fatema; Ghaffari, Noushin; Williams, Kelly P.
Affiliation
  • Yu SL; Sandia National Labs, Livermore, California, United States of America.
  • Mageeney CM; Sandia National Labs, Livermore, California, United States of America.
  • Shormin F; Department of Computer Science, Roy G. Perry College of Engineering, Prairie View A&M University, Prairie View, Texas, United States of America.
  • Ghaffari N; Department of Computer Science, Roy G. Perry College of Engineering, Prairie View A&M University, Prairie View, Texas, United States of America.
  • Williams KP; Sandia National Labs, Livermore, California, United States of America.
PLoS One ; 19(3): e0298641, 2024.
Article in En | MEDLINE | ID: mdl-38478526
ABSTRACT

BACKGROUND:

Genomic islands (GIs) are mobile genetic elements that integrate site-specifically into bacterial chromosomes, bearing genes that affect phenotypes such as pathogenicity and metabolism. GIs typically occur sporadically among related bacterial strains, enabling comparative genomic approaches to GI identification. For a candidate GI in a query genome, the number of reference genomes with a precise deletion of the GI serves as a support value for the GI. Our comparative software for GI identification was slowed by our original use of large reference genome databases (DBs). Here we explore smaller species-focused DBs.
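
The support value described above can be illustrated with a toy sketch. It assumes, purely for illustration, that genomes are plain strings and that a "precise deletion" means the two sequences flanking the GI in the query appear directly adjacent in a reference genome. The function name `support_value` and the exact-substring test are hypothetical simplifications, not the authors' software, which presumably relies on sequence alignment.

```python
def support_value(left_flank: str, right_flank: str, references: dict[str, str]) -> int:
    """Count reference genomes in which the GI is precisely absent.

    Assumption (illustrative only): a reference supports the candidate GI
    when the query's left and right flanks occur back to back, i.e. the
    integration site is uninterrupted in that reference.
    """
    junction = left_flank + right_flank
    return sum(1 for seq in references.values() if junction in seq)


# Toy data: the query carries an island between flanks "CCC" and "GGG".
toy_references = {
    "refA": "AAACCCGGGTTT",      # flanks adjacent: supports the GI call
    "refB": "AAACCCTTTTGGGTTT",  # flanks separated: no support
    "refC": "CCCGGG",            # flanks adjacent: supports the GI call
}
print(support_value("CCC", "GGG", toy_references))  # prints 2
```

Under this simplification, every additional reference genome is one more sequence to scan, which is consistent with the slowdown the authors report for large DBs.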

RESULTS:

With increasing DB size, recovery of our reliable prophage GI calls reached a plateau, while recovery of less reliable GI calls (treated as false positives, FPs) increased rapidly as DB sizes exceeded ~500 genomes; i.e., overlarge DBs can increase FP rates. Paradoxically, relative to prophages, FPs were both more frequently supported only by genomes outside the species and more frequently supported only by genomes inside the species; this may be due to their generally lower support values. Setting a DB size limit for our SMAll Ranked Tailored (SMART) DB design sped up runtime ~65-fold. Strictly intra-species DBs would tend to lower yields of prophages for small species (those with few genomes available); simulations with large species showed that this could be partially overcome by reaching outside the species to closely related taxa, without an FP burden. Employing such taxonomic outreach in DB design generated redundancy in the DB set; as few as 2984 DBs were needed to cover all 47894 prokaryotic species.
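
The two design ideas in this paragraph, a DB size limit and taxonomic outreach for small species, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Genome` record, the `taxonomic_distance` scale, the ranking key, and the default cap of 500 (borrowed from the ~500-genome threshold mentioned above) are all hypothetical, and the abstract does not specify how the authors actually rank or select genomes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    # Hypothetical record; real pipelines would carry richer taxonomy.
    accession: str
    species: str
    genus: str
    family: str


def taxonomic_distance(query: Genome, ref: Genome) -> int:
    """0 = same species, 1 = same genus, 2 = same family, 3 = more distant."""
    if ref.species == query.species:
        return 0
    if ref.genus == query.genus:
        return 1
    if ref.family == query.family:
        return 2
    return 3


def build_db(query: Genome, candidates: list[Genome],
             max_size: int = 500, max_distance: int = 1) -> list[Genome]:
    """Rank candidates by closeness to the query and keep at most max_size.

    Assumption: same-species genomes are preferred, and more distant taxa
    (up to max_distance) only fill the slots that remain.
    """
    eligible = [
        g for g in candidates
        if g.accession != query.accession
        and taxonomic_distance(query, g) <= max_distance
    ]
    # Ties within a distance level are broken by accession for determinism.
    eligible.sort(key=lambda g: (taxonomic_distance(query, g), g.accession))
    return eligible[:max_size]
```

With `max_distance=0` the sketch yields a strictly intra-species DB; raising it lets a species with few sequenced genomes borrow references from its genus. Because closely related species would then draw on largely the same pool of genomes, this also hints at why one DB can serve several species.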

CONCLUSIONS:

Runtime decreased dramatically with SMART DB design, with only minor losses of prophages. We also describe potential utility in other comparative genomics projects.

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Genome, Bacterial / Genomic Islands | Language: En | Journal: PLoS One | Journal subject: SCIENCE / MEDICINE | Year: 2024 | Document type: Article | Affiliation country:
