Identification of mobile genetic elements with geNomad.
Camargo, Antonio Pedro; Roux, Simon; Schulz, Frederik; Babinski, Michal; Xu, Yan; Hu, Bin; Chain, Patrick S G; Nayfach, Stephen; Kyrpides, Nikos C.
Affiliation
  • Camargo AP; DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA. antoniop.camargo@lbl.gov.
  • Roux S; DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
  • Schulz F; DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
  • Babinski M; Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA.
  • Xu Y; Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA.
  • Hu B; Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA.
  • Chain PSG; Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA.
  • Nayfach S; DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
  • Kyrpides NC; DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA. nckyrpides@lbl.gov.
Nat Biotechnol; 2023 Sep 21.
Article in En | MEDLINE | ID: mdl-37735266
ABSTRACT
Identifying and characterizing mobile genetic elements in sequencing data is essential for understanding their diversity, ecology, biotechnological applications and impact on public health. Here we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a dataset of more than 200,000 marker protein profiles to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks, geNomad achieved high classification performance for diverse plasmids and viruses (Matthews correlation coefficient of 77.8% and 95.3%, respectively), substantially outperforming other tools. Leveraging geNomad's speed and scalability, we processed over 2.7 trillion base pairs of sequencing data, leading to the discovery of millions of viruses and plasmids that are available through the IMG/VR and IMG/PR databases. geNomad is available at https://portal.nersc.gov/genomad.

Full text: 1 Collection: 01-internacional Database: MEDLINE Type of study: Diagnostic_studies Language: En Journal: Nat Biotechnol Journal subject: BIOTECNOLOGIA Year: 2023 Document type: Article Affiliation country:
