Putative genome contamination has minimal impact on the GTDB taxonomy.
Mussig, Aaron J; Chaumeil, Pierre-Alain; Chuvochina, Maria; Rinke, Christian; Parks, Donovan H; Hugenholtz, Philip.
Affiliation
  • Mussig AJ; The University of Queensland, School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, St Lucia, QLD, Australia.
  • Chaumeil PA; The University of Queensland, School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, St Lucia, QLD, Australia.
  • Chuvochina M; The University of Queensland, School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, St Lucia, QLD, Australia.
  • Rinke C; The University of Queensland, School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, St Lucia, QLD, Australia.
  • Parks DH; Present address: Department of Microbiology, University of Innsbruck, Innsbruck, Austria.
  • Hugenholtz P; The University of Queensland, School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, St Lucia, QLD, Australia.
Microb Genom; 10(5), 2024 May.
Article in En | MEDLINE | ID: mdl-38809778
ABSTRACT
The Genome Taxonomy Database (GTDB) provides a species to domain classification of publicly available genomes based on average nucleotide identity (ANI) (for species) and a concatenated gene phylogeny normalized by evolutionary rates (for genus to phylum), which has been widely adopted by the scientific community. Here, we use the Genome UNClutterer (GUNC) software to identify putatively contaminated genomes in GTDB release 07-RS207. We found that GUNC reported 35,723 genomes as putatively contaminated, comprising 11.25 % of the 317,542 genomes in GTDB release 07-RS207. To assess the impact of this high level of inferred contamination on the delineation of taxa, we created 'clean' versions of the 34,846 putatively contaminated bacterial genomes by removing the most contaminated half. For each clean half, we re-calculated the ANI and concatenated gene phylogeny and found that only 77 (0.22 %) of the genomes were not consistent with their original classification. We conclude that the delineation of taxa in GTDB is robust to the putative contamination detected by GUNC.
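The proportions quoted in the abstract can be re-derived from the reported counts. The following is a minimal sketch using only the figures stated above; the variable names and the `percent` helper are illustrative and not taken from the paper.

```python
# Counts as reported in the abstract for GTDB release 07-RS207.
total_genomes = 317_542            # all genomes in the release
flagged_by_gunc = 35_723           # reported by GUNC as putatively contaminated
flagged_bacterial = 34_846         # flagged bacterial genomes that were "cleaned"
inconsistent_after_cleaning = 77   # clean halves not matching the original classification


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to two decimal places."""
    return round(100 * part / whole, 2)


# Share of the release flagged as putatively contaminated.
flagged_pct = percent(flagged_by_gunc, total_genomes)

# Share of cleaned bacterial genomes whose classification changed.
inconsistent_pct = percent(inconsistent_after_cleaning, flagged_bacterial)

# Cleaned bacterial genomes that kept their original classification.
consistent_count = flagged_bacterial - inconsistent_after_cleaning

print(f"Flagged by GUNC: {flagged_pct} % of {total_genomes:,} genomes")
print(f"Inconsistent after cleaning: {inconsistent_pct} % of {flagged_bacterial:,} genomes")
print(f"Consistent after cleaning: {consistent_count:,} genomes")
```

Both computed percentages agree with the values given in the abstract (11.25 % and 0.22 %).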

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Phylogeny / Bacteria / Genome, Bacterial Language: En Journal: Microb Genom Publication year: 2024 Document type: Article Affiliation country: Australia Publication country: United Kingdom
