Putative genome contamination has minimal impact on the GTDB taxonomy.
Microb Genom
10(5), 2024 May.
Article in English | MEDLINE | ID: mdl-38809778
ABSTRACT
The Genome Taxonomy Database (GTDB), which has been widely adopted by the scientific community, provides a species-to-domain classification of publicly available genomes based on average nucleotide identity (ANI) for species and on a concatenated gene phylogeny normalized by evolutionary rates for genus to phylum. Here, we use the Genome UNClutterer (GUNC) software to identify putatively contaminated genomes in GTDB release 07-RS207. GUNC reported 35,723 genomes as putatively contaminated, comprising 11.25% of the 317,542 genomes in GTDB release 07-RS207. To assess the impact of this high level of inferred contamination on the delineation of taxa, we created 'clean' versions of the 34,846 putatively contaminated bacterial genomes by removing the most contaminated half. For each clean half, we recalculated ANI and the concatenated gene phylogeny and found that only 77 (0.22%) of these genomes were inconsistent with their original classification. We conclude that the delineation of taxa in GTDB is robust to the putative contamination detected by GUNC.
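As a quick check of the arithmetic, both percentages quoted in the abstract follow directly from the raw counts it reports: the flagged fraction uses all 317,542 genomes as the denominator, while the inconsistency rate uses the 34,846 cleaned bacterial genomes. The short Python sketch below recomputes them; the variable names and the `percent` helper are illustrative only and are not part of GUNC or GTDB.

```python
# Recompute the two percentages reported in the abstract from its raw counts.
total_genomes = 317_542      # genomes in GTDB release 07-RS207
flagged = 35_723             # genomes GUNC reported as putatively contaminated
cleaned_bacterial = 34_846   # putatively contaminated bacterial genomes that were "cleaned"
inconsistent = 77            # cleaned genomes inconsistent with their original classification


def percent(part: int, whole: int) -> float:
    """Return part / whole as a percentage, rounded to two decimal places."""
    return round(100 * part / whole, 2)


print(percent(flagged, total_genomes))           # 11.25
print(percent(inconsistent, cleaned_bacterial))  # 0.22
```

Note that the two figures are not on the same scale: 0.22% is a fraction of the cleaned bacterial subset, not of the whole database.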
Collections:
01-internacional
Database:
MEDLINE
Main subject:
Phylogeny / Bacteria / Genome, Bacterial
Language:
English
Journal:
Microb Genom
Year of publication:
2024
Document type:
Article
Country of affiliation:
Australia
Country of publication:
United Kingdom