Results 1 - 8 of 8
1.
Microb Genom ; 10(5), 2024 May.
Article in English | MEDLINE | ID: mdl-38809778

ABSTRACT

The Genome Taxonomy Database (GTDB) provides a species to domain classification of publicly available genomes based on average nucleotide identity (ANI) (for species) and a concatenated gene phylogeny normalized by evolutionary rates (for genus to phylum), which has been widely adopted by the scientific community. Here, we use the Genome UNClutterer (GUNC) software to identify putatively contaminated genomes in GTDB release 07-RS207. We found that GUNC reported 35,723 genomes as putatively contaminated, comprising 11.25 % of the 317,542 genomes in GTDB release 07-RS207. To assess the impact of this high level of inferred contamination on the delineation of taxa, we created 'clean' versions of the 34,846 putatively contaminated bacterial genomes by removing the most contaminated half. For each clean half, we re-calculated the ANI and concatenated gene phylogeny and found that only 77 (0.22 %) of the genomes were not consistent with their original classification. We conclude that the delineation of taxa in GTDB is robust to the putative contamination detected by GUNC.


Subjects
Bacteria , Bacterial Genome , Phylogeny , Bacteria/genetics , Bacteria/classification , Software , Genetic Databases , DNA Contamination
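
The percentages quoted in the abstract above can be checked directly from the counts it reports. The following is a minimal sketch; the helper name `percent` is hypothetical and used only for illustration, and the counts are copied from the abstract.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)

# Counts as reported in the abstract (GTDB release 07-RS207).
flagged_by_gunc = 35_723        # genomes reported as putatively contaminated
total_genomes = 317_542         # genomes in the release
cleaned_bacterial = 34_846      # bacterial genomes with a 'clean' half
inconsistent = 77               # clean halves not matching the original classification

flagged_pct = percent(flagged_by_gunc, total_genomes)
inconsistent_pct = percent(inconsistent, cleaned_bacterial)

print(f"Putatively contaminated: {flagged_pct} %")
print(f"Inconsistent after cleaning: {inconsistent_pct} %")
```

Both values agree with the 11.25 % and 0.22 % stated in the abstract.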
2.
FEMS Microbiol Lett ; 370, 2023 01 17.
Article in English | MEDLINE | ID: mdl-37480240

ABSTRACT

The Genome Taxonomy Database (GTDB) is a taxonomic framework that defines prokaryotic taxa as monophyletic groups in concatenated protein reference trees according to systematic criteria. This has resulted in a substantial number of changes to existing classifications (https://gtdb.ecogenomic.org). In the case of union of taxa, GTDB names were applied based on the priority of publication. The division of taxa or change in rank led to the formation of new Latin names above the rank of genus that were only made publicly available via the GTDB website without associated published taxonomic descriptions. This has sometimes led to confusion in the literature and databases. A number of the provisional GTDB names were later published in other studies, while many still lack authorships. To reduce further confusion, here we propose names and descriptions for 329 GTDB-defined prokaryotic taxa, 223 of which are suitable for validation under the International Code of Nomenclature of Prokaryotes (ICNP) and 49 under the Code of Nomenclature of Prokaryotes described from Sequence Data (SeqCode). For the latter, we designated 23 genomes as type material. An additional 57 taxa that do not currently satisfy the validation criteria of either code are proposed as Candidatus.


Subjects
Authorship , Prokaryotic Cells , Factual Databases
3.
Bioinformatics ; 38(23): 5315-5316, 2022 11 30.
Article in English | MEDLINE | ID: mdl-36218463

ABSTRACT

SUMMARY: The Genome Taxonomy Database (GTDB) and associated taxonomic classification toolkit (GTDB-Tk) have been widely adopted by the microbiology community. However, the growing size of the GTDB bacterial reference tree has resulted in GTDB-Tk requiring substantial amounts of memory (∼320 GB) which limits its adoption and ease of use. Here, we present an update to GTDB-Tk that uses a divide-and-conquer approach where user genomes are initially placed into a bacterial reference tree with family-level representatives followed by placement into an appropriate class-level subtree comprising species representatives. This substantially reduces the memory requirements of GTDB-Tk while having minimal impact on classification. AVAILABILITY AND IMPLEMENTATION: GTDB-Tk is implemented in Python and licenced under the GNU General Public Licence v3.0. Source code and documentation are available at: https://github.com/ecogenomics/gtdbtk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Documentation , Software
4.
Nucleic Acids Res ; 50(D1): D785-D794, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34520557

ABSTRACT

The Genome Taxonomy Database (GTDB; https://gtdb.ecogenomic.org) provides a phylogenetically consistent and rank normalized genome-based taxonomy for prokaryotic genomes sourced from the NCBI Assembly database. GTDB R06-RS202 spans 254 090 bacterial and 4316 archaeal genomes, a 270% increase since the introduction of the GTDB in November, 2017. These genomes are organized into 45 555 bacterial and 2339 archaeal species clusters which is a 200% increase since the integration of species clusters into the GTDB in June, 2019. Here, we explore prokaryotic diversity from the perspective of the GTDB and highlight the importance of metagenome-assembled genomes in expanding available genomic representation. We also discuss improvements to the GTDB website which allow tracking of taxonomic changes, easy assessment of genome assembly quality, and identification of genomes assembled from type material or used as species representatives. Methodological updates and policy changes made since the inception of the GTDB are then described along with the procedure used to update species clusters in the GTDB. We conclude with a discussion on the use of average nucleotide identities as a pragmatic approach for delineating prokaryotic species.


Subjects
Archaea/classification , Bacteria/classification , Genetic Databases , Archaeal Genome , Bacterial Genome , Software , Archaea/genetics , Bacteria/genetics , Base Sequence , Internet , Metagenome , Phylogeny , Prokaryotic Cells/classification , Prokaryotic Cells/cytology , Prokaryotic Cells/metabolism
5.
Nat Microbiol ; 6(7): 946-959, 2021 07.
Article in English | MEDLINE | ID: mdl-34155373

ABSTRACT

The accrual of genomic data from both cultured and uncultured microorganisms provides new opportunities to develop systematic taxonomies based on evolutionary relationships. Previously, we established a bacterial taxonomy through the Genome Taxonomy Database. Here, we propose a standardized archaeal taxonomy that is derived from a 122-concatenated-protein phylogeny that resolves polyphyletic groups and normalizes ranks based on relative evolutionary divergence. The resulting archaeal taxonomy, which forms part of the Genome Taxonomy Database, is stable for a range of phylogenetic variables including marker gene selection, inference methods, corrections for rate heterogeneity and compositional bias, tree rooting scenarios and expansion of the genome database. Rank normalization is shown to robustly correct for substitution rates varying up to 30-fold using simulated datasets. Taxonomic curation follows the rules of the International Code of Nomenclature of Prokaryotes while taking into account proposals to formally recognize the rank of phylum and to use genome sequences as type material. This taxonomy is based on 2,392 archaeal genomes, 93.3% of which required one or more changes to their existing taxonomy, mainly owing to incomplete classification. We identify 16 archaeal phyla and reclassify 3 major monophyletic units from the former Euryarchaeota and one phylum that unites the Thaumarchaeota-Aigarchaeota-Crenarchaeota-Korarchaeota (TACK) superphylum into a single phylum.


Subjects
Archaea/classification , Genetic Databases , Archaeal Genome , Archaea/genetics , Genetic Databases/standards , Molecular Evolution , Genomics , Phylogeny , Reference Standards
6.
Nat Biotechnol ; 38(9): 1098, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32887961

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

7.
Nat Biotechnol ; 38(9): 1079-1086, 2020 09.
Article in English | MEDLINE | ID: mdl-32341564

ABSTRACT

The Genome Taxonomy Database is a phylogenetically consistent, genome-based taxonomy that provides rank-normalized classifications for ~150,000 bacterial and archaeal genomes from domain to genus. However, almost 40% of the genomes in the Genome Taxonomy Database lack a species name. We address this limitation by using commonly accepted average nucleotide identity criteria to set bounds on species and propose species clusters that encompass all publicly available bacterial and archaeal genomes. Unlike previous average nucleotide identity studies, we chose a single representative genome to serve as the effective nomenclatural 'type' defining each species. Of the 24,706 proposed species clusters, 8,792 are based on published names. We assigned placeholder names to the remaining 15,914 species clusters to provide names to the growing number of genomes from uncultivated species. This resource provides a complete domain-to-species taxonomic framework for bacterial and archaeal genomes, which will facilitate research on uncultivated species and improve communication of scientific results.


Subjects
Archaea/classification , Bacteria/classification , Phylogeny , Archaea/genetics , Bacteria/genetics , Genetic Databases , Archaeal Genome/genetics , Bacterial Genome/genetics , Nucleic Acid Hybridization , Reproducibility of Results
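
The idea in the abstract above, species clusters bounded by an average nucleotide identity (ANI) criterion and each defined by a single representative genome, can be illustrated with a small greedy sketch. This is not the GTDB procedure: the 95% threshold, the function names, the input order, and the toy ANI table are all assumptions made for illustration, and the abstract does not specify how representatives are chosen or which additional criteria apply.

```python
from typing import Callable, Dict, List

def cluster_by_ani(
    genomes: List[str],
    ani: Callable[[str, str], float],
    threshold: float = 95.0,  # assumed species-level ANI cutoff
) -> Dict[str, List[str]]:
    """Greedy clustering: each genome joins its closest existing
    representative if ANI >= threshold, else it becomes a new representative."""
    clusters: Dict[str, List[str]] = {}
    for genome in genomes:
        # Find the most similar representative seen so far (None if there are none yet).
        best = max(clusters, key=lambda rep: ani(genome, rep), default=None)
        if best is not None and ani(genome, best) >= threshold:
            clusters[best].append(genome)
        else:
            clusters[genome] = [genome]
    return clusters

# Hypothetical pairwise ANI values (percent) for four toy genomes.
TOY_ANI = {
    frozenset(("A", "B")): 97.2,
    frozenset(("A", "C")): 88.0,
    frozenset(("A", "D")): 88.3,
    frozenset(("B", "C")): 87.5,
    frozenset(("B", "D")): 87.9,
    frozenset(("C", "D")): 96.1,
}

def toy_ani(x: str, y: str) -> float:
    """Look up the symmetric toy ANI value; identical genomes score 100."""
    return 100.0 if x == y else TOY_ANI[frozenset((x, y))]

clusters = cluster_by_ani(["A", "B", "C", "D"], toy_ani)
print(clusters)
```

With this toy table, genomes A and B fall into one cluster represented by A, and C and D into another represented by C. Because the sketch is greedy, the result depends on input order; a real pipeline would need a principled way to pick representatives first.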
8.
Bioinformatics ; 2019 Nov 15.
Article in English | MEDLINE | ID: mdl-31730192

ABSTRACT

SUMMARY: The GTDB Toolkit (GTDB-Tk) provides objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB). GTDB-Tk is computationally efficient and able to classify thousands of draft genomes in parallel. Here we demonstrate the accuracy of the GTDB-Tk taxonomic assignments by evaluating its performance on a phylogenetically diverse set of 10,156 bacterial and archaeal metagenome-assembled genomes. AVAILABILITY: GTDB-Tk is implemented in Python and licensed under the GNU General Public License v3.0. Source code and documentation are available at: https://github.com/ecogenomics/gtdbtk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
