Whokaryote: distinguishing eukaryotic and prokaryotic contigs in metagenomes based on gene structure.
Pronk, Lotte J U; Medema, Marnix H.
Affiliation
  • Pronk LJU; Bioinformatics Group, Wageningen University, Wageningen, The Netherlands.
  • Medema MH; Bioinformatics Group, Wageningen University, Wageningen, The Netherlands.
Microb Genom; 8(5), 2022 May.
Article in English | MEDLINE | ID: mdl-35503723
ABSTRACT
Metagenomics has become a prominent technology to study the functional potential of all organisms in a microbial community. Most studies focus on the bacterial content of these communities, while ignoring eukaryotic microbes. Indeed, many metagenomics analysis pipelines silently assume that all contigs in a metagenome are prokaryotic, likely resulting in less accurate annotation of eukaryotes in metagenomes. Early detection of eukaryotic contigs allows for eukaryote-specific gene prediction and functional annotation. Here, we developed a classifier that distinguishes eukaryotic from prokaryotic contigs based on foundational differences between these taxa in terms of gene structure. We first developed Whokaryote, a random forest classifier that uses intergenic distance, gene density and gene length as the most important features. We show that, with an estimated recall, precision and accuracy of 94, 96 and 95 %, respectively, this classifier with features grounded in biology can perform almost as well as the classifiers EukRep and Tiara, which use k-mer frequencies as features. By retraining our classifier with Tiara predictions as an additional feature, the two types of classifiers compensate for each other's weaknesses; the result is Whokaryote+Tiara, an enhanced classifier that outperforms all individual classifiers, with an F1 score of 0.99 for both eukaryotes and prokaryotes, while still being fast. In a reanalysis of metagenome data from a disease-suppressive plant endospheric microbial community, we show how using Whokaryote+Tiara to select contigs for eukaryotic gene prediction facilitates the discovery of several biosynthetic gene clusters that were missed in the original study. Whokaryote (+Tiara) is wrapped in an easily installable package and is freely available from https://github.com/LottePronk/whokaryote.
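The abstract names intergenic distance, gene density and gene length as the most important features. The sketch below is a hypothetical helper, not Whokaryote's code, showing how such per-contig features could be derived from predicted gene coordinates. The exact definitions used here (1-based inclusive coordinates, mean gap in bp, genes per kilobase) and the names `Gene` and `gene_structure_features` are assumptions for illustration only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Gene:
    """Hypothetical record for one predicted gene; 1-based, inclusive coordinates."""
    start: int
    end: int


def gene_structure_features(genes, contig_length):
    """Return illustrative gene-structure features for one contig.

    Assumed definitions (not taken from Whokaryote):
      mean_gene_length:          mean gene length in bp
      mean_intergenic_distance:  mean gap between consecutive genes in bp
                                 (overlapping neighbours count as a gap of 0)
      gene_density:              predicted genes per kilobase of contig
    """
    if not genes:
        raise ValueError("contig has no predicted genes")
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")

    # Order genes along the contig so neighbours are adjacent in the list.
    ordered = sorted(genes, key=lambda g: g.start)
    lengths = [g.end - g.start + 1 for g in ordered]

    # Gap between the end of one gene and the start of the next one.
    gaps = [max(0, nxt.start - cur.end - 1) for cur, nxt in zip(ordered, ordered[1:])]

    return {
        "mean_gene_length": mean(lengths),
        "mean_intergenic_distance": mean(gaps) if gaps else 0.0,
        "gene_density": len(ordered) / (contig_length / 1000),
    }
```

For example, a 3 kb contig with three genes separated by gaps of 50 and 30 bp yields a mean intergenic distance of 40 bp and one gene per kb, the compact layout typical of prokaryotes, whereas a 10 kb contig with two genes 3,400 bp apart yields a far lower density. In Whokaryote such features feed a random forest classifier rather than a fixed threshold.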
Database: MEDLINE | Main subject: Metagenome / Microbiota | Study type: Prognostic_studies / Screening_studies | Language: English | Journal: Microb Genom | Year: 2022 | Document type: Article | Country of affiliation: Netherlands