HiTaC: a hierarchical taxonomic classifier for fungal ITS sequences compatible with QIIME2.
Miranda, Fábio M; Azevedo, Vasco C; Ramos, Rommel J; Renard, Bernhard Y; Piro, Vitor C.
Affiliation
  • Miranda FM; Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany.
  • Azevedo VC; Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany.
  • Ramos RJ; Institute of Biological Sciences, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.
  • Renard BY; Institute of Biological Sciences, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.
  • Piro VC; Institute of Biological Sciences, Federal University of Pará, Belém, Brazil.
BMC Bioinformatics ; 25(1): 228, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956506
ABSTRACT

BACKGROUND:

Fungi play a key role in several important ecological functions, ranging from organic matter decomposition to symbiotic associations with plants. Moreover, fungi naturally inhabit the human body and can be beneficial when administered as probiotics. In mycology, the internal transcribed spacer (ITS) region was adopted as the universal marker for classifying fungi. Hence, an accurate and robust method for ITS classification is desirable not only for better diversity estimation, but also because it can help us gain deeper insight into the dynamics of environmental communities and ultimately understand whether the abundance of certain species correlates with health and disease. Although many methods have been proposed for taxonomic classification, to the best of our knowledge, none of them fully explores the taxonomic tree hierarchy when building its models. This, in turn, leads to lower generalization power and a higher risk of classification errors.

RESULTS:

Here we introduce HiTaC, a robust hierarchical machine learning model for accurate ITS classification, which requires only a small amount of training data and can handle imbalanced datasets. HiTaC was thoroughly evaluated with the established TAXXI benchmark and correctly classified fungal ITS sequences of varying lengths and across a range of identity differences between the training and test data. HiTaC outperforms state-of-the-art methods when trained on noisy data, consistently achieving higher F1-score and sensitivity across different taxonomic ranks, and improving sensitivity by 6.9 percentage points over top methods on the noisiest dataset available in TAXXI.
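
To make the idea of a hierarchical classifier concrete, the following is a minimal sketch of one common way to use a taxonomic tree during classification: a separate decision is made at every parent node, and a query is classified by walking from the root down to a leaf. It is not HiTaC's implementation. The k-mer features, the nearest-centroid rule, the function names, the placeholder taxon names, and the toy sequences are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from math import sqrt


def kmer_profile(seq, k=3):
    """Count overlapping k-mers in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a, b):
    """Cosine similarity between two sparse k-mer count vectors."""
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(sequences, lineages, k=3):
    """Build one summed k-mer profile per child under every parent node.

    model[parent_lineage][child_name] aggregates all training sequences
    whose lineage passes through that child.
    """
    model = defaultdict(lambda: defaultdict(Counter))
    for seq, lineage in zip(sequences, lineages):
        profile = kmer_profile(seq, k)
        for depth, taxon in enumerate(lineage):
            parent = tuple(lineage[:depth])
            model[parent][taxon].update(profile)
    return model


def predict(model, seq, k=3):
    """Walk from the root downwards, picking the most similar child each time."""
    profile = kmer_profile(seq, k)
    path = ()
    while path in model:
        children = model[path]
        best = max(children, key=lambda name: cosine(profile, children[name]))
        path += (best,)
    return path


# Toy data: invented sequences and placeholder taxa, not real ITS records.
train_seqs = ["ACACACACACAC", "ACACACGTGTGT", "TTTTGGGGTTTT", "TTTTGGGGCCCC"]
train_lineages = [
    ("PhylumA", "GenusA1"),
    ("PhylumA", "GenusA2"),
    ("PhylumB", "GenusB1"),
    ("PhylumB", "GenusB2"),
]
model = train(train_seqs, train_lineages)

print(predict(model, "ACACACACACGT"))
print(predict(model, "TTTTGGGGCCCG"))
```

Because each decision only has to separate the children of a single node, every local problem is smaller than a flat classification over all leaf taxa. The trade-off is that an error made near the root propagates to every rank below it.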

CONCLUSIONS:

HiTaC is publicly available on the Python Package Index, BIOCONDA and Docker Hub. It is released under the new BSD license, allowing free use in academia and industry. Source code and documentation, including installation and usage instructions, are available at https://gitlab.com/dacs-hpi/hitac.

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Machine Learning / Fungi Language: English Journal: BMC Bioinformatics Journal subject: MEDICAL INFORMATICS Year: 2024 Document type: Article Affiliation country: Germany
