PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing.

Goussarov, Gleb; Cleenwerck, Ilse; Mysara, Mohamed; Leys, Natalie; Monsieurs, Pieter; Tahon, Guillaume; Carlier, Aurélien; Vandamme, Peter; Van Houdt, Rob

Affiliation

Goussarov G; Microbiology Unit, Belgian Nuclear Research Centre (SCK•CEN), Mol, Belgium.
Cleenwerck I; Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium.
Mysara M; Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium.
Leys N; Microbiology Unit, Belgian Nuclear Research Centre (SCK•CEN), Mol, Belgium.
Monsieurs P; Microbiology Unit, Belgian Nuclear Research Centre (SCK•CEN), Mol, Belgium.
Tahon G; Microbiology Unit, Belgian Nuclear Research Centre (SCK•CEN), Mol, Belgium.
Carlier A; Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium.
Vandamme P; Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium.
Van Houdt R; LIPM, Université de Toulouse, INRAE, CNRS, Castanet-Tolosan, France.

Bioinformatics; 36(8): 2337-2344, 2020 04 15.

Article in En | MEDLINE | ID: mdl-31899493

ABSTRACT

MOTIVATION:

One of the most widespread methods used in taxonomic studies to distinguish between strains or taxa is the calculation of average nucleotide identity. It requires a computationally expensive alignment step and is therefore not suitable for large-scale comparisons. Short oligonucleotide-based methods offer a faster alternative, but at the expense of accuracy. Here, we aim to address this shortcoming by providing software that implements a novel method based on short-oligonucleotide frequencies to compute inter-genomic distances.
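To make the general idea of an alignment-free, frequency-based comparison concrete, the following is a minimal sketch in Python. It is not the PaSiT method and not code from GenDisCal: it only counts overlapping tetranucleotides, normalizes them to relative frequencies, and takes a plain Euclidean distance between the two profiles. The function names `kmer_frequencies` and `euclidean_distance`, the choice of Euclidean distance, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k=4):
    """Return relative frequencies of all 4**k k-mers over A/C/G/T.

    Hypothetical helper for illustration; windows containing any
    other character (for example N) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    # Fixed ordering of k-mers so that two profiles are comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def euclidean_distance(f1, f2):
    """Plain Euclidean distance between two frequency profiles.

    Assumption: chosen only for simplicity, not the distance
    used by the method described in the abstract.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


# Toy sequences standing in for genomes (made up for this example).
genome_a = "ATGCGATACGCTTGAGGCTAACGT" * 10
genome_b = "ATGCGTTACGCTAGAGGCTTACGA" * 10

distance = euclidean_distance(
    kmer_frequencies(genome_a), kmer_frequencies(genome_b)
)
print(f"tetranucleotide profile distance: {distance:.4f}")
```

Because each genome is reduced to a fixed-length vector (256 entries for tetranucleotides, 4096 for hexanucleotides), comparing two genomes costs the same regardless of their length once the profiles are built, which is what makes this family of methods attractive for large-scale comparisons.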

RESULTS:

Our tetranucleotide and hexanucleotide implementations, which were optimized based on a taxonomically well-defined set of over 200 newly sequenced bacterial genomes, are as accurate as the short oligonucleotide-based method TETRA and average nucleotide identity for identifying bacterial species and strains, respectively. Moreover, the lightweight nature of this method makes it applicable to large-scale analyses.

AVAILABILITY AND IMPLEMENTATION:

The method introduced here was implemented, together with other existing methods, in GenDisCal, a dependency-free program written in C and available as source code from https://github.com/LM-UGent/GenDisCal. The software supports multithreading and has been tested on Windows and Linux (CentOS). In addition, a Java-based graphical user interface that acts as a wrapper for the software is also available.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

Subjects

Genomics; Software; Bacteria/genetics; Genome, Bacterial; Oligonucleotides


Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Software / Genomics Study type: Diagnostic studies Language: En Publication year: 2020 Document type: Article
