LVQ-KNN: Composition-based DNA/RNA binning of short nucleotide sequences utilizing a prototype-based k-nearest neighbor approach.
Belka, Ariane; Fischer, Mareike; Pohlmann, Anne; Beer, Martin; Höper, Dirk.
Affiliation
  • Belka A; Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Südufer 10, D-17493 Greifswald, Insel Riems, Germany.
  • Fischer M; Institute for Mathematics & Computer Science, Ernst-Moritz-Arndt University, Walther-Rathenau-Straße 47, D-17489 Greifswald, Germany.
  • Pohlmann A; Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Südufer 10, D-17493 Greifswald, Insel Riems, Germany.
  • Beer M; Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Südufer 10, D-17493 Greifswald, Insel Riems, Germany.
  • Höper D; Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Südufer 10, D-17493 Greifswald, Insel Riems, Germany. Electronic address: dirk.hoeper@fli.de.
Virus Res; 258: 55-63, 2018 Oct 15.
Article in English | MEDLINE | ID: mdl-30291874
ABSTRACT
Unbiased sequencing is an emerging method for gaining information about the microbiome in a sample and for detecting unrecognized pathogens. Many software tools are available for the taxonomic classification of such metagenomic datasets. Many of them achieve satisfactory sensitivity and specificity for known organisms, but they fail if the sample contains unknown organisms, which cannot be detected by similarity-based classification against available databases. However, recognizing unknowns is especially important for detecting newly emerging pathogens, which are often RNA viruses. Here we present the composition-based analysis tool LVQ-KNN for binning unclassified nucleotide sequence reads into their provenance classes, DNA or RNA. In a 5-fold cross-validation, LVQ-KNN reached correct classification rates (CCR) of up to 99.9% for the classification into DNA/RNA. On real datasets, CCRs of up to 94.5% were obtained. Compared with another composition-based analysis tool, LVQ-KNN achieved similar or better classification results. LVQ-KNN is a new tool for DNA/RNA classification of sequence reads from unbiased sequencing approaches that could be applicable to the detection of as yet unknown RNA viruses in metagenomic samples. The source code, training data, and test data for LVQ-KNN are available on GitHub (https://github.com/ab1989/LVQ-KNN).
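As a rough illustration of what composition-based binning with a prototype-based k-nearest neighbor vote can look like, the following minimal sketch turns a read into a k-mer frequency profile, assigns it the majority label of its closest prototypes, and computes a correct classification rate. It is not the authors' implementation: the k-mer length, the Euclidean distance, the function names, and the hand-made prototypes are all assumptions made for illustration. In a real setting the prototypes would be learned from training data (for example by learning vector quantization).

```python
from collections import Counter
from itertools import product
import math

K = 3  # assumed k-mer length; not taken from the paper
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(seq, k=K):
    """Relative k-mer frequencies of a read, in a fixed k-mer order."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A, C, G, T are counted; ambiguous bases are ignored.
    total = sum(counts[m] for m in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[m] / total for m in KMERS]


def knn_label(profile, prototypes, n_neighbors=3):
    """Majority vote among the n_neighbors closest prototypes (Euclidean)."""
    ranked = sorted(prototypes, key=lambda p: math.dist(profile, p[0]))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


def correct_classification_rate(predicted, truth):
    """Fraction of reads whose predicted class matches the true class."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical toy prototypes as (profile, label) pairs; labels are arbitrary.
toy_prototypes = [
    (kmer_profile("AAAAAAAAAA"), "DNA"),
    (kmer_profile("CCCCCCCCCC"), "RNA"),
]
toy_prediction = knn_label(kmer_profile("AAAAAAAAAC"), toy_prototypes, n_neighbors=1)
```

Using a small set of prototypes instead of all training reads keeps the neighbor search cheap, which is the usual motivation for combining a prototype method with k-nearest neighbors.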

Full text: 1 Database: MEDLINE Main subject: Oligonucleotides / Software / Metagenomics Study type: Diagnostic_studies Language: En Year: 2018 Document type: Article
