To compare the performance of prokaryotic taxonomy classifiers using curated 16S full-length rRNA sequences.
Hung, Yuan-Mao; Lyu, Wei-Ni; Tsai, Ming-Lin; Liu, Chiang-Lin; Lai, Liang-Chuan; Tsai, Mong-Hsun; Chuang, Eric Y.
Affiliation
  • Hung YM; Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan.
  • Lyu WN; Institute of Biotechnology, National Taiwan University, Taipei, Taiwan.
  • Tsai ML; Department of General Surgery, Cathay General Hospital, Taipei, Taiwan.
  • Liu CL; Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan.
  • Lai LC; Bioinformatics and Biostatistics Core Lab, Center of Genomic and Precision Medicine, National Taiwan University, Taipei, Taiwan; Graduate Institute of Physiology, College of Medicine, National Taiwan University, Taipei, Taiwan. Electronic address: llai@ntu.edu.tw.
  • Tsai MH; Institute of Biotechnology, National Taiwan University, Taipei, Taiwan; Bioinformatics and Biostatistics Core Lab, Center of Genomic and Precision Medicine, National Taiwan University, Taipei, Taiwan. Electronic address: motiont@gmail.com.
  • Chuang EY; Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan; Bioinformatics and Biostatistics Core Lab, Center of Genomic and Precision Medicine, National Taiwan University, Taipei, Taiwan; College of Biomedical Engineering, China Medical University, T
Comput Biol Med; 145: 105416, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35313206
ABSTRACT

BACKGROUND:

Taxonomic assignment is a vital step in the analytic pipeline of bacterial 16S ribosomal RNA (rRNA) sequencing. Over the past decade, most research in this field has used next-generation sequencing technology targeting the V3-V4 regions to analyze bacterial composition. However, focusing on only one or two hypervariable regions limits taxonomic resolution at the species level. In recent years, third-generation sequencing technology has given researchers easy access to full-length prokaryotic 16S sequences and an opportunity to attain greater taxonomic depth. However, the accuracy of current taxonomic classifiers on full-length 16S sequences remains unclear.

OBJECTIVE:

The purpose of this study is to compare the accuracy of several widely used 16S sequence classifiers and to identify the most suitable 16S training dataset for each classifier.

METHODS:

Both curated 16S full-length sequences and cross-validation datasets were used to evaluate the performance of seven classifiers: QIIME2, mothur, SINTAX, SPINGO, Ribosomal Database Project (RDP), IDTAXA, and Kraken2. Different sequence training datasets, such as SILVA, Greengenes, and RDP, were used to train the classification models.

RESULTS:

The accuracy of each classifier down to the species level was reported. According to the experimental results, SINTAX and SPINGO provided the highest accuracy when RDP sequences were used as the training data, and are recommended for classifying prokaryotic 16S full-length rRNA sequences.
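
As an illustration of what accuracy at each taxonomic level can mean, the following is a minimal Python sketch. It is not the authors' evaluation code, and the abstract does not state how accuracy was scored. The function name `rank_accuracy`, the rank list, the scoring rule (the whole lineage must match down to the rank in question), and the two example sequences are all assumptions made for illustration.

```python
# Hypothetical rank order; the study's actual rank set is not given in the abstract.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def rank_accuracy(truth, predicted):
    """Return the fraction of sequences classified correctly at each rank.

    truth and predicted map a sequence ID to a list of taxon names ordered
    as in RANKS. A sequence counts as correct at a rank only if the full
    lineage down to that rank matches, so a prediction that stops early
    is wrong at every rank it omits.
    """
    if not truth:
        raise ValueError("truth must contain at least one sequence")
    correct = {rank: 0 for rank in RANKS}
    for seq_id, true_lineage in truth.items():
        pred_lineage = predicted.get(seq_id, [])
        for i, rank in enumerate(RANKS):
            if pred_lineage[: i + 1] == true_lineage[: i + 1]:
                correct[rank] += 1
    total = len(truth)
    return {rank: correct[rank] / total for rank in RANKS}


# Made-up example: two reference sequences with known lineages.
truth = {
    "seq1": ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
             "Enterobacterales", "Enterobacteriaceae", "Escherichia",
             "Escherichia coli"],
    "seq2": ["Bacteria", "Firmicutes", "Bacilli", "Bacillales",
             "Bacillaceae", "Bacillus", "Bacillus subtilis"],
}
# seq1 is fully correct; seq2 is correct to genus but wrong at species.
predicted = {
    "seq1": list(truth["seq1"]),
    "seq2": truth["seq2"][:6] + ["Bacillus cereus"],
}

acc = rank_accuracy(truth, predicted)
for rank in RANKS:
    print(f"{rank}: {acc[rank]:.2f}")
```

Under this scoring rule, a classifier that assigns the right genus but the wrong species is penalized only at the species level, which is the level the results above focus on.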

CONCLUSION:

The performance of the classifiers was affected by the sequence training datasets. Therefore, each classifier should be paired with its most suitable 16S training data to improve the accuracy and taxonomic resolution of the taxonomic assignment.

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Bacteria / High-Throughput Nucleotide Sequencing Language: En Journal: Comput Biol Med Year: 2022 Document type: Article Affiliation country: Taiwan
