PROTAX-GPU: a scalable probabilistic taxonomic classification system for DNA barcodes.
Li, Roy; Ratnasingham, Sujeevan; Zarubiieva, Iuliia; Somervuo, Panu; Taylor, Graham W.
Affiliation
  • Li R; Vector Institute for Artificial Intelligence, Toronto, Canada M5G 0C6.
  • Ratnasingham S; Department of Computer Science, University of Toronto, Toronto, Canada M5S 2E4.
  • Zarubiieva I; Centre for Biodiversity Genomics, Guelph, Canada N1G 2W1.
  • Somervuo P; Vector Institute for Artificial Intelligence, Toronto, Canada M5G 0C6.
  • Taylor GW; School of Engineering, University of Guelph, Guelph, Canada N1G 2W1.
Philos Trans R Soc Lond B Biol Sci ; 379(1904): 20230124, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38705180
ABSTRACT
DNA-based identification is vital for classifying biological specimens, yet methods to quantify the uncertainty of sequence-based taxonomic assignments are scarce. Challenges arise from noisy reference databases, including mislabelled entries and missing taxa. PROTAX addresses these issues with a probabilistic approach to taxonomic classification, advancing on methods that rely solely on sequence similarity. It provides calibrated probabilistic assignments to a partially populated taxonomic hierarchy, accounting for taxa that lack references and incorrect taxonomic annotation. While effective on smaller scales, global application of PROTAX necessitates substantially larger reference libraries, a goal previously hindered by computational barriers. We introduce PROTAX-GPU, a scalable algorithm capable of leveraging the global Barcode of Life Data System (>14 million specimens) as a reference database. Using graphics processing units (GPU) to accelerate similarity and nearest-neighbour operations and the JAX library for Python integration, we achieve over a 1000 × speedup compared with the central processing unit (CPU)-based implementation without compromising PROTAX's key benefits. PROTAX-GPU marks a significant stride towards real-time DNA barcoding, enabling quicker and more efficient species identification in environmental assessments. This capability opens up new avenues for real-time monitoring and analysis of biodiversity, advancing our ability to understand and respond to ecological dynamics. This article is part of the theme issue 'Towards a toolkit for global insect biodiversity monitoring'.
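The abstract mentions accelerating similarity and nearest-neighbour operations with JAX. The following is only a minimal sketch of what such an operation could look like, not the PROTAX-GPU implementation. It assumes pre-aligned, equal-length barcodes, a simple match-fraction similarity, and brute-force top-k search. The names `encode`, `pairwise_similarity`, and `nearest_references` are hypothetical and do not come from the paper.

```python
import jax
import jax.numpy as jnp

# Assumed encoding (illustrative only): A, C, G, T -> 0..3; anything else
# (gaps, N, ambiguity codes) -> 4, which is treated as "missing".
_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seqs):
    """Encode equal-length, pre-aligned sequences as an (n, L) int32 array."""
    return jnp.array(
        [[_CODES.get(ch, 4) for ch in s.upper()] for s in seqs],
        dtype=jnp.int32,
    )


@jax.jit
def pairwise_similarity(queries, refs):
    """Fraction of matching bases over positions that are valid in both sequences."""
    q = queries[:, None, :]  # shape (Q, 1, L)
    r = refs[None, :, :]     # shape (1, R, L)
    valid = (q < 4) & (r < 4)
    matches = jnp.sum((q == r) & valid, axis=-1)
    n_valid = jnp.sum(valid, axis=-1)
    # Guard against division by zero when no position is valid in both.
    return matches / jnp.maximum(n_valid, 1)


def nearest_references(queries, refs, k=2):
    """Return (similarities, indices) of the k most similar references per query."""
    sims = pairwise_similarity(queries, refs)
    return jax.lax.top_k(sims, k)


# Toy data, invented for illustration.
refs = encode(["ACGTACGT", "ACGTTTTT", "GGGGACGT"])
queries = encode(["ACGTACGA", "NNGGACGT"])
vals, idx = nearest_references(queries, refs, k=2)
print(vals)
print(idx)
```

The broadcasted comparison materialises a (Q, R, L) boolean array, so a library of the size described in the abstract would presumably need to be processed in batches. How PROTAX-GPU handles this is not stated in the abstract.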

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Algorithms / DNA Barcoding, Taxonomic Limits: Animals Language: English Journal: Philos Trans R Soc Lond B Biol Sci Year: 2024 Document type: Article
