ABSTRACT
DNA-based identification is vital for classifying biological specimens, yet methods to quantify the uncertainty of sequence-based taxonomic assignments are scarce. Challenges arise from noisy reference databases, including mislabelled entries and missing taxa. PROTAX addresses these issues with a probabilistic approach to taxonomic classification, advancing on methods that rely solely on sequence similarity. It provides calibrated probabilistic assignments to a partially populated taxonomic hierarchy, accounting for taxa that lack references and for incorrect taxonomic annotation. While effective on smaller scales, global application of PROTAX necessitates substantially larger reference libraries, a goal previously hindered by computational barriers. We introduce PROTAX-GPU, a scalable algorithm capable of leveraging the global Barcode of Life Data System (>14 million specimens) as a reference database. Using graphics processing units (GPUs) to accelerate similarity and nearest-neighbour operations and the JAX library for Python integration, we achieve a speedup of over 1000× compared with the central processing unit (CPU)-based implementation without compromising PROTAX's key benefits. PROTAX-GPU marks a significant stride towards real-time DNA barcoding, enabling quicker and more efficient species identification in environmental assessments. This capability opens up new avenues for real-time monitoring and analysis of biodiversity, advancing our ability to understand and respond to ecological dynamics. This article is part of the theme issue 'Towards a toolkit for global insect biodiversity monitoring'.
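The similarity and nearest-neighbour operations mentioned in the abstract can be illustrated with a small JAX sketch. This is not the PROTAX-GPU implementation: the integer encoding, the function names (`encode`, `similarity`, `nearest_references`) and the toy sequences are all hypothetical, and the sketch assumes the sequences are already aligned to equal length.

```python
from functools import partial

import jax
import jax.numpy as jnp

# Hypothetical encoding: A=0, C=1, G=2, T=3; anything else (gap, N) becomes 4.
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq):
    """Map an aligned DNA string to a vector of small integers."""
    return jnp.array([ENCODING.get(base, 4) for base in seq], dtype=jnp.int32)


def similarity(query, ref):
    """Fraction of matching sites among positions known in both sequences."""
    valid = (query < 4) & (ref < 4)
    matches = jnp.sum((query == ref) & valid)
    return matches / jnp.maximum(jnp.sum(valid), 1)


@partial(jax.jit, static_argnames=("k",))
def nearest_references(query, refs, k=2):
    """Score the query against every reference at once, then keep the top k."""
    # vmap vectorises the pairwise comparison over the reference axis.
    sims = jax.vmap(similarity, in_axes=(None, 0))(query, refs)
    return jax.lax.top_k(sims, k)  # (similarities, reference indices)


# Toy reference library (assumed data, for illustration only).
refs = jnp.stack([encode(s) for s in ["ACGTACGT", "ACGTACGA", "TTTTTTTT"]])
query = encode("ACGTACGT")
values, indices = nearest_references(query, refs, k=2)
```

Here `indices` identifies the two most similar reference sequences and `values` gives their similarity scores. The same vectorised code runs on a CPU or a GPU, depending on the JAX backend available.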
Subject(s)
Algorithms; DNA Barcoding, Taxonomic; DNA Barcoding, Taxonomic/methods; Classification/methods; Computer Graphics; Animals
ABSTRACT
Nucleic acids are a powerful engineering material that can be used to implement a broad range of computational circuits at the nanoscale, with potential applications in high-precision biosensing, diagnostics, and therapeutics. However, nucleic acid circuits are prone to leaks, which result from unintended displacement interactions between nucleic acid strands. Such leaks can grow combinatorially with circuit size, are challenging to mitigate, and can significantly compromise circuit behavior. While several techniques have been proposed to partially mitigate leaks, computational methods for designing new leak mitigation strategies and comparing their effects on circuit behavior are limited. Here we present a general method for the automated leak analysis of nucleic acid circuits, referred to as DSD Leaks. Our method extends the logic programming functionality of the Visual DSD language, developed for the design and analysis of nucleic acid circuits, with predicates for leak generation, a leak reaction enumeration algorithm, and predicates to exclude low-probability leak reactions. We use our method to identify the critical leak reactions affecting the performance of control circuits, and to analyze leak mitigation strategies by automatically generating leak reactions. Finally, we design new control circuits with substantially reduced leakage, including a sophisticated proportional-integral controller circuit, which can in turn serve as building blocks for future circuits. By integrating our method within an open-source nucleic acid circuit design tool, we enable the leak analysis of a broad range of circuits, as an important step toward facilitating robust and scalable nucleic acid circuit design.