streammd: fast low-memory duplicate marking using a Bloom filter.
Bioinformatics; 39(4), 2023 04 03.
Article in English | MEDLINE | ID: mdl-37027230
SUMMARY: Identification of duplicate templates is a common preprocessing step in bulk sequence analysis; for large libraries, this can be resource intensive. Here, we present streammd: a fast, memory-efficient, single-pass duplicate marker operating on the principle of a Bloom filter. streammd closely reproduces outputs from Picard MarkDuplicates while being substantially faster, and requires much less memory than SAMBLASTER.

AVAILABILITY AND IMPLEMENTATION: streammd is a C++ program available from GitHub (https://github.com/delocalizer/streammd) under the MIT license.
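
ILLUSTRATION: The single-pass, Bloom-filter principle named in the summary can be sketched as follows. This is a minimal Python sketch under stated assumptions, not streammd's actual C++ implementation: the names `BloomFilter` and `mark_duplicates`, the tuple-based template signature, the BLAKE2b double-hashing scheme and the default parameters are all hypothetical choices made for illustration. The sizing formulas are the standard ones for a Bloom filter with n expected items and target false-positive rate p.

```python
import hashlib
import math


class BloomFilter:
    """A fixed-size Bloom filter over byte strings (illustrative only)."""

    def __init__(self, expected_items: int, false_positive_rate: float):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(
            8,
            math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2),
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.blake2b(key, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> bool:
        """Insert key; return True if every bit was already set (probable repeat)."""
        seen = True
        for pos in self._positions(key):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                seen = False
                self.bits[byte] |= mask
        return seen


def mark_duplicates(templates, expected_items=1_000_000, false_positive_rate=1e-6):
    """Yield (template, is_duplicate) pairs in a single pass over the input."""
    bloom = BloomFilter(expected_items, false_positive_rate)
    for template in templates:
        # Assumed signature: a tuple describing both read ends of the template.
        signature = repr(template).encode()
        yield template, bloom.add(signature)
```

Memory use in this sketch is fixed by `expected_items` and `false_positive_rate` rather than by the number of templates actually read, which is the general reason a Bloom filter suits a streaming, low-memory setting. The trade-off is that a Bloom filter can report false positives but not false negatives, so a small, tunable fraction of unique templates may be flagged as duplicates.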
Database: MEDLINE
Main subject: Software
Language: English
Journal: Bioinformatics
Journal subject: Medical Informatics
Year: 2023
Type: Article
Affiliation country: Australia