streammd: fast low-memory duplicate marking using a Bloom filter.
Leonard, Conrad.
Affiliation
  • Leonard C; Department of Genome Informatics, QIMR Berghofer Medical Research Institute, Herston, QLD 4006, Australia.
Bioinformatics; 39(4), 2023 Apr 03.
Article in En | MEDLINE | ID: mdl-37027230
SUMMARY: Identification of duplicate templates is a common preprocessing step in bulk sequence analysis; for large libraries, this can be resource intensive. Here, we present streammd: a fast, memory-efficient, single-pass duplicate marker operating on the principle of a Bloom filter. streammd closely reproduces outputs from Picard MarkDuplicates while being substantially faster, and requires much less memory than SAMBLASTER.

AVAILABILITY AND IMPLEMENTATION: streammd is a C++ program available from GitHub at https://github.com/delocalizer/streammd under the MIT license.
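
The principle named in the summary, single-pass duplicate marking with a Bloom filter, can be illustrated with a minimal Python sketch. This is not streammd's implementation (streammd is a C++ program); the class `BloomFilter`, the function `mark_duplicates`, the hashing scheme, the default parameters, and the string form of the template keys are all hypothetical choices made for illustration. The sketch assumes each template has already been reduced to a key describing its alignment ends, and flags a key as a duplicate when every bit it maps to is already set.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size bit array probed by k hash positions (no deletions)."""

    def __init__(self, expected_items: int, false_positive_rate: float):
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2, k = (m / n) ln 2.
        m = math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2)
        self.num_bits = max(m, 8)
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.blake2b(key.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> bool:
        """Insert key; return True if it was (probably) already present."""
        seen = True
        for pos in self._positions(key):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                seen = False
                self.bits[byte] |= mask
        return seen


def mark_duplicates(template_keys, expected_items=1_000_000, false_positive_rate=1e-6):
    """Yield (key, is_duplicate) in a single pass over the input.

    The key format is an assumption of this sketch: any string that
    identifies a template's alignment ends would do.
    """
    bloom = BloomFilter(expected_items, false_positive_rate)
    for key in template_keys:
        yield key, bloom.add(key)


if __name__ == "__main__":
    keys = ["chr1:100:F|chr1:350:R", "chr1:100:F|chr1:350:R", "chr2:5:F|chr2:200:R"]
    for key, is_dup in mark_duplicates(keys, expected_items=1000):
        print(key, "DUP" if is_dup else "")
```

Memory in this sketch is fixed by the expected number of templates and the target false-positive rate rather than growing with the set of keys seen. The trade-off is that a small fraction of non-duplicates may be flagged as duplicates, while a true duplicate of an already-inserted key is never missed.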

Full text: 1 | Database: MEDLINE | Main subject: Software | Language: En | Journal: Bioinformatics | Journal subject: Medical Informatics | Year: 2023 | Type: Article | Affiliation country: Australia