mspack: efficient lossless and lossy mass spectrometry data compression.
Hanau, Felix; Röst, Hannes; Ochoa, Idoia.
Affiliation
  • Hanau F; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
  • Röst H; Department of Molecular Genetics, Donnelly Center, University of Toronto, Toronto, ON M5S 3E1, Canada.
  • Ochoa I; Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
Bioinformatics; 37(21): 3923-3925, 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34478503
ABSTRACT
MOTIVATION:

Mass spectrometry (MS) data, used for proteomics and metabolomics analyses, have grown considerably in recent years. To reduce the associated storage costs, dedicated compression algorithms for MS data have been proposed, such as MassComp and MSNumpress. However, these algorithms focus on either lossless or lossy compression, respectively, and do not exploit the additional redundancy that exists across the scans contained in a single file. We introduce mspack, a compression algorithm for MS data that exploits this additional redundancy and supports both lossless and lossy compression, as well as the mzML and the legacy mzXML formats. mspack applies several lossless preprocessing transforms and optional lossy transforms with a configurable error, followed by the general-purpose compressors gzip or bsc, to achieve a higher compression ratio.
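As a rough illustration of the general idea described above (an optional lossy transform with a configurable error, a lossless transform, then a general-purpose compressor), the following Python sketch quantizes values to a fixed absolute error, delta-encodes the result, and passes it to zlib (DEFLATE, the algorithm underlying gzip). It is not mspack's actual pipeline: the specific transforms, the function names, and the parameter `max_abs_error` are assumptions chosen for illustration only.

```python
import struct
import zlib


def quantize(values, max_abs_error):
    """Hypothetical lossy step: map each value to an integer index.

    With a step of 2 * max_abs_error, rounding to the nearest index keeps
    the reconstruction within max_abs_error of the input (up to
    floating-point rounding).
    """
    step = 2.0 * max_abs_error
    return [round(v / step) for v in values]


def dequantize(indices, max_abs_error):
    """Inverse of quantize: map integer indices back to approximate values."""
    step = 2.0 * max_abs_error
    return [i * step for i in indices]


def delta_encode(ints):
    """Lossless step: keep the first value, then successive differences."""
    out = []
    prev = 0
    for x in ints:
        out.append(x - prev)
        prev = x
    return out


def delta_decode(deltas):
    """Inverse of delta_encode: a running sum restores the original integers."""
    out = []
    acc = 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out


def compress(values, max_abs_error):
    """Quantize, delta-encode, pack as 64-bit integers, then apply zlib."""
    deltas = delta_encode(quantize(values, max_abs_error))
    raw = struct.pack(f"<{len(deltas)}q", *deltas)
    return zlib.compress(raw, 9)


def decompress(blob, max_abs_error):
    """Undo compress: inflate, unpack, delta-decode, dequantize."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
    return dequantize(delta_decode(deltas), max_abs_error)
```

Calling `compress(mz_values, 1e-4)` on a list of increasing m/z values and then `decompress` on the result returns values within the chosen error of the originals. Delta encoding helps here because sorted values that are close together turn into small, repetitive integers, which a general-purpose compressor handles well.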

RESULTS:

We tested mspack on several datasets generated by commonly used MS instruments. When used with the bsc compression backend, mspack achieves on average 76% smaller file sizes for lossless compression and 94% smaller file sizes for lossy compression, compared with the original files. Lossless mspack produces files 10-60% smaller than MassComp, and lossy mspack compresses 36-60% better than the lossy MSNumpress for the same error, while exhibiting comparable accuracy and running time.

AVAILABILITY AND IMPLEMENTATION:

mspack is implemented in C++ and freely available at https://github.com/fhanau/mspack under the Apache license.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Data Compression | Language: English | Journal: Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2021 | Document type: Article | Affiliation country: United States
