How Much Storage Precision Can Be Lost: Guidance for Near-Lossless Compression of Untargeted Metabolomics Mass Spectrometry Data.
J Proteome Res; 23(5): 1702-1712, 2024 May 03.
Article in En | MEDLINE | ID: mdl-38640356
ABSTRACT
Several lossy compressors have achieved superior compression rates for mass spectrometry (MS) data at the cost of storage precision. However, the impact of precision loss on MS data processing has not been thoroughly evaluated, although such an evaluation is critical for the future development of lossy compressors. We first evaluated two storage precisions (32-bit and 64-bit) in lossless mzML files. We then applied 10 truncation transformations to generate precision-lossy files: five relative errors for intensities and five absolute errors for m/z values. MZmine3 and XCMS were used for feature detection and GNPS for compound annotation. Lastly, we compared Precision, Recall, F1-score, and file sizes between lossy and lossless files under different conditions. Overall, we found that the discrepancy between 32-bit and 64-bit precision was under 1%. We proposed an absolute m/z error of 10⁻⁴ and a relative intensity error of 2 × 10⁻², adhering to a 5% error threshold (F1-scores above 95%). For a stricter 1% error threshold (F1-scores above 99%), an absolute m/z error of 2 × 10⁻⁵ and a relative intensity error of 2 × 10⁻³ were advised. This guidance aims to help researchers improve lossy compression algorithms and minimize the negative effects of precision losses on downstream data processing.
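As an illustration only, the kind of transformation and comparison the abstract describes might be sketched as follows. This is not the authors' implementation: the function names, the grid-snapping approach for the absolute m/z error, the mantissa-truncation approach for the relative intensity error, and the exact-match comparison of feature keys are all assumptions made for this sketch.

```python
import math


def truncate_absolute(value: float, max_abs_error: float) -> float:
    """Snap a value down to a grid of spacing max_abs_error.

    Assumed scheme: the stored value differs from the original by less
    than max_abs_error (up to floating-point rounding).
    """
    return math.floor(value / max_abs_error) * max_abs_error


def truncate_relative(value: float, max_rel_error: float) -> float:
    """Drop low mantissa bits so the relative error stays below max_rel_error.

    Assumed scheme: keep just enough leading mantissa bits to honor the bound.
    """
    if value == 0.0:
        return 0.0
    # value = mantissa * 2**exponent, with 0.5 <= |mantissa| < 1
    mantissa, exponent = math.frexp(value)
    # One extra bit because the mantissa can be as small as 0.5.
    bits = math.ceil(-math.log2(max_rel_error)) + 1
    scale = 2 ** bits
    truncated = math.trunc(mantissa * scale) / scale
    return math.ldexp(truncated, exponent)


def f1_score(reference: set, detected: set) -> float:
    """F1-score of features from a lossy file against the lossless reference.

    Features are compared by exact key equality here, which is a
    simplification (hypothetical feature keys, no m/z or RT tolerance).
    """
    true_positives = len(reference & detected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(detected)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Bounds quoted in the abstract for the 5% error threshold.
mz_lossy = truncate_absolute(445.120025, 1e-4)
intensity_lossy = truncate_relative(123456.789, 2e-2)
```

Under these assumptions, a lossy file would be judged acceptable at the 5% threshold when `f1_score` between its detected features and those of the lossless file stays above 0.95.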
Collection:
01-internacional
Database:
MEDLINE
Main subject:
Mass Spectrometry / Data Compression / Metabolomics
Limits:
Humans
Language:
En
Journal:
J Proteome Res
Journal subject:
BIOCHEMISTRY
Year:
2024
Document type:
Article
Affiliation country: