How Much Storage Precision Can Be Lost: Guidance for Near-Lossless Compression of Untargeted Metabolomics Mass Spectrometry Data.
Tong, Junjie; Lu, Miaoshan; Wang, Ruimin; An, Shaowei; Wang, Jinyin; Wang, Tong; Xie, Cong; Yu, Changbin.
Affiliation
  • Tong J; Central Hospital Affiliated to Shandong First Medical University, Jinan 250000, Shandong, China.
  • Lu M; Key Laboratory of Tropical Medicinal Plant Chemistry of Ministry of Education, College of Chemistry and Chemical Engineering, Hainan Normal University, Haikou 571158, Hainan, China.
  • Wang R; Central Hospital Affiliated to Shandong First Medical University, Jinan 250000, Shandong, China.
  • An S; Central Hospital Affiliated to Shandong First Medical University, Jinan 250000, Shandong, China.
  • Wang J; Fudan University, Shanghai 200000, China.
  • Wang T; Westlake University, Hangzhou 310024, Zhejiang, China.
  • Xie C; Fudan University, Shanghai 200000, China.
  • Yu C; Westlake University, Hangzhou 310024, Zhejiang, China.
J Proteome Res ; 23(5): 1702-1712, 2024 May 03.
Article in En | MEDLINE | ID: mdl-38640356
ABSTRACT
Several lossy compressors have achieved superior compression rates for mass spectrometry (MS) data at the cost of storage precision. The impact of these precision losses on MS data processing has not yet been thoroughly evaluated, although such an evaluation is critical for the future development of lossy compressors. We first evaluated different storage precisions (32-bit and 64-bit) in lossless mzML files. We then applied 10 truncation transformations to generate precision-lossy files: five relative errors for intensities and five absolute errors for m/z values. MZmine3 and XCMS were used for feature detection and GNPS for compound annotation. Lastly, we compared Precision, Recall, F1-score, and file sizes between lossy files and lossless files under different conditions. Overall, we revealed that the discrepancy between 32-bit and 64-bit precision was under 1%. We proposed an absolute m/z error of 10⁻⁴ and a relative intensity error of 2 × 10⁻², adhering to a 5% error threshold (F1-scores above 95%). For a stricter 1% error threshold (F1-scores above 99%), an absolute m/z error of 2 × 10⁻⁵ and a relative intensity error of 2 × 10⁻³ were advised. This guidance aims to help researchers improve lossy compression algorithms and minimize the negative effects of precision losses on downstream data processing.
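To make the two kinds of error bound in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes one plausible reading of "truncation": m/z values are snapped down onto a grid whose step equals the absolute error bound, and intensities keep only as many leading mantissa bits as the relative error bound requires. The function names (`truncate_absolute`, `truncate_relative`, `precision_recall_f1`) are hypothetical, and the F1 helper matches features by exact key, which is a simplification of how detected features would be matched in practice.

```python
import math


def truncate_absolute(value, max_abs_error):
    """Snap value down onto a grid of step max_abs_error.

    Assumed scheme: the result differs from the input by less than
    one grid step, up to floating-point rounding.
    """
    if max_abs_error <= 0:
        raise ValueError("max_abs_error must be positive")
    return math.floor(value / max_abs_error) * max_abs_error


def truncate_relative(value, max_rel_error):
    """Drop low mantissa bits so the relative error stays below max_rel_error.

    frexp gives a mantissa m with 0.5 <= |m| < 1. Keeping `bits` bits of m
    changes it by less than 2**-bits, i.e. a relative change below
    2**(1 - bits), so bits = 1 + ceil(-log2(max_rel_error)) is sufficient.
    """
    if not 0 < max_rel_error < 1:
        raise ValueError("max_rel_error must be in (0, 1)")
    if value == 0.0:
        return 0.0
    bits = 1 + math.ceil(-math.log2(max_rel_error))
    mantissa, exponent = math.frexp(value)
    scale = 2 ** bits
    kept = math.trunc(mantissa * scale) / scale  # truncate toward zero
    return math.ldexp(kept, exponent)


def precision_recall_f1(reference, candidate):
    """Compare two feature sets by exact key (a simplifying assumption)."""
    reference, candidate = set(reference), set(candidate)
    true_positives = len(reference & candidate)
    precision = true_positives / len(candidate) if candidate else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if true_positives == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Thresholds proposed in the abstract for the 5% error level.
MZ_ABS_ERROR = 1e-4
INTENSITY_REL_ERROR = 2e-2

lossy_mz = truncate_absolute(445.120025, MZ_ABS_ERROR)
lossy_intensity = truncate_relative(12345.678, INTENSITY_REL_ERROR)
```

In this sketch, the features detected in a lossless file would serve as `reference` and those detected in the truncated file as `candidate`, so an F1 value above 0.95 corresponds to the 5% error threshold described above.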

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Mass Spectrometry / Data Compression / Metabolomics Limits: Humans Language: En Journal: J Proteome Res Journal subject: BIOQUIMICA Year: 2024 Document type: Article Affiliation country: