Numerical compression schemes for proteomics mass spectrometry data.
Teleman, Johan; Dowsey, Andrew W; Gonzalez-Galarza, Faviel F; Perkins, Simon; Pratt, Brian; Röst, Hannes L; Malmström, Lars; Malmström, Johan; Jones, Andrew R; Deutsch, Eric W; Levander, Fredrik.
  • Teleman J; From the ‡Department of Immunotechnology, Lund University, Medicon Village building 406, 223 81 Lund Sweden;
  • Dowsey AW; §Institute of Human Development, Faculty of Medical and Human Sciences, University of Manchester, United Kingdom; ¶Centre for Advanced Discovery and Experimental Therapeutics (CADET), University of Manchester and Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Healt
  • Gonzalez-Galarza FF; ‖Institute of Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, United Kingdom;
  • Perkins S; ‖Institute of Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, United Kingdom;
  • Pratt B; **Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, 98195, USA;
  • Röst HL; ‡‡Department of Biology, Institute of Molecular Systems Biology, Eidgenössische Technische Hochschule Zürich, Wolfgang-Pauli Strasse 16, 8093 Zurich, Switzerland;
  • Malmström L; ‡‡Department of Biology, Institute of Molecular Systems Biology, Eidgenössische Technische Hochschule Zürich, Wolfgang-Pauli Strasse 16, 8093 Zurich, Switzerland;
  • Malmström J; §§Department of Clinical Sciences, Faculty of Medicine, Lund University, SE-221 84 Lund, Sweden;
  • Jones AR; ‖Institute of Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, United Kingdom;
  • Deutsch EW; ¶¶Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, USA; edeutsch@systemsbiology.org.
  • Levander F; From the ‡Department of Immunotechnology, Lund University, Medicon Village building 406, 223 81 Lund Sweden; ‖‖Bioinformatics Infrastructure for Life Sciences, Lund University, Sweden.
Mol Cell Proteomics ; 13(6): 1537-42, 2014 Jun.
Article in En | MEDLINE | ID: mdl-24677029
ABSTRACT
The open XML format mzML, used for representation of MS data, is pivotal for the development of platform-independent MS analysis software. Although conversion from vendor formats to mzML must take place on a platform on which the vendor libraries are available (i.e. Windows), once mzML files have been generated, they can be used on any platform. However, the mzML format has turned out to be less efficient than vendor formats. In many cases, the naïve mzML representation is fourfold or even up to 18-fold larger compared with the original vendor file. In disk I/O limited setups, a larger data file also leads to longer processing times, which is a problem given the data production rates of modern mass spectrometers. In an attempt to reduce this problem, we here present a family of numerical compression algorithms called MS-Numpress, intended for efficient compression of MS data. To facilitate ease of adoption, the algorithms target the binary data in the mzML standard, and support in main proteomics tools is already available. Using a test set of 10 representative MS data files we demonstrate typical file size decreases of 90% when combined with traditional compression, as well as read time decreases of up to 50%. It is envisaged that these improvements will be beneficial for data handling within the MS community.
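The abstract does not spell out how the MS-Numpress algorithms work, so the following is only a minimal sketch of the general idea of pairing a numerical transform with traditional compression. It assumes a fixed-point rounding step followed by delta encoding and zlib; the function names, the scale factor of 1e5, and the synthetic m/z array are all hypothetical and are not taken from the paper or from the MS-Numpress library.

```python
import struct
import zlib


def encode_fixed_point_delta(values, scale):
    """Round each value to the nearest multiple of 1/scale, then store the
    first integer followed by successive differences as little-endian int64.
    This is an illustrative transform, not the MS-Numpress encoding."""
    if not values:
        return b""
    ints = [round(v * scale) for v in values]
    deltas = [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]
    return struct.pack(f"<{len(deltas)}q", *deltas)


def decode_fixed_point_delta(payload, scale):
    """Invert encode_fixed_point_delta: cumulative sum, then divide by scale."""
    count = len(payload) // 8
    deltas = struct.unpack(f"<{count}q", payload)
    restored = []
    running = 0
    for d in deltas:
        running += d
        restored.append(running / scale)
    return restored


# Hypothetical, synthetic m/z axis (not data from the paper's test set).
scale = 1e5
mz = [400.0 + 0.0123 * i for i in range(5000)]

# Baseline: raw 64-bit doubles, as in an uncompressed binary array.
raw = struct.pack(f"<{len(mz)}d", *mz)

# Traditional compression alone versus numerical transform plus zlib.
plain = zlib.compress(raw)
packed = zlib.compress(encode_fixed_point_delta(mz, scale))

restored = decode_fixed_point_delta(zlib.decompress(packed), scale)
max_err = max(abs(a - b) for a, b in zip(mz, restored))

print(len(raw), len(plain), len(packed), max_err)
```

The transform is lossy: each value is recovered only to within about half of 1/scale, so the scale factor must be chosen to match the precision the downstream analysis needs. The gain comes from turning slowly varying doubles into small, repetitive integers that a general-purpose compressor handles well.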

Full text: 1 | Database: MEDLINE | Main subject: Mass Spectrometry / Software / Proteomics | Language: En | Year: 2014 | Document type: Article