Published May 6, 2025 | Version v2
Preprint Open

mzPeak: designing a scalable, interoperable, and future-ready mass spectrometry data format

  • 1. VIB - UGent Center for Medical Biotechnology
  • 2. European Molecular Biology Laboratory
  • 3. University of California San Diego Skaggs School of Pharmacy and Pharmaceutical Sciences
  • 4. Pacific Northwest National Laboratory
  • 5. University of California, San Diego
  • 6. University of Antwerp
  • 7. Consejo Nacional de Investigaciones Científicas y Técnicas
  • 8. Universidad Nacional de Córdoba
  • 9. Fondazione Edmund Mach
  • 10. University of Washington
  • 11. University of Toronto
  • 12. Institute for Systems Biology
  • 13. University of Bristol
  • 14. The Alan Turing Institute
  • 15. VIB-UGent Center for Medical Biotechnology
  • 16. RECETOX
  • 17. Eli Lilly (United States)
  • 18. VIB UGent Centrum voor Medische Biotechnologie
  • 19. Ghent University
  • 20. European Bioinformatics Institute
  • 21. Thermo Fisher Scientific Inc
  • 22. University Medical Center Groningen
  • 23. Ruhr University Bochum
  • 24. University of California, Riverside
  • 25. Bruker Daltonik GmbH
  • 26. University of Tübingen
  • 27. OpenMS Inc

Description

Abstract

Advances in mass spectrometry (MS) instrumentation, such as higher resolution, faster scan speeds, and improved sensitivity, have significantly increased the volume and complexity of data. The growing adoption of imaging and ion mobility further amplifies this trend across MS-based omics fields, including proteomics, metabolomics, and lipidomics. While these technologies unlock new possibilities, they also present significant challenges in data management, storage, and accessibility. Existing open formats, such as the XML-based community standards mzML and imzML, struggle to meet the demands of modern MS workflows due to their large file sizes, slow data access, and limited metadata support. Vendor-specific formats, while optimized for proprietary instruments, lack interoperability, comprehensive metadata support, and long-term archival reliability.

This white paper lays the groundwork for mzPeak, a next-generation community data format designed to address these challenges and support high-throughput, multi-dimensional MS workflows. By adopting a hybrid model that combines efficient binary storage for numerical data with metadata storage that is both human- and machine-readable, mzPeak will reduce file sizes, accelerate data access, and offer a scalable, adaptable solution for evolving MS technologies.

For researchers, mzPeak will enable enhanced interoperability across platforms, seamless support for complex workflows including ion mobility and MS imaging, and faster data access compared to existing community formats such as mzML. Its design will ensure data is managed in compliance with regulatory standards, essential for applications such as precision medicine and chemical safety, where long-term data integrity and accessibility are critical.

For vendors, mzPeak provides a streamlined, open alternative to proprietary formats, reducing the burden of regulatory compliance while aligning with the industry's push for transparency and standardization. By offering a high-performance, interoperable solution, mzPeak positions vendors to meet customer demands for sustainable data management tools that can handle emerging and future data types and workflows.

mzPeak aspires to become the cornerstone of MS data management, empowering researchers, vendors, and developers to innovate and collaborate more effectively.

Files

Total size: 124.7 kB
md5:4842885968d3335152ff52f829c4c97c