Published January 13, 2020 | Version 1.0.0
Poster Open

The HUPO-PSI standardized spectral library format

  • 1. VIB-UGent Center for Medical Biotechnology
  • 2. University of California San Diego
  • 3. University of Birmingham
  • 4. Toyama University of International Studies
  • 5. Hong Kong University of Science and Technology
  • 6. National Institute of Standards and Technology
  • 7. European Bioinformatics Institute
  • 8. University of Washington
  • 9. Thermo Fisher Scientific Inc
  • 10. Beijing Institute of Life Omics
  • 11. Institute for Systems Biology

Description

More and more proteomics datasets are becoming available in public repositories. The knowledge embedded in these datasets can be used to improve peptide identification workflows. Spectral library searching provides a straightforward method to boost identification rates using previously identified spectra. Alternatively, machine learning methods can learn from these spectra to accurately predict the behavior of peptides in a liquid chromatography-mass spectrometry system.

At the basis of both approaches are spectral libraries: unified collections of previously identified spectra. Organizations and projects such as the National Institute of Standards and Technology (NIST), the Global Proteome Machine, PeptideAtlas, PRIDE Archive and MassIVE have all compiled spectral libraries for a multitude of species and experimental setups. A major obstacle, however, is that each organization provides its libraries in a different file format. The problem propagates (if not expands) at the software level, as different software tools require different file formats.

The solution is a spectral library format that is flexible enough to meet all users' demands, yet standardized enough to be usable across environments and software packages. This balance is achieved by defining a standardized framework and a controlled vocabulary of metadata terms, and by allowing the format to be represented in different forms, such as plain text, JSON and HDF.
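
As a rough illustration of this idea, the Python sketch below keeps one spectrum in memory as a list of controlled-vocabulary attributes plus a peak list, and writes it out in two representations. It is a minimal sketch under stated assumptions, not the drafted format: the class names, the JSON keys, the `<Spectrum=...>` and `<Peaks>` markers and the `accession|name=value` line layout are all hypothetical choices made for this example. Only the two PSI-MS accessions (MS:1000744, selected ion m/z; MS:1000041, charge state) are existing vocabulary terms, and the numeric values are invented.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """One metadata item, identified by a controlled-vocabulary accession."""
    accession: str  # e.g. "MS:1000041"
    name: str       # human-readable term name
    value: object   # number or string


@dataclass
class Spectrum:
    """Hypothetical in-memory model shared by every representation."""
    key: int
    attributes: list = field(default_factory=list)
    peaks: list = field(default_factory=list)  # (m/z, intensity) pairs


def to_text(spectrum):
    """Plain-text form: one 'accession|name=value' line per attribute.

    The markers and line layout are illustrative assumptions.
    """
    lines = [f"<Spectrum={spectrum.key}>"]
    for attr in spectrum.attributes:
        lines.append(f"{attr.accession}|{attr.name}={attr.value}")
    lines.append("<Peaks>")
    for mz, intensity in spectrum.peaks:
        lines.append(f"{mz}\t{intensity}")
    return "\n".join(lines)


def to_json(spectrum):
    """JSON form of the same model (key names are illustrative assumptions)."""
    return json.dumps(
        {
            "key": spectrum.key,
            "attributes": [
                {"accession": a.accession, "name": a.name, "value": a.value}
                for a in spectrum.attributes
            ],
            "peaks": [list(peak) for peak in spectrum.peaks],
        },
        indent=2,
    )


def from_json(text):
    """Rebuild the in-memory model from its JSON form."""
    data = json.loads(text)
    return Spectrum(
        key=data["key"],
        attributes=[Attribute(**a) for a in data["attributes"]],
        peaks=[tuple(peak) for peak in data["peaks"]],
    )


# Invented example values, for illustration only.
example = Spectrum(
    key=1,
    attributes=[
        Attribute("MS:1000744", "selected ion m/z", 473.1234),
        Attribute("MS:1000041", "charge state", 2),
    ],
    peaks=[(175.119, 1200.0), (304.161, 860.5)],
)

if __name__ == "__main__":
    print(to_text(example))
    print(to_json(example))
```

Because both writers read from the same model and every attribute carries its accession, a tool that understands one representation can map it onto another without guessing what a field means.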

So far, the required and optional metadata terms have been compiled and added to the PSI-MS ontology, and versions of the text and JSON representations have been drafted. The tabular and HDF representations of the format are still in development, as are converters and validators in various programming languages.

Files

2020-01 EuBIC Dev Meeting - SpecLibFormat.pdf (168.6 kB)
md5:216e71dcee8bf44d25d3e058fc703439

Additional details

References

  • Deutsch EW et al. Expanding the Use of Spectral Libraries in Proteomics. J Proteome Res. 2018;17(12):4051–4060. doi:10.1021/acs.jproteome.8b00485