Published March 10, 2025 | Version v1
Dataset Open

MassIVE-KB v1 30 million PSMs training/validation/test splits

Authors/Creators

  • 1. University of Antwerp

Description

The MassIVE-KB data are derived from the peptide-spectrum matches (PSMs) used to compile the MassIVE-KB v1 spectral library and consist of approximately 30 million PSMs. The PSMs were obtained by collecting up to the top 100 PSMs for each of the 2,154,269 precursors (each defined by a peptidoform and a charge) included in the MassIVE-KB v1 spectral library.

The data are split into peptide-disjoint training, validation, and test sets, consisting of:

  • Training: 28,508,636 PSMs for 1,496,701 unique peptidoforms.
  • Validation: 1,000,234 PSMs for 52,379 unique peptidoforms.
  • Test: 996,027 PSMs for 52,399 unique peptidoforms.
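As a sanity check on the split sizes listed above, the following minimal Python sketch computes the share of PSMs in each split and shows one way a peptide-disjoint property could be verified. The helper names (`split_fractions`, `find_shared_peptidoforms`) and the toy peptidoform strings are hypothetical illustrations; how peptidoforms are stored in the actual files is not described here, so extracting them is left to the reader.

```python
from itertools import combinations

# Counts as reported in the split list above.
REPORTED_SPLITS = {
    "training": {"psms": 28_508_636, "peptidoforms": 1_496_701},
    "validation": {"psms": 1_000_234, "peptidoforms": 52_379},
    "test": {"psms": 996_027, "peptidoforms": 52_399},
}


def split_fractions(splits=REPORTED_SPLITS, key="psms"):
    """Return each split's share of the total for the given count."""
    total = sum(entry[key] for entry in splits.values())
    return {name: entry[key] / total for name, entry in splits.items()}


def find_shared_peptidoforms(peptidoforms_by_split):
    """Return the peptidoforms shared by each pair of splits.

    An empty dict means the splits are peptide-disjoint.
    """
    unique = {name: set(values) for name, values in peptidoforms_by_split.items()}
    shared = {}
    for first, second in combinations(unique, 2):
        overlap = unique[first] & unique[second]
        if overlap:
            shared[(first, second)] = overlap
    return shared


if __name__ == "__main__":
    for name, fraction in split_fractions().items():
        print(f"{name}: {fraction:.1%} of PSMs")

    # Hypothetical toy input; real peptidoforms would come from the data files.
    toy = {
        "training": ["PEPTIDEK", "SAMPLER"],
        "validation": ["ANOTHERK"],
        "test": ["DIFFERENTR"],
    }
    print(find_shared_peptidoforms(toy))
```

The reported PSM counts sum to 30,504,897, which is consistent with the "approximately 30 million" figure in the description.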

The dataset was originally compiled through the following steps:

Files

Files (50.0 GB)

  • md5:1be06b8430ff8c4d6346302878d20eb0 (1.6 GB)
  • md5:2aa9984f02260330e31a61364b553f39 (46.7 GB)
  • md5:8e53b28071d39365e70460353e81a339 (1.6 GB)
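After downloading, the MD5 checksums listed above can be used to confirm that each file arrived intact. The sketch below is one way to do this in Python. The file names in `EXPECTED_MD5` are placeholders, since the listing does not show the actual names, and the helper names (`md5_of_file`, `verify_downloads`) are hypothetical. Files are hashed in chunks so that the 46.7 GB file does not need to fit in memory.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing. The keys are PLACEHOLDER names:
# replace them with the names of the files as downloaded.
EXPECTED_MD5 = {
    "placeholder_file_1": "1be06b8430ff8c4d6346302878d20eb0",  # 1.6 GB
    "placeholder_file_2": "2aa9984f02260330e31a61364b553f39",  # 46.7 GB
    "placeholder_file_3": "8e53b28071d39365e70460353e81a339",  # 1.6 GB
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == checksum
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads(".").items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A failed check usually indicates an interrupted or corrupted download, in which case the file should be fetched again.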