Published March 10, 2025 | Version v1
Dataset
Open
MassIVE-KB v1 30 million PSMs training/validation/test splits
Description
The MassIVE-KB data are derived from the peptide-spectrum matches (PSMs) used to compile the MassIVE-KB v1 spectral library and consist of approximately 30 million PSMs. The PSMs were obtained by collecting up to the top 100 PSMs for each of the 2,154,269 precursors (each defined by a peptidoform and charge) included in the MassIVE-KB v1 spectral library.
The data are split into peptide-disjoint training, validation, and test sets, consisting of:
- Training: 28,508,636 PSMs for 1,496,701 unique peptidoforms.
- Validation: 1,000,234 PSMs for 52,379 unique peptidoforms.
- Test: 996,027 PSMs for 52,399 unique peptidoforms.
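As a quick sanity check on the split sizes listed above, the counts can be summed and expressed as fractions. This is only an illustrative sketch: the numbers are copied from the list, while the `is_peptide_disjoint` helper and its toy peptidoform sets are hypothetical and simply show what "peptide-disjoint" means in practice.

```python
from itertools import combinations

# (PSMs, unique peptidoforms) per split, copied from the list above.
splits = {
    "training": (28_508_636, 1_496_701),
    "validation": (1_000_234, 52_379),
    "test": (996_027, 52_399),
}

total_psms = sum(psms for psms, _ in splits.values())
total_peptidoforms = sum(peps for _, peps in splits.values())

for name, (psms, peps) in splits.items():
    print(
        f"{name:>10}: {psms / total_psms:6.2%} of PSMs, "
        f"{psms / peps:5.1f} PSMs per peptidoform on average"
    )


def is_peptide_disjoint(*peptidoform_sets):
    """Return True if no peptidoform appears in more than one split."""
    return all(a.isdisjoint(b) for a, b in combinations(peptidoform_sets, 2))


# Toy example with made-up peptidoforms (not taken from the dataset).
toy_train = {"PEPTIDEK", "SAMPLER"}
toy_valid = {"ELVISK"}
toy_test = {"LIVESK"}
disjoint = is_peptide_disjoint(toy_train, toy_valid, toy_test)
```

The totals come to 30,504,897 PSMs and 1,601,479 peptidoforms across the three splits, with the training set holding roughly 93% of the PSMs.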
The dataset was originally compiled through the following steps:
- On the MassIVE website, go to MassIVE Knowledge Base > Human HCD Spectral Library > All Candidate library spectra > Download.
- This will give you a zipped TSV file with the metadata and peptide identifications for all 30 million PSMs.
- Using the file name in the "filename" column, you can then retrieve the corresponding peak files from the MassIVE FTP server (originally done with a wget script) and extract the desired spectra by their scan number (column "scan").
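The bookkeeping behind the last step can be sketched as follows: group the scan numbers by peak file so that each file only needs to be fetched and read once. This is a minimal sketch, not the original script. Only the "filename" and "scan" column names come from the description above; the in-memory sample table, its "annotation" column, and the file paths are made up for illustration, and downloading and parsing the peak files is not shown.

```python
import csv
import io
from collections import defaultdict


def scans_by_file(handle):
    """Map each peak file name to the sorted scan numbers listed for it."""
    reader = csv.DictReader(handle, delimiter="\t")
    wanted = defaultdict(set)
    for row in reader:
        wanted[row["filename"]].add(int(row["scan"]))
    return {name: sorted(scans) for name, scans in wanted.items()}


# Tiny made-up table standing in for the real metadata TSV.
sample = io.StringIO(
    "filename\tscan\tannotation\n"
    "hypothetical/run_a.mzML\t1502\tPEPTIDEK\n"
    "hypothetical/run_b.mzML\t87\tSAMPLER\n"
    "hypothetical/run_a.mzML\t311\tPEPTIDEK\n"
)
index = scans_by_file(sample)

for filename, scans in index.items():
    print(filename, scans)
```

With the real metadata, the same function could be given a text-mode handle on the unzipped TSV file; the resulting mapping then drives both the download list and the per-file spectrum extraction.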
Files
(50.0 GB)
| MD5 checksum | Size |
|---|---|
| md5:1be06b8430ff8c4d6346302878d20eb0 | 1.6 GB |
| md5:2aa9984f02260330e31a61364b553f39 | 46.7 GB |
| md5:8e53b28071d39365e70460353e81a339 | 1.6 GB |
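Given the size of the files, it may be worth checking a download against the MD5 checksums listed above before using it. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_listing` are hypothetical, and the demonstration runs on a small temporary file rather than on the actual dataset files.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listing entry such as 'md5:1be0...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a throwaway file (stand-in for a downloaded file).
payload = b"stand-in for a downloaded file"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

try:
    good = matches_listing(demo_path, "md5:" + hashlib.md5(payload).hexdigest())
    # A checksum from the listing will not match the throwaway file.
    bad = matches_listing(demo_path, "md5:1be06b8430ff8c4d6346302878d20eb0")
finally:
    os.remove(demo_path)
```

Reading in chunks keeps memory use constant, which matters for the 46.7 GB file.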