Published September 4, 2024 | Version v2
Dataset Open

A multi-species benchmark for training and validating large scale mass spectrometry proteomics machine learning models

Description

This is a de novo sequencing benchmark dataset derived from nine
publicly available mass spectrometry datasets. There are two versions
of the benchmark: main and balanced. The balanced version randomly
eliminates some spectra associated with some species in order to
create a smaller, more evenly balanced dataset. Also provided are two
zip files containing the raw data as well as intermediate results.
Details about how the benchmark was created are provided in an
associated Zenodo release, which contains the source code as well as a
manuscript describing the benchmark.
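
The balancing step described above can be illustrated with a minimal sketch. This is not the authors' procedure, which is documented in the associated source code release; it only shows one plausible way to randomly drop spectra from over-represented species. The function name `balance_by_species`, the `(species, spectrum_id)` input layout, and the `max_per_species` cap are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def balance_by_species(spectra, max_per_species, seed=0):
    """Randomly keep at most max_per_species spectra for each species.

    spectra is an iterable of (species, spectrum_id) pairs. This layout
    is hypothetical and is not the format used by the benchmark files.
    """
    # Group spectrum identifiers by species, preserving input order.
    by_species = defaultdict(list)
    for species, spectrum_id in spectra:
        by_species[species].append(spectrum_id)

    # A seeded generator makes the random selection reproducible.
    rng = random.Random(seed)
    balanced = {}
    for species, ids in by_species.items():
        if len(ids) > max_per_species:
            # Subsample only the species that exceed the cap.
            ids = rng.sample(ids, max_per_species)
        balanced[species] = ids
    return balanced
```

Species that already fall under the cap are left untouched, so only the larger species lose spectra.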

This release fixes a bug that incorrectly detected shared peptides
between different species. It also includes the annotated spectra in
mzSpecLib format.

Files

nine-species-balanced.zip

Files (99.0 GB)

Checksum                                Size
md5:cc39b5c25c317f759a706b6d724d099a    5.5 GB
md5:efdf72252b59326538ad95a0557a3fb9    59.5 GB
md5:21e9bfd2f9b82b3ed6aa08a2238bcc33    8.3 GB
md5:624d094ba1677556bdec6a893e3b1209    25.7 GB
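
Given the size of these files, it may be worth checking a download against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the file path in the usage note is a placeholder.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so multi-gigabyte files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with a listed value such as 'md5:cc39...'."""
    # Accept the value with or without the leading 'md5:' label.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("downloaded_file.zip", "md5:cc39b5c25c317f759a706b6d724d099a")` returns `True` only if the local file has that digest. The listing here does not show which file name goes with which checksum, so the pairing has to be taken from the record page itself.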

Additional details

Software