A multi-species benchmark for training and validating large-scale mass spectrometry proteomics machine learning models
Description
This is a de novo sequencing benchmark dataset derived from nine
publicly available mass spectrometry datasets. The benchmark comes in
two versions: main and balanced. The balanced version randomly removes
a subset of the spectra from some species to create a smaller, more
evenly balanced dataset. Two additional zip files contain the raw data
and intermediate results. Details of how the benchmark was created are
given in an associated Zenodo release, which contains the source code
and a manuscript describing the benchmark.
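The balancing step described above can be illustrated as per-species random downsampling. This is a minimal sketch, not the benchmark's actual procedure: the function name `balance_by_species`, the `(species, spectrum_id)` input layout, and the fixed per-species cap `max_per_species` are all hypothetical, and the real selection criteria are documented in the associated source code and manuscript.

```python
import random
from collections import defaultdict


def balance_by_species(spectra, max_per_species, seed=0):
    """Randomly keep at most `max_per_species` spectra for each species.

    Hypothetical sketch: `spectra` is an iterable of (species, spectrum_id)
    pairs; this is not the benchmark's actual data layout or method.
    """
    # Group spectrum identifiers by species.
    by_species = defaultdict(list)
    for species, spectrum_id in spectra:
        by_species[species].append(spectrum_id)

    # Seeded RNG so the downsampled subset is reproducible.
    rng = random.Random(seed)
    balanced = {}
    for species, ids in by_species.items():
        # Only over-represented species are downsampled; smaller ones are kept whole.
        if len(ids) > max_per_species:
            ids = rng.sample(ids, max_per_species)
        balanced[species] = ids
    return balanced


# Usage: ten yeast spectra are cut to five; three human spectra are kept as-is.
spectra = [("yeast", i) for i in range(10)] + [("human", i) for i in range(3)]
balanced = balance_by_species(spectra, max_per_species=5)
```

Fixing the random seed is a design choice that makes the smaller dataset reproducible across runs, which matters when the balanced version is used for benchmarking.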
This release fixes a bug in the detection of peptides shared between
different species. It also includes the annotated spectra in
mzSpecLib format.
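Detecting peptides shared between species amounts to grouping peptide sequences by the set of species in which they occur. The sketch below is a hypothetical illustration, not the repository's implementation: `find_shared_peptides` and its `(species, peptide)` input layout are assumptions, and it compares sequences by exact string match.

```python
from collections import defaultdict


def find_shared_peptides(psms):
    """Return {peptide: set_of_species} for peptides seen in more than one species.

    Hypothetical sketch: `psms` is an iterable of (species, peptide_sequence)
    pairs, and peptides are compared by exact sequence string.
    """
    species_by_peptide = defaultdict(set)
    for species, peptide in psms:
        # A set ignores repeated observations of a peptide within one species.
        species_by_peptide[peptide].add(species)
    # A peptide is "shared" only if it maps to two or more distinct species.
    return {pep: sp for pep, sp in species_by_peptide.items() if len(sp) > 1}


# Usage: PEPTIDEK occurs in two species; LESSK occurs twice but only in human.
psms = [("yeast", "PEPTIDEK"), ("human", "PEPTIDEK"), ("human", "LESSK"), ("human", "LESSK")]
shared = find_shared_peptides(psms)
```

Storing a set of species per peptide, rather than counting observations, ensures that a peptide identified many times in a single species is not mistaken for one shared across species.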
Files
nine-species-balanced.zip
Additional details
Software
- Repository URL: https://github.com/Noble-Lab/multi-species-benchmark
- Programming language: Python