Published June 9, 2022 | Version v1
Dataset Open

PROSPECT

  • 1. Computational Mass Spectrometry, Technical University of Munich

Description

PROSPECT: Labeled Tandem Mass Spectrometry Dataset for Machine Learning in Proteomics

PROSPECT (PROteometools SPECTrum compendium) is a large annotated dataset leveraging the raw data from ProteomeTools.

The dataset consists of 12 packages and contains two kinds of Parquet files: metadata files and annotation files. Each package has one metadata file, while its annotations are split across multiple files to make the data easier to read.

Annotation files are organized by pool. A pool is a set of roughly 1,000 peptides measured together in one analysis, which keeps the sample complexity low and the identification rate high. The annotation files are also bundled into zip archives.
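Since the annotation files ship inside archives, a first step is usually to pull the Parquet files out of one. The following is a minimal sketch using only the Python standard library. It assumes the archives are ordinary zip files whose annotation members end in `.parquet`; the function name and the example paths are hypothetical and not part of the dataset.

```python
import zipfile
from pathlib import Path


def extract_parquet_members(archive_path, out_dir):
    """Extract only the .parquet members of a zip archive.

    Returns the paths of the extracted files, in sorted member order.
    The '.parquet' suffix is an assumption about how members are named.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive_path) as zf:
        for name in sorted(zf.namelist()):
            if name.endswith(".parquet"):
                # ZipFile.extract returns the path of the file it created.
                extracted.append(Path(zf.extract(name, path=out_dir)))
    return extracted
```

A call such as `extract_parquet_members("some_package.zip", "annotations/")` (placeholder names) would leave other archive members untouched and return the list of extracted files, which can then be opened one at a time with any Parquet reader.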

Every file provides a unique identifier that traces each example back to its original raw data file in ProteomeTools. This identifier combines the raw file ID and the scan number.
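The two-part identifier can be treated as a composite key for linking rows across files. The sketch below illustrates the idea on plain dictionaries. The field names `raw_file` and `scan_number` are assumptions made for illustration, and the rows are invented; check the README for the actual column names before adapting it.

```python
from collections import defaultdict


def spectrum_key(raw_file, scan_number):
    """Combine raw file ID and scan number into one hashable identifier."""
    return (str(raw_file), int(scan_number))


def annotations_by_spectrum(annotation_rows):
    """Group annotation rows (dicts) under their spectrum identifier."""
    grouped = defaultdict(list)
    for row in annotation_rows:
        grouped[spectrum_key(row["raw_file"], row["scan_number"])].append(row)
    return dict(grouped)


# Toy rows invented for illustration; they are not taken from the dataset.
metadata = [{"raw_file": "run_A", "scan_number": 101, "sequence": "PEPTIDEK"}]
annotations = [
    {"raw_file": "run_A", "scan_number": 101, "ion": "y1"},
    {"raw_file": "run_A", "scan_number": 101, "ion": "b2"},
    {"raw_file": "run_B", "scan_number": 7, "ion": "y3"},
]

grouped = annotations_by_spectrum(annotations)
for meta in metadata:
    key = spectrum_key(meta["raw_file"], meta["scan_number"])
    # Look up every annotation row that belongs to this metadata row.
    print(key, [a["ion"] for a in grouped.get(key, [])])
```

Using both parts of the key matters because scan numbers restart in every raw file, so a scan number alone would not be unique across the dataset.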

The original ProteomeTools dataset is available on PRIDE: Part 1, Part 2, Part 3.

 

Notes

Bundesministerium für Bildung und Forschung – BMBF; Grant Number 031L0008A

Files

README.md

Files (163.2 GB)

Checksum (MD5)  Size
md5:399b8c17068308f9dce64a151a1ae0da  3.2 kB
md5:7c7c10ef95b7d45eb24b4bb80d42f880  11.5 GB
md5:82800012ba48611695c7147d009fd5f3  149.8 MB
md5:6b3154829f4df588693fb9e1395dab48  11.4 GB
md5:2b914d4aad0d99dee4888a3788cc033e  62.7 MB
md5:2b8c45bbb25d67ebbe105fa1c57e97c5  8.7 GB
md5:459e73d52cc3292da5724a06bb70faa2  9.1 GB
md5:db801238cac201c76debb7f258085a87  9.3 GB
md5:e06e23fd13d14b1eab0a695a2f370ec3  240.7 MB
md5:0b818429f6768196bd7ab859c64c0542  13.9 GB
md5:174c22b7a43d9e76741163708a39e0a8  10.1 GB
md5:d4113f8f000d8655ae9526c6a7d15eb1  9.8 GB
md5:ef32a4f5b87d2fb90bd9d3ed8568d2f1  170.3 MB
md5:31bbf4f40f0b0b9ab5dcb23b6e62803b  257.7 MB
md5:cb9c56a861d8b7664078c14b165337bc  11.3 GB
md5:b8548820ca5d61b914a66cf1aefaec94  11.8 GB
md5:b246a3c2a73818ea88b3b61b491df1be  225.2 MB
md5:fcb86674338f687e9427bce0c7f606f1  10.9 GB
md5:5d8d9da924db633dec7057fd0693d105  53.7 MB
md5:1318d05d166ca10959beece68664cca3  1.6 GB
md5:200727477fa0377c7fa1367181cfcff3  10.6 MB
md5:02c7d5de4baefe9402050d1518065e75  4.4 GB
md5:0ef9ccad3a5ecc7e49f126c0410eadc9  57.7 MB
md5:e42a4efbb565a933c0ea28e55695554d  8.3 GB
md5:01ba8590ba57ae7f1bfb07c3c58c2cf0  92.9 MB
md5:c52913f8fa6caece446f65b6084f9fe9  9.4 GB
md5:560be62bd1c8b026a648dfc0ee518461  9.3 GB
md5:1e704adc82f5f9f2dcf2d8c66879ffa1  9.5 GB
md5:f8ae60f49411f78a2833dd7a6e2b7254  146.5 MB
md5:6b7a1173c450939e025afedb70b3915f  1.5 GB
md5:bb3e7087dcf70741f7c495f407175336  8.2 MB
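
With downloads of this size, it is worth checking each file against its listed MD5 checksum before use. The following is a minimal sketch with Python's standard `hashlib`; the helper names are hypothetical, and the file is read in chunks so that multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed[4:] if listed.startswith("md5:") else listed
    return md5_of_file(path) == expected.strip().lower()
```

For example, `matches_listing("path/to/downloaded_file", "md5:<checksum from the listing>")` returns `True` only when the download is intact; a `False` result usually means the transfer was truncated or corrupted and the file should be fetched again.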

Additional details

Funding

EPIC-XS – European Proteomics Infrastructure Consortium providing Access 823839
European Commission