Dataset Open Access

Cleaned and pre-processed MS/MS dataset (built from all positive ionmode spectra in GNPS) - zip file

Huber, Florian; Ridder, Lars; Verhoeven, Stefan; Spaaks, Jurriaan H.; Diblen, Faruk; Rogers, Simon; van der Hooft, Justin J.J.

Large MS/MS dataset built from data obtained from GNPS (accessed on 2020-05-11):

The data was cleaned and pre-processed using notebooks provided here:

  • 112,956 positive ionmode spectra
  • metadata was cleaned and corrected using matchms and lookup routines using PubChem
  • 92,954 of the spectra have SMILES and InChIKey annotations (13,717 unique InChIKeys when compared on the first 14 characters)
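
The first 14 characters of an InChIKey form its connectivity block, so counting unique values of that prefix groups spectra of compounds that differ only in stereochemistry or similar details. A minimal sketch of such a count is shown below. It assumes the spectra have already been loaded as a list of dictionaries with a hypothetical `inchikey` field; the actual file layout and field names inside the zip archive are not stated here, and the toy keys are made up for illustration.

```python
def count_unique_inchikey_blocks(spectra, key="inchikey"):
    """Count distinct 14-character InChIKey prefixes among annotated spectra.

    `spectra` is assumed to be a list of dicts; `key` is a hypothetical
    field name, not necessarily the one used in the dataset.
    """
    blocks = set()
    for spectrum in spectra:
        inchikey = spectrum.get(key)
        # Skip spectra without a usable InChIKey annotation.
        if inchikey and len(inchikey) >= 14:
            blocks.add(inchikey[:14])
    return len(blocks)


# Made-up placeholder records, not taken from the dataset.
toy_spectra = [
    {"inchikey": "AAAAAAAAAAAAAA-UHFFFAOYSA-N"},
    {"inchikey": "AAAAAAAAAAAAAA-XXXXXXXXSA-N"},  # same first block as above
    {"inchikey": "BBBBBBBBBBBBBB-UHFFFAOYSA-N"},
    {"inchikey": None},  # unannotated spectrum
]

print(count_unique_inchikey_blocks(toy_spectra))  # prints 2
```

Applied to the annotated spectra of the full dataset, the same kind of count would be expected to correspond to the unique InChIKey figure given in the list above.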


This dataset was used for the main article on Spec2Vec.

Files (336.6 MB)

