There is a newer version of this record available.

Dataset Open Access

Matchms and PubChem cleaned MS/MS dataset from GNPS

Huber, Florian

Dataset of MS/MS spectra retrieved from GNPS (https://gnps.ucsd.edu) on 25/01/2021 and subjected to extensive metadata cleaning.

Metadata was cleaned and processed using matchms (https://github.com/matchms/matchms) and matchmsextras (https://github.com/matchms/matchmsextras). This mainly consisted of the following steps:

  • Empty spectra were removed.
  • Compound names were cleaned.
  • Charge, adduct, formula, and ionmode fields were cleaned and corrected.
  • Parent mass estimates were added (computed from precursor m/z and adduct information).
  • InChIKey, InChI, and SMILES entries were checked and corrected.
  • Spectra still lacking an InChI, InChIKey, or SMILES were searched against PubChem based on their mass and name.
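The parent-mass and InChIKey steps above can be illustrated with a minimal, standard-library-only sketch. It is not the matchms or matchmsextras implementation: the adduct table, the mass constants, and the function names `estimate_parent_mass` and `looks_like_inchikey` are hypothetical, and the InChIKey check only tests the 14-10-1 letter layout, not whether the key actually matches the structure.

```python
import re
from typing import Dict, Tuple

# Commonly tabulated monoisotopic values in Da (assumed, not read from the dataset).
PROTON_MASS = 1.007276
SODIUM_ION_MASS = 22.989218  # Na+ adduct, approximately

# Hypothetical adduct table: name -> (molecules of M, |charge|, mass added to n*M).
ADDUCTS: Dict[str, Tuple[int, int, float]] = {
    "[M+H]+": (1, 1, PROTON_MASS),
    "[M+Na]+": (1, 1, SODIUM_ION_MASS),
    "[M+2H]2+": (1, 2, 2 * PROTON_MASS),
    "[2M+H]+": (2, 1, PROTON_MASS),
    "[M-H]-": (1, 1, -PROTON_MASS),
}


def estimate_parent_mass(precursor_mz: float, adduct: str) -> float:
    """Return the neutral mass M implied by a precursor m/z and an adduct.

    Uses m/z = (n * M + adduct_mass) / |z|, solved for M.
    Raises KeyError for adducts missing from the table.
    """
    n_molecules, abs_charge, adduct_mass = ADDUCTS[adduct]
    return (precursor_mz * abs_charge - adduct_mass) / n_molecules


# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(value: str) -> bool:
    """Format check only; it does not confirm the key matches any structure."""
    return INCHIKEY_PATTERN.fullmatch(value.strip()) is not None
```

For example, `estimate_parent_mass(181.0707, "[M+H]+")` gives about 180.0634 Da. A real cleaning pipeline would also have to handle unknown or inconsistent adduct strings, and checking an InChIKey against its InChI or SMILES requires a cheminformatics toolkit such as RDKit.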

This resulted in 210,407 spectra, of which 184,698 are annotated with an InChIKey and a SMILES and/or InChI.
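The MSP file can be loaded with matchms (`matchms.importing.load_from_msp`). For inspecting it without dependencies, a minimal plain-Python sketch follows. It assumes `key: value` metadata lines, a `Num Peaks` line followed by one `m/z intensity` pair per line, and blank lines between records; the helper names, and the treatment of empty or `n/a` values as missing, are assumptions, and the exact keys written by matchms may differ.

```python
from typing import Dict, Iterator, List, TextIO, Tuple

Peaks = List[Tuple[float, float]]
MISSING = {"", "n/a", "na", "none"}  # assumed placeholders for empty fields


def read_msp(handle: TextIO) -> Iterator[Tuple[Dict[str, str], Peaks]]:
    """Yield (metadata, peaks) for each record of a plain-text MSP file."""
    metadata: Dict[str, str] = {}
    peaks: Peaks = []
    in_peaks = False
    for raw_line in handle:
        line = raw_line.strip()
        if not line:  # a blank line closes the current record
            if metadata or peaks:
                yield metadata, peaks
            metadata, peaks, in_peaks = {}, [], False
            continue
        if in_peaks:  # 'm/z intensity'; any extra columns are ignored
            mz, intensity = line.split()[:2]
            peaks.append((float(mz), float(intensity)))
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        key = key.strip().lower()
        metadata[key] = value.strip()
        if key == "num peaks":
            in_peaks = True
    if metadata or peaks:  # the last record may lack a trailing blank line
        yield metadata, peaks


def is_annotated(metadata: Dict[str, str]) -> bool:
    """True if the record has an InChIKey plus a SMILES and/or an InChI."""
    def present(key: str) -> bool:
        return metadata.get(key, "").strip().lower() not in MISSING
    return present("inchikey") and (present("smiles") or present("inchi"))


def count_annotated(path: str) -> Tuple[int, int]:
    """Return (total spectra, annotated spectra) for an MSP file on disk."""
    total = annotated = 0
    with open(path, encoding="utf-8") as handle:
        for metadata, _peaks in read_msp(handle):
            total += 1
            if is_annotated(metadata):
                annotated += 1
    return total, annotated
```

If the layout assumptions hold, `count_annotated("ALL_GNPS_210125_matchms_pubchem_cleaned.msp")` should return a pair comparable to the totals quoted above.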

If you use this dataset in your research, please cite the following:

  • GNPS, e.g. [Wang, M. et al. Sharing and community curation of mass spectrometry data with GNPS. Nat. Biotechnol. 34, 828–837 (2016)]
  • matchms: [Huber, F. et al. matchms - processing and similarity evaluation of mass spectrometry data. J. Open Source Softw. 5, 2411 (2020)]
  • PubChem: [Kim, S. et al. PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. 47, D1102–D1109 (2019)]

Many thanks!

Files (1.3 GB)
ALL_GNPS_210125_matchms_pubchem_cleaned.msp (1.3 GB)
md5:eaf5ca1b3a9f8b6d1dfe14e329ea7947
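Given the file size, it is worth verifying the download against the listed MD5 checksum. A minimal sketch using Python's `hashlib`, reading the file in 1 MiB chunks (an arbitrary choice); `md5_of_file` and `verify_download` are hypothetical helper names, and the expected value is the checksum above without its `md5:` prefix.

```python
import hashlib

# Checksum listed on this record, without its "md5:" prefix.
EXPECTED_MD5 = "eaf5ca1b3a9f8b6d1dfe14e329ea7947"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks to keep memory use small."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare the file's MD5 digest with the expected hex string."""
    return md5_of_file(path) == expected.strip().lower()
```

Calling `verify_download("ALL_GNPS_210125_matchms_pubchem_cleaned.msp")` should return `True` for an intact copy.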
Usage statistics   All versions   This version
Views              169            152
Downloads          43             33
Data volume        55.2 GB        42.3 GB
Unique views       147            140
Unique downloads   39             31
