There is a newer version of the record available.

Published April 18, 2021 | Version v1
Dataset | Open access

Matchms and PubChem cleaned MS/MS dataset from GNPS

  • 1. Netherlands eScience Center

Description

Dataset of MS/MS spectra retrieved from GNPS (https://gnps.ucsd.edu) on 25 January 2021 and subjected to extensive metadata cleaning.

Metadata was cleaned and processed using matchms (https://github.com/matchms/matchms) and matchmsextras (https://github.com/matchms/matchmsextras). This largely consisted of the following steps:

  • Empty spectra were removed.
  • Compound names were cleaned.
  • Charge, adduct, formula, and ionmode fields were cleaned and corrected.
  • Parent mass estimates were added (derived from precursor m/z and adduct information).
  • InChIKey, InChI, and SMILES were checked and corrected.
  • Spectra still lacking InChI/InChIKey/SMILES were searched against PubChem based on their mass and name.

This resulted in 210,407 spectra, of which 184,698 are annotated with an InChIKey and a SMILES and/or InChI.
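
The annotation criterion above (an InChIKey plus a SMILES and/or an InChI) can be written as a simple predicate over per-spectrum metadata. This sketch assumes each spectrum's metadata is available as a dictionary with lowercase keys `inchikey`, `smiles`, and `inchi`, and it treats a few placeholder strings as missing. Both the key names and the placeholder set are assumptions for illustration, not taken from the dataset files.

```python
from typing import Iterable, Mapping, Optional, Tuple

# Hypothetical set of placeholder strings that count as "no annotation".
MISSING_VALUES = {"", "n/a", "na", "none", "nan"}


def has_value(metadata: Mapping[str, Optional[str]], key: str) -> bool:
    """True if `key` holds a non-empty, non-placeholder string."""
    value = metadata.get(key)
    return isinstance(value, str) and value.strip().lower() not in MISSING_VALUES


def is_annotated(metadata: Mapping[str, Optional[str]]) -> bool:
    """InChIKey present, plus a SMILES and/or an InChI."""
    return has_value(metadata, "inchikey") and (
        has_value(metadata, "smiles") or has_value(metadata, "inchi")
    )


def count_annotated(spectra: Iterable[Mapping[str, Optional[str]]]) -> Tuple[int, int]:
    """Return (annotated, total) over an iterable of metadata dictionaries."""
    annotated = total = 0
    for metadata in spectra:
        total += 1
        if is_annotated(metadata):
            annotated += 1
    return annotated, total
```

For the figures reported above, 184,698 annotated out of 210,407 spectra corresponds to roughly 87.8%.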

If you use this dataset for your research, please cite the following:

  • GNPS, e.g.: Wang, M. et al. Sharing and community curation of mass spectrometry data with GNPS. Nat. Biotechnol. 34, 828–837 (2016).
  • matchms: Huber, F. et al. matchms - processing and similarity evaluation of mass spectrometry data. J. Open Source Softw. 5, 2411 (2020).
  • PubChem: Kim, S. et al. PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. 47, D1102–D1109 (2019).

Many thanks!

Files

Files (1.3 GB)

md5: eaf5ca1b3a9f8b6d1dfe14e329ea7947 (1.3 GB)
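
After downloading, the file can be checked against the listed md5 checksum. A minimal sketch using Python's standard `hashlib`, reading the file in chunks so the 1.3 GB download is never held in memory at once. The file name is not shown in this listing, so the path passed to `verify_download` is a hypothetical placeholder.

```python
import hashlib

# Checksum as listed for this record's file.
EXPECTED_MD5 = "eaf5ca1b3a9f8b6d1dfe14e329ea7947"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex md5 digest of a file, reading 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's md5 matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()


# Usage (hypothetical file name):
# verify_download("downloaded_dataset_file")
```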