Published July 20, 2024 | Version V1.0
Dataset
Open
Liquid Chromatography - Tandem Mass Spectrometry (LC-MS/MS) and Gas Chromatography - Mass Spectrometry (GC-MS) Reference Libraries from Global Natural Products Social Molecular Networking (GNPS) and National Institute of Standards and Technology (NIST) WebBook Processed for Spectral Library Matching
Creators
Description
To obtain a high-quality LC-MS/MS reference database for spectral library matching, we selected 22 high-quality GNPS tandem mass spectrometry databases acquired in positive ion mode. Further preprocessing similar to that of Huber et al., involving mass-to-charge (m/z) and intensity filtering, yields the database in the file LCMS_GNPS_reference_library.csv, which contains 14,705 electrospray ionization (ESI) mass spectra, each corresponding to a unique compound. The NIST WebBook database was used to construct the GC-MS database contained in the file GCMS_NIST_WebBook.csv. This database contains 23,721 electron ionization (EI) mass spectra, each corresponding to a unique non-hyphenated Chemical Abstracts Service (CAS) Registry Number.
Both the LC-MS/MS and GC-MS databases are organized into three columns: one for the spectrum identifier, one for the m/z values, and one for the intensity values. Each row holds a single fragment ion. For example, if spectrum A has 20 fragment ions, the database contains 20 rows for spectrum A, each carrying the identifier A together with one m/z value and its intensity.
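The long format described above can be regrouped into one peak list per spectrum. The following is a minimal sketch, not the authors' code: it assumes the columns appear in the order identifier, m/z, intensity and that the first row is a header, neither of which is confirmed by this record. The function name `load_spectra` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_spectra(handle, has_header=True):
    """Group long-format (identifier, m/z, intensity) rows into spectra.

    Assumption: columns are ordered identifier, m/z, intensity.
    Returns a dict mapping each identifier to a list of (m/z, intensity).
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row (assumed to be present)
    spectra = defaultdict(list)
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or incomplete rows
        identifier = row[0]
        spectra[identifier].append((float(row[1]), float(row[2])))
    return dict(spectra)


# Made-up rows that mimic the layout; these are not values from the dataset.
example = io.StringIO(
    "id,mz,intensity\n"
    "A,41.0,12.5\n"
    "A,55.1,100.0\n"
    "B,77.0,40.0\n"
)
spectra = load_spectra(example)
peak_counts = {identifier: len(peaks) for identifier, peaks in spectra.items()}
print(peak_counts)
```

To read one of the downloaded files, pass an open file handle instead of the in-memory example, for instance `open(path, newline="")` with `path` pointing at a local copy of the CSV.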
Files
GCMS_NIST_WebBook_reference_library.csv
Total size: 151.3 MB

| Checksum | Size |
|---|---|
| md5:30aec2eb30c0ba8bca1c8a9acd9aa0c8 | 34.3 MB |
| md5:ecea5d5295bc40b7af5c4da4e4a51d12 | 117.0 MB |
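The MD5 checksums listed for the files can be used to confirm that a download completed intact. The sketch below uses Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` are hypothetical, and the local path in the usage comment is an assumption about where a file was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or bare hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (hypothetical local path; substitute the checksum listed for that file):
# matches_listing("downloads/some_file.csv", "md5:<hex digest from the listing>")
```

Reading in chunks keeps memory use flat for files of this size (tens to hundreds of megabytes).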
Additional details
Related works
- Requires: Preprint, DOI 10.26434/chemrxiv-2024-5fm7t
Funding
- National Institutes of Health
- National Cancer Institute
Dates
- Submitted: 2024-07-20
References
- Global Natural Products Social Molecular Networking MS/MS Spectral Library Databases URL: https://ccms-ucsd.github.io/GNPSDocumentation/gnpslibraries/
- Huber F, Ridder L, Verhoeven S, Spaaks JH, Diblen F, Rogers S, et al. (2021) Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships. PLoS Comput Biol 17(2): e1008724. https://doi.org/10.1371/journal.pcbi.1008724
- NIST WebBook URL: https://webbook.nist.gov/
- Kim S, Kato I, Zhang X. (2022) Comparative Analysis of Binary Similarity Measures for Compound Identification in Mass Spectrometry-Based Metabolomics. Metabolites 12(8): 694. PMID: 35893261; PMCID: PMC9394311. https://doi.org/10.3390/metabo12080694