Poster Open Access
When untargeted metabolomics data are searched against spectral libraries, only ~1–20% of the data is annotated, depending on the sample type. In recent years, molecular networking has been used to find modifications of food-derived molecules that occur during digestion, investigate drug metabolites that arise due to metabolism, and discover designer drugs, among other applications. Because neighboring nodes in a molecular network have similar spectra, annotations can be propagated between them to increase the spectrum annotation rate. Taking both direct peak matches and neutral loss peak matches into account, we propagated annotations through the molecular networks associated with 1.2 billion MS/MS spectra from GNPS/MassIVE, MetaboLights, and Metabolomics Workbench, and created the open-source Global Natural Products Social Molecular Networking (GNPS) community suspect library.
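The combination of direct and neutral loss peak matches can be illustrated with a minimal sketch of a modified cosine score. This is not the GNPS implementation: the function name, the 0.02 m/z tolerance, the use of raw intensities without scaling, and the greedy peak assignment (rather than an optimal assignment) are all assumptions made for illustration.

```python
import math


def modified_cosine(peaks_a, precursor_a, peaks_b, precursor_b, tol=0.02):
    """Return (score, n_matches) for two spectra given as lists of (mz, intensity).

    A fragment pair matches directly if the m/z values agree within `tol`,
    or as a neutral loss match if they differ by the precursor m/z difference.
    The tolerance and the greedy assignment are illustrative assumptions.
    """
    shift = precursor_a - precursor_b
    norm_a = math.sqrt(sum(i * i for _, i in peaks_a))
    norm_b = math.sqrt(sum(i * i for _, i in peaks_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0

    # Collect every admissible peak pair with its intensity product.
    candidates = []
    for ia, (mz_a, int_a) in enumerate(peaks_a):
        for ib, (mz_b, int_b) in enumerate(peaks_b):
            direct = abs(mz_a - mz_b) <= tol
            shifted = abs(mz_a - mz_b - shift) <= tol
            if direct or shifted:
                candidates.append((int_a * int_b, ia, ib))

    # Greedy assignment: strongest pairs first, each peak used at most once.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    total, n_matches = 0.0, 0
    for product, ia, ib in candidates:
        if ia in used_a or ib in used_b:
            continue
        used_a.add(ia)
        used_b.add(ib)
        total += product
        n_matches += 1
    return total / (norm_a * norm_b), n_matches
```

For two hypothetical spectra whose precursors differ by 16 m/z, a fragment at 150 in one and 166 in the other counts as a neutral loss match, while a shared fragment at 100 counts as a direct match.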
A data-driven approach was used to create a propagated spectral library, referred to as the "suspect" spectral library, from repository-wide molecular networking results on the GNPS platform. Using "living data" reanalysis of 1335 publicly available datasets, a novel spectral library consisting of molecules that are structurally related to known reference molecules was compiled. Suspects were derived from high-quality spectrum pairs (minimum cosine similarity of 0.8 and at least six matching ions) in which only one of the two spectra was identified during spectral library searching and the two spectra have a non-zero precursor mass difference. In such a pair, the unidentified spectrum likely corresponds to a previously unknown molecule that is structurally related to the reference molecule, and it was included in the suspect spectral library.
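The selection criteria for suspects can be sketched as a simple filter over spectrum pairs. The thresholds (cosine of 0.8, six matching ions) come from the text above; the class names, the `derive_suspect` function, and the 0.01 tolerance used to decide that a precursor difference is non-zero are hypothetical, and precursor m/z is used as a stand-in for precursor mass, which assumes both ions carry the same charge and adduct.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MIN_COSINE = 0.8        # from the text
MIN_MATCHED_IONS = 6    # from the text
ZERO_TOLERANCE = 0.01   # assumption: differences below this count as zero


@dataclass
class Spectrum:
    precursor_mz: float
    annotation: Optional[str] = None  # library match name, None if unidentified


@dataclass
class SpectrumPair:
    a: Spectrum
    b: Spectrum
    cosine: float
    matched_ions: int


def derive_suspect(pair: SpectrumPair) -> Optional[Tuple[str, float]]:
    """Return (reference annotation, precursor difference) or None if the pair is rejected."""
    if pair.cosine < MIN_COSINE or pair.matched_ions < MIN_MATCHED_IONS:
        return None
    # Exactly one of the two spectra must have a library identification.
    identified = [s for s in (pair.a, pair.b) if s.annotation is not None]
    if len(identified) != 1:
        return None
    reference = identified[0]
    unknown = pair.b if reference is pair.a else pair.a
    delta = unknown.precursor_mz - reference.precursor_mz
    if abs(delta) <= ZERO_TOLERANCE:
        return None
    return reference.annotation, round(delta, 3)
```

A pair consisting of an identified spectrum and an unidentified spectrum whose precursor is 15.995 higher would yield a suspect labeled with the reference annotation and a difference of about +15.995; pairs in which both or neither spectrum is identified are rejected.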
The suspect spectral library is available with an open license on GNPS for community use. In total, it contains 87,916 new reference spectra that are structurally related to previous library spectra. Importantly, all entries in the suspect spectral library are derived from open MS/MS data on GNPS/MassIVE and constitute matches to experimentally observed spectra. In contrast, all other publicly available spectral libraries on GNPS consist of a combined 82,203 reference spectra, of which only 14,777 spectra have been matched against experimental data. As such, the suspect spectral library significantly boosts the number of relevant publicly available reference spectra.
Repository-scale molecular networking to create the suspect spectral library revealed 1,350 common mass differences associated with modifications that molecules undergo. The most common mass differences are a loss or gain of 2.016 Da (which may correspond to two hydrogen atoms), 28.031 Da (Ala->Val/Cys->Met amino acid substitutions or acetaldehyde), 14.016 Da (methylation or Asn->Gln/Asp->Glu/Gly->Ala/Ser->Thr/Val->Leu amino acid substitutions), 18.010 Da (water), and 15.995 Da (oxidation or Ala->Ser/Phe->Tyr amino acid substitutions). These data give novel chemical insights into the processes that molecules undergo in vivo and during mass spectrometry analysis.
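As an illustration of how such mass differences could be used, the sketch below looks up an observed precursor mass difference against the five most common values listed above. The table values and their tentative interpretations are taken from the text; the function name and the 0.002 Da matching tolerance are assumptions.

```python
from typing import Optional

# Most common mass differences (Da) and their tentative interpretations, from the text.
COMMON_DELTAS = {
    2.016: "two hydrogen atoms",
    14.016: "methylation or Asn->Gln/Asp->Glu/Gly->Ala/Ser->Thr/Val->Leu substitution",
    15.995: "oxidation or Ala->Ser/Phe->Tyr substitution",
    18.010: "water",
    28.031: "Ala->Val/Cys->Met substitution or acetaldehyde",
}


def interpret_delta(delta: float, tolerance: float = 0.002) -> Optional[str]:
    """Return a tentative interpretation for a signed mass difference, or None.

    The tolerance is an illustrative assumption, not a value from the source.
    """
    magnitude = abs(delta)
    for reference, label in COMMON_DELTAS.items():
        if abs(magnitude - reference) <= tolerance:
            direction = "gain" if delta > 0 else "loss"
            return f"{direction} of {reference:.3f} Da: {label}"
    return None
```

Because several modifications share the same nominal difference, the returned label is only a hypothesis; for example, a difference of -18.0106 is reported as a possible loss of water.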
Identification performance of the suspect spectral library was benchmarked using public MS/MS data on GNPS. Across a wide diversity of sample types in publicly available datasets, including microbial samples, environmental samples, and human biofluid samples, incorporating the suspect spectral library resulted in a 2- to 8-fold increase in the number of identified MS/MS spectra compared to analysis using the GNPS community spectral libraries only. This large boost in identification performance demonstrates the relevance of the novel reference spectra included in the suspect spectral library and can contribute new biological insights from the analysis of untargeted metabolomics data.
Data-driven spectral library compilation of structurally related molecules boosts the spectral annotation rate by up to 8-fold.
2021-11-02 - ASMS - Repository Scale Propagated Spectral Library of Suspects for Untargeted Metabolomics.pdf