Published March 11, 2022 | Version v1
Presentation | Open Access

Leveraging public untargeted metabolomics data to propagate structurally related molecule annotations to millions of MS/MS spectra

1. University of California San Diego


One of the key goals of untargeted tandem mass spectrometry (MS/MS) metabolomics is discovering biologically relevant molecules. Considerable discovery potential remains: with spectral library searching, on average only ~5% of the data can be annotated, so the vast majority of the data that are collected yield no biological insight.
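The annotation rate mentioned above is the fraction of query MS/MS spectra that receive a library match. The following is a minimal sketch of that idea, assuming a simple binned cosine score, a precursor m/z tolerance of 0.02 Da and a score threshold of 0.7; these parameters, the function names and the toy spectra are all hypothetical and are not the scoring used by GNPS.

```python
import math
from collections import defaultdict


def binned_vector(peaks, bin_width=1.0):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins and L2-normalize."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz // bin_width)] += intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm > 0 else {}


def cosine(a, b, bin_width=1.0):
    """Plain cosine similarity between two binned spectra."""
    va, vb = binned_vector(a, bin_width), binned_vector(b, bin_width)
    return sum(v * vb.get(k, 0.0) for k, v in va.items())


def library_search(queries, library, min_cosine=0.7, precursor_tol=0.02):
    """Return {query_id: (library_name, score)} for the best match above threshold.

    queries and library map an id to (precursor m/z, list of (m/z, intensity)).
    """
    hits = {}
    for qid, (q_prec, q_peaks) in queries.items():
        best = None
        for name, (l_prec, l_peaks) in library.items():
            if abs(q_prec - l_prec) > precursor_tol:
                continue  # precursor masses must agree before comparing fragments
            score = cosine(q_peaks, l_peaks)
            if score >= min_cosine and (best is None or score > best[1]):
                best = (name, score)
        if best is not None:
            hits[qid] = best
    return hits


def annotation_rate(queries, hits):
    """Fraction of query spectra with at least one accepted library match."""
    return len(hits) / len(queries) if queries else 0.0


# Toy, made-up spectra for illustration only.
library = {"ref_A": (250.10, [(60.08, 100.0), (85.03, 80.0), (103.04, 30.0)])}
queries = {
    "s1": (250.11, [(60.08, 95.0), (85.03, 82.0), (103.04, 28.0)]),
    "s2": (410.20, [(150.05, 10.0), (220.10, 40.0)]),
}
hits = library_search(queries, library)
print(hits, annotation_rate(queries, hits))
```

Here only `s1` matches the reference, so half of the toy queries are annotated; spectra without a matching library entry, like `s2`, are what the approach described next tries to reach.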

I will present a strategy to identify molecules that are structurally related to previously known reference molecules using repository-scale molecular networking. Based on spectral similarity, annotations can be propagated to neighboring MS/MS spectra in a molecular network to increase the spectrum annotation rate. We propagated annotations across molecular networks built from 521 million MS/MS spectra in 1335 compatible untargeted metabolomics datasets from several public metabolomics data repositories, including GNPS/MassIVE, MetaboLights, and Metabolomics Workbench, to create the GNPS nearest neighbor suspect spectral library. It consists of 87,916 novel reference spectra of modified molecules that are structurally related to known reference molecules.
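The propagation step can be illustrated as a one-hop walk over the network: every unannotated neighbor of a library-annotated spectrum becomes a suspect, labeled with the source compound and the precursor mass difference between the two nodes. This is a minimal sketch, not the GNPS implementation. It assumes the edges were already filtered by a spectral similarity threshold (GNPS-style molecular networking typically scores edges with a modified cosine), and the function name, inputs and toy values are hypothetical.

```python
from collections import defaultdict


def propagate_annotations(precursor_mz, edges, library_hits):
    """One-hop annotation propagation over a molecular network (hypothetical helper).

    precursor_mz: {spectrum_id: precursor m/z}
    edges: iterable of (id_a, id_b, similarity) already above the networking threshold
    library_hits: {spectrum_id: compound name} from spectral library searching
    Returns {spectrum_id: (source compound, precursor mass difference)}.
    """
    neighbors = defaultdict(list)
    for a, b, sim in edges:
        neighbors[a].append((b, sim))
        neighbors[b].append((a, sim))

    suspects, best_sim = {}, {}
    for known_id, name in library_hits.items():
        for nb, sim in neighbors[known_id]:
            if nb in library_hits:
                continue  # already has a direct library match
            # If several annotated spectra neighbor nb, keep the most similar one.
            if sim > best_sim.get(nb, -1.0):
                delta = round(precursor_mz[nb] - precursor_mz[known_id], 4)
                suspects[nb] = (name, delta)
                best_sim[nb] = sim
    return suspects


# Toy network: s2 is linked to the annotated s1; s3 is an isolated node.
precursor_mz = {"s1": 162.1125, "s2": 204.1231, "s3": 400.0}
edges = [("s1", "s2", 0.82)]
library_hits = {"s1": "compound_A"}
suspects = propagate_annotations(precursor_mz, edges, library_hits)
print(suspects)
```

The mass difference recorded with each suspect is what makes the annotation informative: it hints at the modification that separates the unknown molecule from its known neighbor, while the exact structure remains a hypothesis to be confirmed.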

Building the suspect library through repository-scale molecular networking revealed 1350 common modification mass differences, which provide chemical insights into the processes that molecules undergo in vivo and during mass spectrometry analysis. Using the suspect library for spectral library searching doubles the spectrum annotation rate on average, considerably boosting the interpretation rate of untargeted metabolomics beyond the state of the art. To demonstrate the performance of the suspect library, suspect annotations enabled the discovery of hundreds of acylcarnitines, including significant acylcarnitine signatures in Alzheimer's disease patients that provide biomedically relevant insights into changes in energy metabolism. Its performance was also demonstrated for natural product drug discovery.
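The abstract does not say how common mass differences were identified. One plausible way to surface them is to bin the precursor mass differences of all suspect annotations and keep the bins that recur often. The sketch below does exactly that, assuming a hypothetical 0.01 Da bin width and a minimum count of 2; the deltas are toy values near +14.0157 Da (CH2) and +42.0106 Da (C2H2O, acetylation).

```python
from collections import Counter


def common_mass_differences(deltas, bin_width=0.01, min_count=2):
    """Bin mass differences (Da) and return frequent bins, most frequent first.

    Each delta is assigned to the nearest multiple of bin_width; bins seen fewer
    than min_count times are dropped. Returns a list of (bin center, count).
    """
    counts = Counter(round(d / bin_width) for d in deltas)
    return [
        (round(k * bin_width, 4), n)
        for k, n in counts.most_common()
        if n >= min_count
    ]


# Toy deltas from suspect annotations (not real data).
deltas = [42.0106, 42.0104, 14.0157, 14.0155, 14.0160, 15.9949]
frequent = common_mass_differences(deltas)
print(frequent)
```

Fixed-width binning is the simplest choice but can split one true mass difference across two neighboring bins when values fall near a bin edge; a tolerance-based clustering of the deltas would avoid that at the cost of more code.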

The nearest neighbor suspect spectral library is freely available with an open license on GNPS for community spectral library searching, where it can be used to provide novel hypotheses for previously unexplored untargeted metabolomics data.


Files (4.7 MB)
