Poster Open Access

Leveraging Public Untargeted Metabolomics Data to Propagate Annotations to Millions of MS/MS Spectra

Bittremieux, Wout; Avalon, Nicole; Thomas, Sydney P.; Wang, Mingxun; Dorrestein, Pieter C.

One of the key goals of untargeted tandem mass spectrometry (MS/MS) metabolomics is discovering biologically relevant molecules. Substantial discovery potential remains: with spectral library searching, on average only ~5% of MS/MS spectra can be annotated. As a result, the vast majority of collected data yields no biological insight.

I will present a strategy to identify molecules that are structurally related to known reference molecules using repository-scale molecular networking. Because structurally related molecules tend to produce similar MS/MS spectra, annotations can be propagated to neighboring spectra in a molecular network based on spectral similarity, increasing the spectrum annotation rate. We propagated annotations through molecular networks built from 1.2 billion MS/MS spectra in 1,335 public untargeted metabolomics datasets from several repositories, including GNPS/MassIVE, MetaboLights, and Metabolomics Workbench, to create the GNPS nearest neighbor suspect spectral library. It consists of 87,916 novel reference spectra of modified molecules that are structurally related to known reference molecules.
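The propagation step can be illustrated with a minimal Python sketch. It is not the GNPS implementation: it assumes a simplified `Spectrum` record and scores spectrum pairs with a plain cosine over fixed-width m/z bins, whereas GNPS molecular networking uses a modified cosine that also matches fragments shifted by the precursor mass difference. The names `Spectrum`, `cosine`, and `propagate`, the 0.02 m/z bin width, and the 0.7 similarity threshold are hypothetical choices made for illustration only.

```python
# Hypothetical, simplified illustration of annotation propagation; not the GNPS algorithm.
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Spectrum:
    """A simplified MS/MS spectrum: precursor m/z plus (m/z, intensity) fragment peaks."""
    identifier: str
    precursor_mz: float
    peaks: list[tuple[float, float]]
    annotation: str | None = None  # library match, or None if unannotated


def binned_vector(peaks, bin_width=0.02):
    """Sum fragment intensities into fixed-width m/z bins: {bin_index: intensity}."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz / bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    return vec


def cosine(a, b, bin_width=0.02):
    """Plain cosine similarity between two binned spectra (no precursor-shift matching)."""
    va, vb = binned_vector(a.peaks, bin_width), binned_vector(b.peaks, bin_width)
    dot = sum(v * vb.get(k, 0.0) for k, v in va.items())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def propagate(spectra, min_cosine=0.7):
    """Give each unannotated spectrum a suspect annotation from its most similar
    annotated neighbor, recording the precursor mass difference as the putative
    modification. Returns {identifier: (annotation, mass_delta, score)}."""
    annotated = [s for s in spectra if s.annotation is not None]
    suspects = {}
    for s in spectra:
        if s.annotation is not None:
            continue
        best, best_score = None, min_cosine
        for ref in annotated:
            score = cosine(s, ref)
            if score >= best_score:
                best, best_score = ref, score
        if best is not None:
            delta = round(s.precursor_mz - best.precursor_mz, 4)
            suspects[s.identifier] = (best.annotation, delta, round(best_score, 3))
    return suspects
```

Each unannotated spectrum that is similar enough to an annotated one inherits that annotation as a suspect, together with the precursor mass difference that hints at the structural modification; spectra with no sufficiently similar annotated neighbor stay unannotated.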

Building the suspect library through repository-scale molecular networking revealed 1,350 common modification mass differences, which provide chemical insight into the processes that molecules undergo in vivo and during mass spectrometry analysis. Using the suspect library for spectral library searching increases the spectrum annotation rate fourfold on average, considerably extending the interpretation of untargeted metabolomics data beyond the state of the art. To demonstrate the utility of the suspect library, suspect annotations enabled the discovery of 969 acylcarnitines, including significant acylcarnitine signatures in Alzheimer's disease patients that provide biomedically relevant insights into changes in energy metabolism. Suspect annotations were likewise demonstrated for natural product drug discovery.
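Finding common modification mass differences amounts to counting how often each precursor mass difference recurs between neighboring spectra. The sketch below is hypothetical: it groups differences into fixed 0.01 Da bins and keeps bins seen at least `min_count` times. The function name, tolerance, and threshold are illustrative assumptions, not the parameters used to derive the 1,350 reported differences.

```python
# Hypothetical sketch: find recurring precursor mass differences (in Da).
from collections import defaultdict


def common_mass_differences(deltas, tolerance=0.01, min_count=3):
    """Group mass differences into fixed-width bins of `tolerance` Da and return
    (mean_delta, count) for bins with at least `min_count` members, most frequent first."""
    bins = defaultdict(list)
    for d in deltas:
        bins[round(d / tolerance)].append(d)
    frequent = [(sum(v) / len(v), len(v)) for v in bins.values() if len(v) >= min_count]
    return sorted(frequent, key=lambda pair: pair[1], reverse=True)
```

With toy input, recurring differences near 14.0157 Da (a CH2 unit, 14.01565 Da) and 15.9949 Da (an oxygen atom, 15.99491 Da) would surface as frequent modifications. Fixed-width bins can split a single true difference across two adjacent bins; clustering the differences within a tolerance window avoids that at some extra cost.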

The nearest neighbor suspect spectral library is freely available under an open license on GNPS for community spectral library searching, where it can provide novel hypotheses for previously unexplored untargeted metabolomics data.

File: US HUPO 2022 - Suspect spectral library.pdf (2.7 MB)