Published September 26, 2024 | Version v1
Publication Open

DreaMS (Deep Representations Empowering the Annotation of Mass Spectra)

  • 1. IOCB Prague

Description

Tandem mass spectrometry (MS/MS) is the primary method for characterizing biological and environmental samples at a molecular level. Despite this, the interpretation of tandem mass spectra remains a challenge. Existing computational methods for predictions from mass spectra rely heavily on limited spectral libraries and on hard-coded human expertise. Here we introduce a transformer-based neural network pre-trained in a self-supervised way on millions of unannotated tandem mass spectra from our new GeMS (GNPS Experimental Mass Spectra) dataset mined from the MassIVE GNPS repository. We show that pre-training our model to predict masked spectral peaks and chromatographic retention orders leads to the emergence of rich representations of molecular structures, which we name DreaMS (Deep Representations Empowering the Annotation of Mass Spectra). Fine-tuning the pre-trained neural network to predict spectral similarity, molecular fingerprints, chemical properties, and the presence of fluorine from tandem mass spectra yields state-of-the-art performance across all of these tasks. This underscores the practical utility of DreaMS across diverse mass spectrum interpretation tasks and establishes it as a stepping stone for future advances in the field. We make our new dataset and pre-trained models available to the community and release the DreaMS Atlas, a molecular network of 201 million MS/MS spectra constructed using DreaMS annotations.
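
To make the masked-peak pre-training objective mentioned above more concrete, the following is a minimal sketch of how a spectrum could be masked before being passed to a model. It is not the DreaMS implementation: the function name `mask_peaks`, the placeholder `MASK_VALUE`, the 30% mask fraction, and the choice to hide only the m/z value while keeping the intensity are all assumptions made for illustration.

```python
import random

# Hypothetical placeholder written in place of a hidden m/z value (assumption).
MASK_VALUE = -1.0


def mask_peaks(peaks, mask_fraction=0.3, seed=None):
    """Hide the m/z value of a random subset of peaks.

    peaks: list of (mz, intensity) tuples.
    Returns (masked_peaks, targets), where targets maps each masked
    peak index to the original m/z a model would be asked to predict.
    """
    if not peaks:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one peak and never more than the spectrum contains.
    n_masked = min(len(peaks), max(1, round(len(peaks) * mask_fraction)))
    masked_idx = set(rng.sample(range(len(peaks)), n_masked))
    masked_peaks = [
        (MASK_VALUE, intensity) if i in masked_idx else (mz, intensity)
        for i, (mz, intensity) in enumerate(peaks)
    ]
    targets = {i: peaks[i][0] for i in sorted(masked_idx)}
    return masked_peaks, targets


if __name__ == "__main__":
    # Toy spectrum with made-up values, for illustration only.
    toy_spectrum = [(50.0 + 10 * i, 0.1 * (i + 1)) for i in range(10)]
    masked, targets = mask_peaks(toy_spectrum, seed=0)
    print(masked)
    print(targets)
```

In a self-supervised setup of this kind, the model sees `masked_peaks` and is trained to recover the values in `targets`, so no structural annotation of the spectrum is needed.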

Files

Files (2.5 GB)

Name: DreaMS.zip
Size: 2.5 GB
md5: 0bda01017bf6cf967cd4525a6d2bc8df
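
Because the archive is large, it can be worth checking the download against the md5 checksum listed for DreaMS.zip. The snippet below is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `verify_download` and the local path `DreaMS.zip` are assumptions for illustration, not part of the DreaMS package.

```python
import hashlib
from pathlib import Path

# Checksum listed for DreaMS.zip on this record.
EXPECTED_MD5 = "0bda01017bf6cf967cd4525a6d2bc8df"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("DreaMS.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of the 2.5 GB file size.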

Additional details

Related works

Is described by
Publication: 10.26434/chemrxiv-2023-kss3r-v2 (DOI)

Dates

Updated
2024-09-26

Software

Repository URL
https://github.com/pluskal-lab/DreaMS
Programming language
Python
