Published April 19, 2024 | Version v2
Dataset | Open access

Predicting glycan structure from tandem mass spectrometry via deep learning

  • 1. University of Gothenburg
  • 2. Maynooth University
  • 3. University of Southampton

Description

A curated set of LC-MS/MS data from glycomics studies, used to train and apply CandyCrunch, a deep learning model that predicts glycan structure from LC-MS/MS data. The model is described in Urban et al., Nat Methods, 2024; the code is available at https://github.com/BojarLab/CandyCrunch.

Files:

full_dataset.xlsx: Full dataset with all annotated LC-MS/MS glycan spectra

X_train.pkl: spectra and metadata from our training set

y_train.pkl: labels from our training set

X_test.pkl: spectra and metadata from our independent test set

y_test.pkl: labels from our independent test set

glycans.pkl: glycans in IUPAC-condensed nomenclature, in the same order as the label encoding (label i corresponds to the i-th glycan)
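A minimal Python sketch of reading the pickled files and turning integer labels back into glycan names. It assumes that the labels in y_train.pkl and y_test.pkl are integer indices into the sequence stored in glycans.pkl, as the description of glycans.pkl suggests. The helper names `load_pickle` and `decode_labels` are hypothetical and not part of CandyCrunch. The exact object types inside the pickles are not documented here, so unpickling them may require the libraries they were created with (for example pandas or NumPy). Only unpickle files obtained from a trusted source.

```python
from __future__ import annotations

import pickle
from pathlib import Path
from typing import Any, Sequence


def load_pickle(path: str | Path) -> Any:
    """Load one of the dataset's .pkl files (hypothetical helper).

    pickle can execute arbitrary code, so only load files from a trusted source.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


def decode_labels(labels: Sequence[int], glycans: Sequence[str]) -> list[str]:
    """Map integer class labels to IUPAC-condensed glycan strings.

    Assumption: label i refers to glycans[i], i.e. glycans.pkl is ordered
    like the label encoding used in y_train.pkl / y_test.pkl.
    """
    n = len(glycans)
    decoded = []
    for label in labels:
        idx = int(label)  # also accepts NumPy integer types
        # Reject negative indices, which Python would otherwise wrap around
        if not 0 <= idx < n:
            raise IndexError(f"label {idx} is outside 0..{n - 1}")
        decoded.append(glycans[idx])
    return decoded
```

For example, `decode_labels(load_pickle("y_test.pkl"), load_pickle("glycans.pkl"))` would give one glycan name per test label, provided the labels are stored as a flat sequence of integers. The bounds check is a deliberate choice: a mismatched glycans.pkl then fails loudly instead of silently returning the wrong structure.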

Files (17.8 GB total)

MD5 checksum                        Size
126ae8618d0a7dd2a3c5dc192d07c39b    1.6 GB
afa2d498a6d754f2f1e6d45271121817    312.4 kB
5a35d9ff6146bc97219f275674aabfd0    2.2 GB
57d0f637baf470a7552f063ffe9da957    14.0 GB
5fcb2a216fe43305748865923aa3410d    382.4 kB
29ad6cbe3b8c1665ba0e85504caebccd    2.0 MB
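The MD5 checksums above can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using Python's standard `hashlib`. It reads each file in chunks so that multi-gigabyte files, such as the 14.0 GB one, are never loaded into memory at once. The function names and the 1 MiB chunk size are illustrative choices, not part of the dataset or CandyCrunch.

```python
import hashlib
from pathlib import Path
from typing import Union


def md5sum(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Union[str, Path], expected_md5: str) -> bool:
    """Compare a file's MD5 with a published checksum.

    Accepts upper- or lower-case hex and an optional "md5:" prefix.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected
```

To use it, compute `md5sum(path)` for each downloaded file and compare the result with the listed checksum of the same size. Chunked reading keeps memory use constant regardless of file size.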

Additional details

Dates

Available: 2023-10-31

Software

Repository URL: https://github.com/BojarLab/CandyCrunch
Programming language: Python
Development status: Active