There is a newer version of the record available.

Published September 22, 2023 | Version 0.3.0
Dataset Open

Public Data files for MassFormer

Affiliation: University of Toronto

Description

Public data files for the MassFormer experiments. See the GitHub repository for instructions on how to use this data.

Raw Data:

casmi_2016.tgz - Critical Assessment of Small Molecule Identification 2016, used for model evaluation.

casmi_2022.tgz - Critical Assessment of Small Molecule Identification 2022, used for model evaluation.

mb_na_msms.msp.gz - MassBank of North America (MoNA) export of LC-MS/MS spectra, used for model evaluation.

cid_smiles.tsv.gz - Mapping of CID to SMILES strings, obtained from PubChem.
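As a minimal sketch of how a gzipped, tab-separated mapping such as cid_smiles.tsv.gz could be read without decompressing it to disk first: the column order (CID first, SMILES second) and the absence of a header row are assumptions, not something this record states, and the function name `load_cid_to_smiles` is hypothetical. Check the actual file layout before relying on it.

```python
import csv
import gzip
from typing import Dict


def load_cid_to_smiles(path: str, has_header: bool = False) -> Dict[str, str]:
    """Read a gzipped two-column TSV into a {CID: SMILES} dictionary.

    ASSUMPTION: column 0 holds the CID and column 1 the SMILES string.
    Rows with fewer than two columns are skipped.
    """
    mapping: Dict[str, str] = {}
    # "rt" opens the gzip stream in text mode so csv can iterate over lines.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps every character of the SMILES string literal.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        if has_header:
            next(reader, None)  # discard the header row if there is one
        for row in reader:
            if len(row) < 2:
                continue
            mapping[row[0]] = row[1]
    return mapping
```

CIDs are kept as strings here so that no assumption is made about their numeric format; convert with `int(...)` afterwards if the file turns out to contain plain integers.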

Processed Data:

proc_casmi_2016.tgz - Processed spectrum and molecule data for the CASMI 2016 benchmark.

proc_casmi_2022.tgz - Processed spectrum and molecule data for the CASMI 2022 benchmark.

proc_nist20_outlier.tgz - Processed spectrum and molecule data for the NIST20 Outlier benchmark (formerly called pseudo-CASMI).

proc_demo.tgz - Processed spectrum and molecule data for the demo (refer to the code repository for more information).

cfm.tgz - Predicted spectra for the Competitive Fragmentation Modelling (CFM) baseline.
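The .tgz files listed above are gzip-compressed tar archives. The following is a hedged sketch of unpacking one with Python's standard library, rejecting any member whose path would land outside the destination directory. The helper name `extract_tgz` and the destination layout are illustrative assumptions; the code repository may expect the archives to be unpacked into a specific location.

```python
import os
import tarfile
from typing import List


def extract_tgz(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a .tgz archive into dest_dir and return the member names.

    Raises ValueError if a member path would escape dest_dir. This is only
    a path check, not a full security audit of the archive contents.
    """
    dest_root = os.path.realpath(dest_dir)
    os.makedirs(dest_root, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        members = archive.getmembers()
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # The resolved target must stay inside the destination directory.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest_root, members=members)
    return [member.name for member in members]
```

For example, `extract_tgz("proc_demo.tgz", "data/proc")` would unpack the demo archive under a hypothetical `data/proc` directory.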

Model Checkpoints:

demo.pkl - Checkpoint of a MassFormer model trained on MoNA data, for the purposes of running the demo.

checkpoint_best_pcqm4mv2.pt - Checkpoint of a Graphormer model pretrained on the PCQM4Mv2 dataset, used to initialize some MassFormer models. Copied from this URL. Please refer to the Graphormer repository for more information.

Files (1.0 GB)

MD5 checksum                            Size
md5:9d7a2bd77bd02d3ed45cea66646aee81    193.2 MB
md5:d6447d47f3b74c84c1c190151ccb13a0    200.9 MB
md5:f7dd1827a013bf05c3e7f7b07b946083    277.4 MB
md5:cfa1ee20d88fb91d3d664f2694db0ed2    348.3 MB
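The MD5 checksums listed above can be used to confirm that a download completed without corruption. Below is a minimal sketch using Python's `hashlib`; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and which checksum belongs to which file has to be taken from the record page, since that pairing is not shown here.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for files of several hundred MB.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 digest with a listed value.

    Accepts the value with or without the "md5:" prefix used in the listing.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_checksum("downloaded_file.tgz", "md5:9d7a2bd77bd02d3ed45cea66646aee81")` returns `True` only if the local file (a placeholder name here) has that digest. MD5 is adequate for detecting accidental corruption but not for guarding against deliberate tampering.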