Published October 2, 2023 | Version 0.3.2
Dataset | Open access

Public Data Files for MassFormer

Affiliation: University of Toronto

Description

Public data files for the MassFormer experiments. See the GitHub repository for instructions on how to use this data.

Raw Data:

casmi_2016.tgz - Critical Assessment of Small Molecule Identification 2016, used for model evaluation.

casmi_2022.tgz - Critical Assessment of Small Molecule Identification 2022, used for model evaluation.

mb_na_msms.msp.gz - MassBank of North America (MoNA) export of LC-MS/MS spectra, used for model evaluation.

cid_smiles.tsv.gz - Mapping of PubChem compound identifiers (CIDs) to SMILES strings, obtained from PubChem.
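
The MoNA export is a gzipped MSP text file. The following is a minimal Python sketch for iterating over its records, assuming the common MSP layout: "Key: value" header lines, a "Num Peaks" line, then one "m/z intensity" pair per line, with records separated by blank lines. parse_msp is a hypothetical helper, not part of the MassFormer code; header keys vary between exporters, and peak lists written several pairs per line (separated by ";") are not handled.

```python
import gzip
from typing import Dict, Iterator, List, Tuple


def parse_msp(path: str) -> Iterator[Tuple[Dict[str, str], List[Tuple[float, float]]]]:
    """Yield (metadata, peaks) for each record in a gzipped MSP file.

    Assumption: one "m/z intensity" pair per peak line and blank lines
    between records (hypothetical helper, not the MassFormer loader).
    """
    meta: Dict[str, str] = {}
    peaks: List[Tuple[float, float]] = []
    in_peaks = False
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for raw in fh:
            line = raw.strip()
            if not line:
                # A blank line ends the current record.
                if meta or peaks:
                    yield meta, peaks
                meta, peaks, in_peaks = {}, [], False
                continue
            if in_peaks:
                # Keep the first two fields; extra fields (annotations) are ignored.
                fields = line.split()
                peaks.append((float(fields[0]), float(fields[1])))
            else:
                # Split on the first colon only, so values may contain colons.
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
                if key.strip().lower() == "num peaks":
                    in_peaks = True
    # Flush the last record if the file does not end with a blank line.
    if meta or peaks:
        yield meta, peaks
```

For example, `for meta, peaks in parse_msp("mb_na_msms.msp.gz"): ...` streams the spectra without decompressing the whole file to disk first.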

Processed Data:

proc_casmi_2016.tgz - Processed spectrum and molecule data for the CASMI 2016 benchmark.

proc_casmi_2022.tgz - Processed spectrum and molecule data for the CASMI 2022 benchmark.

proc_nist20_outlier.tgz - Processed spectrum and molecule data for the NIST20 Outlier benchmark (formerly called pseudo-CASMI).

proc_demo.tgz - Processed spectrum and molecule data for the demo (refer to the code repository for more information).

cfm.tgz - Predicted spectra for the Competitive Fragmentation Modelling (CFM) baseline.
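
The .tgz archives can be unpacked with Python's standard tarfile module. The sketch below makes no assumption about the archive contents; extract_tgz is a hypothetical helper, and the target directory that the MassFormer code expects should be taken from the GitHub repository rather than from this example.

```python
import os
import tarfile
from typing import List


def extract_tgz(archive: str, dest: str) -> List[str]:
    """Extract a gzipped tar archive into dest and return its member names."""
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python 3.12+ and recent security releases: the "data" filter
            # rejects absolute paths, links escaping dest, device files, etc.
            tar.extractall(dest, filter="data")
        else:
            # Older interpreters have no extraction filter; only use this
            # path on archives from a trusted source.
            tar.extractall(dest)
    return names
```

For example, `extract_tgz("proc_casmi_2016.tgz", "data")` unpacks the CASMI 2016 benchmark files and lists what was extracted.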

Model Checkpoints:

demo.pkl - Checkpoint of a MassFormer model trained on MoNA data, for running the demo.

checkpoint_best_pcqm4mv2.pt - Checkpoint of a Graphormer model pretrained on the PCQM4Mv2 dataset, used to initialize some MassFormer models. Copied from this URL. Please refer to the Graphormer repository for more information.

Files (4.5 GB)

MD5 checksum                        Size
c31160256365c58735e24a6cf0cb7eb9    36.5 MB
a49055f71d4a6983c20f665c41c59eb6    23.9 MB
9091cbc446124dc947cb2a7b01fe9c98    1.2 GB
9d7a2bd77bd02d3ed45cea66646aee81    193.2 MB
6e17ad47e5dc9a18404274beeae06484    1.4 GB
4a00ad8d88ced0a0cda1d5c9288d82d2    772.8 MB
f7dd1827a013bf05c3e7f7b07b946083    277.4 MB
c24bae2a10e929d8a1570ae810d45278    24.4 MB
82364fafaff0ead065c0b7680da34215    199.7 MB
cfa1ee20d88fb91d3d664f2694db0ed2    348.3 MB
dab95dd818a5bab14a22bee5551c861e    98.0 MB
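
Each file is listed with its MD5 checksum. A minimal sketch for checking a download with Python's hashlib follows; md5sum and verify are hypothetical helper names, and verify accepts the checksum with or without an "md5:" prefix. Reading in 1 MiB chunks keeps memory use flat even for the multi-gigabyte archives.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """Compare a local file against a listed checksum (optional 'md5:' prefix)."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected
```

For example, `verify("casmi_2016.tgz", expected)` returns True only if the local file matches `expected`, the checksum listed for that file.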