Published January 19, 2022 | Version v1
Dataset Open

Dataset: "Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data"

Creators

  • 1. Aalto University

Description

Dataset used in the experiments of the publication: "Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data" by Bach et al.

File description:

  • cfmid4.tar: MS² spectra simulated using CFM-ID (v4.0.7) for all molecular candidate structures

  • db_layout.png: Visualization of the SQLite database (DB) layout

  • massbank.sqlite.gz: Gzip-compressed DB containing all data needed to (re-)run the experiments shown in the paper. Please read "DB_README.md" for further details. The database file can be unpacked using gzip.

  • metfrag.tar: MetFrag input files and MS² scores for all candidate sets computed using the MetFrag software.

  • sirius_scores.tar: MS² scores for all candidates and measured spectra using the SIRIUS software.

  • sirius_inputs.tar: Input (ms-files) for the SIRIUS software.

  • DB_README.md: Description of each table in the "massbank.sqlite" SQLite DB.

  • db_processing_scripts.tar: Scripts to reproduce "massbank.sqlite", plus a README.md with further information on the process.

  • massbank__2020.11__v0.6.1.sqlite: Base SQLite DB from which "massbank.sqlite" was built. It was created with the "massbank2db" Python package (v0.6.1) from MassBank release 2020.11.

  • substructure_fingerprints.tar: Pre-computed substructure counting fingerprints for all candidates related to our experiments.
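
The unpacking step mentioned for "massbank.sqlite.gz" can also be done from Python's standard library, which helps on systems without a gzip command. The following is a minimal sketch; the function name `gunzip` and the choice of output path are illustrative assumptions and not part of the dataset or its scripts.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src: Path, dst: Path, chunk_size: int = 1024 * 1024) -> Path:
    """Decompress the gzip file `src` into `dst` and return `dst`.

    The data is copied in chunks, so a multi-gigabyte database is never
    held in memory all at once.
    """
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out, chunk_size)
    return dst
```

A call such as `gunzip(Path("massbank.sqlite.gz"), Path("massbank.sqlite"))` would write the unpacked database next to the download. The uncompressed file is larger than the 7.6 GB archive, so enough free disk space is needed.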

Instructions:

The "massbank.sqlite" database can be used directly with the Structured Support Vector Machine (SSVM) model described in the manuscript and implemented in the "ssvm" Python package.
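
Before passing the database to the "ssvm" package, it can be worth checking that the unpacked file opens and contains the tables described in "DB_README.md". The sketch below uses only Python's built-in `sqlite3` module and makes no assumption about the "ssvm" API or about specific table names; `list_tables` is a hypothetical helper name.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in the SQLite file, sorted by name."""
    # Open read-only (mode=ro) so the downloaded file cannot be modified
    # by accident, and so a wrong path fails instead of creating a new DB.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Calling `list_tables("massbank.sqlite")` and comparing the result with the tables documented in "DB_README.md" gives a quick sanity check of the download.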

If desired, the database can be reproduced using the scripts provided in "db_processing_scripts.tar":

  1. Create a directory for all data
  2. Download and extract the ...
    1. Processing scripts
    2. MS² scorer outputs (e.g. metfrag.tar)
    3. Pre-computed substructure fingerprints
  3. Follow the instructions given in the "README.md" of the "db_processing_scripts.tar"
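
Steps 1 and 2 above can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes the archives have already been downloaded into the data directory, it extracts everything into that same directory, and the helper name `extract_archives` is hypothetical. The directory layout actually expected by the processing scripts is defined in their "README.md" and takes precedence.

```python
import tarfile
from pathlib import Path


def extract_archives(data_dir: Path, archive_names: list[str]) -> list[Path]:
    """Create `data_dir` if needed and unpack each named tar archive into it.

    Returns the paths of the archives that were extracted.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for name in archive_names:
        archive = data_dir / name
        if not archive.is_file():
            raise FileNotFoundError(f"download {name} into {data_dir} first")
        # tarfile.open detects the archive format automatically.
        with tarfile.open(archive) as tar:
            tar.extractall(data_dir)
        extracted.append(archive)
    return extracted
```

For example, `extract_archives(Path("data"), ["db_processing_scripts.tar", "metfrag.tar", "substructure_fingerprints.tar"])` would cover the processing scripts, one MS² scorer output, and the fingerprints. Further scorer archives can be added to the list in the same way.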

Files (17.6 GB)

MD5 checksum  Size
md5:38ae4d78e66f28028c49e2cc84e3aedc  4.3 GB
md5:eb81d515ae3c24dcf7014b0a51d0d198  963.4 kB
md5:4aeda28612ff361cecfb09d8eb89b7a1  133.1 kB
md5:332f13f734eb6d62c941d8f14febaa62  5.7 kB
md5:f2c8bcc44afade950bf891b992352406  7.6 GB
md5:401224a633324b42636062ca213e0b3e  89.1 MB
md5:804d78adb569b45e3e385f58d20db7c6  1.6 GB
md5:3d9c9d96265d93cd3dfb33d5b7403457  42.1 MB
md5:18451a8a0c1c51a67d1f15abee13b041  3.8 GB
md5:b4eb47d247ad2daed816cad9fa36ba92  75.4 MB
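
The MD5 checksums listed for the files can be used to confirm that a large download arrived intact. A minimal sketch using Python's `hashlib` follows; the helper names `md5sum` and `matches_record` are illustrative, and which checksum belongs to which file has to be taken from the record page itself.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: Path, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    return md5sum(path) == listed.strip().removeprefix("md5:").lower()
```

A result of `False` from `matches_record` suggests an incomplete or corrupted download, in which case the file should be fetched again before unpacking.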

Additional details

Funding

Machine Learning for Computational Metabolomics 310107
Academy of Finland
Machine learning for digItal diagnostics of antimicrobial resistance 334790
Academy of Finland