Dataset: "Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data"
Description
Dataset used in the experiments of the publication: "Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data" by Bach et al.
File description:
- cfmid4.tar: MS² spectra simulated using CFM-ID (v4.0.7) for all molecular candidate structures.
- db_layout.png: Visualization of the SQLite database (DB) layout.
- massbank.sqlite.gz: DB containing all data needed to (re-)run the experiments shown in the paper. Please read "DB_README.md" for further details. The database file can be unpacked using gzip.
- metfrag.tar: MetFrag input files and MS² scores for all candidate sets computed using the MetFrag software.
- sirius_scores.tar: MS² scores for all candidates and measured spectra computed using the SIRIUS software.
- sirius_inputs.tar: Input (ms-files) for the SIRIUS software.
- DB_README.md: Description of each table in the "massbank.sqlite" SQLite DB.
- db_processing_scripts.tar: Scripts to reproduce "massbank.sqlite" and a README.md providing further information on the process.
- massbank__2020.11__v0.6.1.sqlite: Base SQLite DB from which "massbank.sqlite" was built. It was created from the MassBank release 2020.11 using the "massbank2db" Python package (v0.6.1).
- substructure_fingerprints.tar: Pre-computed substructure counting fingerprints for all candidates related to our experiments.
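The description notes that "massbank.sqlite.gz" can be unpacked using gzip. Where the command-line tool is not available, the same step can be sketched with Python's standard library. The function name `unpack_database` is hypothetical and not part of the dataset or of the "ssvm" package; the output name is simply the input name without its ".gz" suffix.

```python
import gzip
import shutil
from pathlib import Path


def unpack_database(gz_path, out_path=None):
    """Decompress a gzip-compressed file and return the path of the result.

    Hypothetical helper for illustration only.
    """
    gz_path = Path(gz_path)
    # Default target: drop the ".gz" suffix, e.g. massbank.sqlite.gz -> massbank.sqlite
    out_path = Path(out_path) if out_path is not None else gz_path.with_suffix("")
    # Stream the data in chunks so a multi-gigabyte file never has to fit in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

A call such as `unpack_database("massbank.sqlite.gz")` would write "massbank.sqlite" next to the archive, assuming the download sits in the current directory and enough free disk space is available for the unpacked file.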
Instructions:
The unpacked "massbank.sqlite" can be used directly with the Structured Support Vector Machine (SSVM) model described in the manuscript and implemented in the "ssvm" Python package.
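Before passing the database to the "ssvm" package, it can be useful to check that the unpacked file opens and contains the tables described in "DB_README.md". The following is a minimal sketch using only Python's built-in `sqlite3` module; the function name `list_tables` is hypothetical, and no table names are assumed here because they are documented in "DB_README.md" rather than in this description.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the sorted names of all tables in an SQLite database file.

    Hypothetical helper for illustration only.
    """
    # Open the file read-only through a URI so the download cannot be
    # modified (or an empty file created) by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [row[0] for row in rows]
    finally:
        con.close()
```

Comparing the output of `list_tables("massbank.sqlite")` against the table descriptions in "DB_README.md" gives a quick sanity check that the download and decompression succeeded.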
If desired, the database can be reproduced using the scripts provided in "db_processing_scripts.tar":
- Create a directory for all data
- Download and extract the ...
  - Processing scripts
  - MS² scorer outputs (e.g. metfrag.tar)
  - Pre-computed substructure fingerprints
- Follow the instructions given in the "README.md" of the "db_processing_scripts.tar"
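The directory creation and extraction steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `extract_archives` and the choice of a single flat target directory are hypothetical, and the "README.md" inside "db_processing_scripts.tar" remains the authoritative source for the expected directory layout.

```python
import tarfile
from pathlib import Path


def extract_archives(archive_paths, data_dir):
    """Create data_dir if needed and extract each tar archive into it.

    Hypothetical helper for illustration only. Returns the names of the
    archives that were extracted, in order.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)  # step 1: directory for all data
    extracted = []
    for archive in map(Path, archive_paths):
        # tarfile.open detects compression automatically in the default read mode.
        with tarfile.open(archive) as tar:
            tar.extractall(data_dir)  # step 2: unpack scripts, scores, fingerprints
        extracted.append(archive.name)
    return extracted
```

For example, `extract_archives(["db_processing_scripts.tar", "metfrag.tar", "substructure_fingerprints.tar"], "data")` would unpack those three downloads into a directory named "data"; both the directory name and the selection of archives are assumptions made for the example.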
Files
Total size: 17.6 GB

MD5 checksum | Size |
---|---|
38ae4d78e66f28028c49e2cc84e3aedc | 4.3 GB |
eb81d515ae3c24dcf7014b0a51d0d198 | 963.4 kB |
4aeda28612ff361cecfb09d8eb89b7a1 | 133.1 kB |
332f13f734eb6d62c941d8f14febaa62 | 5.7 kB |
f2c8bcc44afade950bf891b992352406 | 7.6 GB |
401224a633324b42636062ca213e0b3e | 89.1 MB |
804d78adb569b45e3e385f58d20db7c6 | 1.6 GB |
3d9c9d96265d93cd3dfb33d5b7403457 | 42.1 MB |
18451a8a0c1c51a67d1f15abee13b041 | 3.8 GB |
b4eb47d247ad2daed816cad9fa36ba92 | 75.4 MB |
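Because several of the files are multiple gigabytes in size, it may be worth verifying each download against its listed MD5 checksum before use. A minimal sketch using Python's `hashlib` follows; the function names `md5_of_file` and `matches_checksum` are hypothetical, and the caller has to supply the checksum that belongs to the file being checked.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with an expected value.

    The expected value may carry the "md5:" prefix used in the file listing.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A result of `False` from `matches_checksum` indicates a corrupted or incomplete download (or a checksum paired with the wrong file), in which case the file should be fetched again.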
Additional details
Funding
- Academy of Finland: Machine Learning for Computational Metabolomics (310107)
- Academy of Finland: Machine learning for digItal diagnostics of antimicrobial resistance (334790)