Published July 23, 2023 | Version v1
Journal article Open

TransExtION: Transformer based Explainable similarity metric for IONS

  • 1. Adrem Data Lab, Department of Computer Science, University of Antwerp, Belgium
  • 2. Janssen Pharmaceutica N.V., Turnhoutseweg 30, 2340 Beerse, Belgium.

Description

TransExtION is a supervised learning method that estimates a similarity between MS/MS spectra that correlates strongly with the structural similarity of the underlying compounds. It can be used in spectral library search to find structural analogues. TransExtION is based on the Transformer architecture and provides a post hoc explanation of each result in order to reveal the relationships between fragments.

Here we provide a pretrained transformer model, "GNPS_MassBank.ms.model". The model was trained using (+)ESI GNPS/MassBank spectra of 9,996 unique compounds ("GNPS_MassBank_train.mgf"). Query spectra should be written in mgf format (examples: "GNPS_MassBank_test.mgf" and "test_urine.mgf"). They can be annotated by searching a spectral library once the library has been converted to the database format (example: "ALL_PUBLIC_LIBRARY_POS_CONSENSUS_2022.mgf" converted to "ALL_PUBLIC_LIBRARY_POS_CONSENSUS_2022.db"). The "GNPS_MassBank.ms.model", together with the converted "ALL_PUBLIC_LIBRARY_POS_CONSENSUS_2022.db" (covering over 15,000 metabolites, natural products, and drugs), can be used directly for positive ion mode library search, compound annotation, and post hoc explanation.
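Since query spectra are expected in mgf format, it can help to inspect a file before running a search. The following is a minimal sketch of a reader for the common MGF layout (blocks delimited by BEGIN IONS and END IONS, KEY=VALUE header lines, and whitespace-separated m/z and intensity pairs). It is not part of TransExtION: the function name `read_mgf` and the example spectrum are hypothetical, and the exact header fields the tool requires are not stated in this record.

```python
import io


def read_mgf(handle):
    """Yield one dict per BEGIN IONS ... END IONS block.

    Hypothetical helper for illustration; it assumes the common MGF layout
    and does not reproduce TransExtION's own input handling.
    """
    params, peaks = None, None
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            params, peaks = {}, []
        elif line == "END IONS":
            if params is not None:
                yield {"params": params, "peaks": peaks}
            params, peaks = None, None
        elif params is not None:
            if "=" in line:
                # Header line: split on the first "=" only, so values that
                # themselves contain "=" (e.g. SMILES strings) stay intact.
                key, value = line.split("=", 1)
                params[key.strip().upper()] = value.strip()
            else:
                fields = line.split()
                if len(fields) >= 2:
                    peaks.append((float(fields[0]), float(fields[1])))


# Made-up spectrum, used only to exercise the reader.
example = """BEGIN IONS
PEPMASS=195.0877
CHARGE=1+
NAME=hypothetical compound
100.5 20.0
138.066 999.0
END IONS
"""

spectra = list(read_mgf(io.StringIO(example)))
for spectrum in spectra:
    print(spectrum["params"].get("NAME"), len(spectrum["peaks"]), "peaks")
```

To read one of the provided example files instead, the same function could be called on an open file handle, for example `open("GNPS_MassBank_test.mgf")`.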

Files (126.2 MB)

MD5 checksum                            Size
md5:279fff077afa610c851e8b88014d780d    20.8 MB
md5:3234fa3d00f61a39ff44548b2f99b04c    24.8 MB
md5:e2ed38093d3bcab472ab2142a4009774    8.4 MB
md5:90e7cc149b95a6792abf2a96bdea81ae    1.7 MB
md5:3f03c5cb925091203ac015d089ce0bd8    20.8 MB
md5:10607ae83c06c84e9dbaac015c79b52c    33.5 MB
md5:e77d509dd95677e7cbcb639c80789e3f    16.2 MB
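
The MD5 checksums listed above can be used to confirm that a download completed without corruption. A minimal sketch using only the Python standard library follows. The helper names `md5_of_file` and `matches_listing` are hypothetical, and because this listing does not pair checksums with file names, the caller has to supply the checksum that belongs to each downloaded file.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a throwaway file, so the sketch runs without any download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name
demo_digest = md5_of_file(demo_path)
demo_ok = matches_listing(demo_path, "md5:" + demo_digest)
os.remove(demo_path)
print("checksum matches:", demo_ok)
```

For a real file, `matches_listing` would be called with the local path of the download and the corresponding `md5:` string from the listing.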