Published July 7, 2021 | Version 1.0.0
Dataset Open

InChI to IUPAC name machine learning model

  • 1. Science and Technology Facilities Council

Description

This is a machine learning model that predicts IUPAC names from InChI identifiers. It has a transformer encoder-decoder architecture and was trained on a dump of the PubChem database.

Instructions

Requires:

  • Python >= 3.6
  • PyTorch == 1.6.0

1. Install OpenNMT-py version 2.0.0:

pip install OpenNMT-py==2.0.0

2. Prepare each InChI for translation by splitting it into individual characters separated by whitespace, and save the result in a text file. To predict multiple IUPAC names, put one InChI per line (see example.inchi for reference).
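The character-splitting step can be scripted. The following is a minimal sketch, not part of the supplied files: the function names `tokenize_inchi` and `write_source_file` are hypothetical, and it assumes the full InChI string (including its `InChI=1S/` prefix) is what the model expects. Check the result against example.inchi before relying on it.

```python
from pathlib import Path


def tokenize_inchi(inchi: str) -> str:
    """Return the InChI with a single space between every character."""
    # Assumption: the whole string, prefix included, is split; compare
    # the output with example.inchi to confirm this matches the model.
    return " ".join(inchi.strip())


def write_source_file(inchis, path):
    """Write one character-split InChI per line, skipping blank entries."""
    lines = [tokenize_inchi(s) for s in inchis if s.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

For example, `tokenize_inchi("InChI=1S/CH4/h1H4")` yields `I n C h I = 1 S / C H 4 / h 1 H 4`, and `write_source_file` produces a file that can be passed as `<infile>` in the next step.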

3. Perform the prediction with the supplied model file:

onmt_translate --beam_size 10 --length_penalty wu --alpha 1.0 --model inchi2iupac_step_259200.pt --src <infile> --max_length 300 --output <outfile>
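If the prediction needs to be driven from a script, the same command can be assembled and launched from Python. This is only a sketch under stated assumptions: `build_translate_command` and `run_translation` are hypothetical helper names, the flags are copied unchanged from the command above, and `onmt_translate` is assumed to be on the PATH after the pip install step.

```python
import subprocess


def build_translate_command(src, output, model="inchi2iupac_step_259200.pt"):
    """Assemble the onmt_translate call with the flags given in the instructions."""
    return [
        "onmt_translate",
        "--beam_size", "10",
        "--length_penalty", "wu",
        "--alpha", "1.0",
        "--model", str(model),
        "--src", str(src),
        "--max_length", "300",
        "--output", str(output),
    ]


def run_translation(src, output, model="inchi2iupac_step_259200.pt"):
    """Run the prediction; raises CalledProcessError if onmt_translate fails."""
    # Assumption: onmt_translate is installed and discoverable on the PATH.
    subprocess.run(build_translate_command(src, output, model), check=True)
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.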

 

Files

Total size: 551.8 MB

  • 3.1 kB, md5:5d24a7e184dfb7d05f23635f40c148e1
  • 551.8 MB, md5:9cbf602974b1f462913f5d8f2dfa3e1d

Additional details

Related works

Is supplement to
Preprint: 10.26434/chemrxiv.14170472.v1 (DOI)