Published July 7, 2021 | Version 1.0.0
Dataset
Open
InChI to IUPAC name machine learning model
Description
This is a machine learning model that predicts IUPAC names from InChI. It was trained on a dump of PubChem's database, and has a transformer encoder-decoder architecture.
Instructions
Requires:
- Python >= 3.6
- PyTorch == 1.6.0
1. Install OpenNMT-py version 2.0.0:
pip install OpenNMT-py==2.0.0
2. Prepare each InChI for translation by splitting it into individual characters separated by whitespace, and save the result in a text file. To predict several IUPAC names in one run, put one InChI per line (see example.inchi for reference).
3. Perform the prediction with the supplied model file:
onmt_translate --beam_size 10 --length_penalty wu --alpha 1.0 --model inchi2iupac_step_259200.pt --src <infile> --max_length 300 --output <outfile>
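Steps 2 and 3 can also be scripted. The following is a minimal sketch and not part of the original record: the function names and the file names `input.txt` and `output.txt` are hypothetical, and the command-line flags are copied unchanged from step 3. The sketch assumes the full InChI string, including the `InChI=` prefix, is split character by character; example.inchi remains the reference for the exact input format.

```python
import shlex
import subprocess
from pathlib import Path


def tokenize_inchi(inchi):
    """Return the InChI with a single space between every character."""
    return " ".join(inchi.strip())


def write_source_file(inchis, path):
    """Write one tokenized InChI per line, skipping blank entries."""
    lines = [tokenize_inchi(s) for s in inchis if s.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def build_translate_command(src, output, model="inchi2iupac_step_259200.pt"):
    """Assemble the onmt_translate call from step 3 as an argument list."""
    return [
        "onmt_translate",
        "--beam_size", "10",
        "--length_penalty", "wu",
        "--alpha", "1.0",
        "--model", model,
        "--src", src,
        "--max_length", "300",
        "--output", output,
    ]


def run_translation(src, output, model="inchi2iupac_step_259200.pt"):
    """Run the prediction; requires OpenNMT-py 2.0.0 and the model file."""
    subprocess.run(build_translate_command(src, output, model), check=True)


# Show the tokenized form of one InChI (methane) and the resulting command.
print(tokenize_inchi("InChI=1S/CH4/h1H4"))
command = build_translate_command("input.txt", "output.txt")
print(" ".join(shlex.quote(arg) for arg in command))
```

Calling `write_source_file([...], "input.txt")` followed by `run_translation("input.txt", "output.txt")` would then perform both steps, provided the model file is in the working directory.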
Files
Total size: 551.8 MB
Checksum | Size |
---|---|
md5:5d24a7e184dfb7d05f23635f40c148e1 | 3.1 kB |
md5:9cbf602974b1f462913f5d8f2dfa3e1d | 551.8 MB |
Additional details
Related works
- Is supplement to: Preprint, 10.26434/chemrxiv.14170472.v1 (DOI)