Published September 1, 2021 | Version v1
Dataset Open

Molecule OCR Real images Dataset

Description

Test dataset from the paper "Image2SMILES: Transformer-based Molecular Optical Recognition Engine". The dataset contains 296 structures, provided as images with Functional Groups SMILES (FG-SMILES). The structures were extracted from 24 papers, which were selected from each volume of the Journal of Organic Chemistry (2020).

Files

Molecule_OCR_real_images.zip

Files (4.9 MB)

Name: Molecule_OCR_real_images.zip
Size: 4.9 MB
md5:450cd4fab2e3b3fd6d897ad99985dd4a
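
The MD5 checksum listed above can be used to confirm that a downloaded copy of the archive is intact. The following is a minimal sketch, not part of the dataset itself; the local path and the helper names `md5_of_file` and `verify_archive` are assumptions chosen for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record for Molecule_OCR_real_images.zip.
EXPECTED_MD5 = "450cd4fab2e3b3fd6d897ad99985dd4a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location: the archive saved in the current working directory.
    archive = Path("Molecule_OCR_real_images.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.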