There is a newer version of the record available.

Published April 13, 2022 | Version 1.0
Dataset Open

DECIMER - Hand-drawn molecule images dataset

  • 1. Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Lessingstr. 8, 07743 Jena, Germany
  • 2. Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, D-45665 Recklinghausen, Germany

Description

Optical chemical structure recognition (OCSR) is the translation of images of chemical structures into machine-readable representations of the depicted molecules. The field has made considerable progress over the last three decades, but systems for recognising complex hand-drawn structure depictions are still at an early stage of development. There is currently no dataset available for the systematic evaluation of OCSR methods on hand-drawn structures.

Here we present DECIMER - Hand-drawn molecule images, a standardised, openly available benchmark dataset of 5088 hand-drawn depictions of a diverse selection of chemical structures. Every structure depiction in the dataset is mapped to a machine-readable representation of the underlying molecule. The dataset is published under the CC-BY 4.0 licence, which imposes very few restrictions on reuse. We hope that it will contribute to the further development of the field.

Files

Files (103.7 MB)

Checksum                               Size
md5:04b674ac7b69b077f825fa9d0a840302   27.2 MB
md5:25def3b562aacdae0b414a5b5c7038f2   75.5 MB
md5:5b386614ebe1892b5400019317a5d627   785.9 kB
md5:5500ec0fdc77ba4742eaa132d2abbab9   239.2 kB
md5:b30f4e92f004d050496e02fbc1cebb68   2.1 kB
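
The MD5 checksums listed for the files can be used to confirm that a download arrived intact. The following is a minimal sketch in Python, assuming the file has already been saved locally. The function names are illustrative and are not part of the record; only the standard-library hashlib module is used.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a local file against a checksum given as 'md5:<hex>' or as plain hex."""
    # Drop an optional 'md5:' prefix and normalise case before comparing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, calling matches_listed_checksum("downloaded_file.zip", "md5:04b674ac7b69b077f825fa9d0a840302") would return True only if the local file has that digest. The file name here is hypothetical, since the listing does not show which name belongs to which checksum.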