Published April 13, 2022 | Version 1.2
Dataset Open

DECIMER - Hand-drawn molecule images dataset

  • 1. Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Lessingstr. 8, 07743 Jena, Germany
  • 2. Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, D-45665 Recklinghausen, Germany

Description

The translation of images of chemical structures into machine-readable representations of the depicted molecules is known as optical chemical structure recognition (OCSR). The field has progressed considerably over the last three decades, but systems for recognising complex hand-drawn structure depictions are still at an early stage of development. Currently, no data are available for the systematic evaluation of OCSR methods on hand-drawn structures.

Here we present DECIMER - Hand-drawn molecule images, a standardised, openly available benchmark dataset of 5088 hand-drawn depictions of a diverse selection of chemical structures. Every structure depiction in the dataset is mapped to a machine-readable representation of the underlying molecule. The dataset is published under the CC-BY 4.0 licence, which imposes very few restrictions on reuse. We hope that it will contribute to the further development of the field.

Files

DECIMER_HDM_Dataset_Images.zip

Files (128.9 MB)

MD5 checksum                            Size
md5:870c1381b91bc0504ac6c020f9da7300    47.2 MB
md5:7c67bb16bb9819b5bf2699e216ace087    79.1 MB
md5:fbeb9cc9d56ec125cb95307a16dd913b    2.3 MB
md5:3b643bc381be7fe005a7541e9884ca8d    234.2 kB
md5:379f3e863059675232b516f604875de2    843 Bytes
md5:b30f4e92f004d050496e02fbc1cebb68    2.1 kB
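
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using only the Python standard library. The function names are illustrative assumptions and are not part of the dataset. Because the listing does not show which checksum belongs to which file, the sketch compares a file against all six listed values.

```python
import hashlib

# Checksums copied from the file listing of this record.
LISTED_CHECKSUMS = [
    "md5:870c1381b91bc0504ac6c020f9da7300",
    "md5:7c67bb16bb9819b5bf2699e216ace087",
    "md5:fbeb9cc9d56ec125cb95307a16dd913b",
    "md5:3b643bc381be7fe005a7541e9884ca8d",
    "md5:379f3e863059675232b516f604875de2",
    "md5:b30f4e92f004d050496e02fbc1cebb68",
]


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


def find_matching_checksum(path, candidates=LISTED_CHECKSUMS):
    """Return the first listed checksum that the file matches, or None."""
    actual = md5_of_file(path)  # hash the file once, then compare
    for listed in candidates:
        if listed.strip().lower().removeprefix("md5:") == actual:
            return listed
    return None
```

For example, `find_matching_checksum("DECIMER_HDM_Dataset_Images.zip")` returns the matching entry if a local copy of that archive is intact, and `None` if the file is corrupted or incomplete. The path is a hypothetical local download location, and `str.removeprefix` requires Python 3.9 or later.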