Published October 27, 2020 | Version v1
Software | Open access

Towards the Optical Character Recognition of DSLs - Artifact

  • 1. University of Extremadura
  • 2. Open University of Catalonia

Description

img2DSL is an image recognition toolkit for studying how Optical Character Recognition (OCR) can be applied to images that contain DSL snippets. It uses the Object Constraint Language (OCL) as an example of a textual DSL. Given a dataset of Ecore models and their OCL expressions, the toolkit encodes the OCL expressions into images and measures how much different strategies improve on the default OCR quality. The project uses Tesseract as the OCR engine, and the strategies are different OCR models and custom algorithms.
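The experiment loop described above (render each OCL expression as an image once, then recognize it under several strategies) can be sketched as follows. This is a minimal, hypothetical sketch, not the toolkit's own code. `run_experiment`, `Renderer` and `Strategy` are illustrative names, and the renderer and strategies are injected as plain functions so that the skeleton runs without Tesseract installed. In a real setup a strategy might wrap `pytesseract.image_to_string(image)` with a particular trained model, but how img2DSL actually invokes Tesseract is an assumption here.

```python
from typing import Callable, Dict, List

# Hypothetical aliases for this sketch: an "image" is whatever the renderer
# returns (e.g. a PIL.Image), and a strategy maps an image to recognized text.
Renderer = Callable[[str], object]
Strategy = Callable[[object], str]


def run_experiment(
    expressions: List[str],
    render: Renderer,
    strategies: Dict[str, Strategy],
) -> Dict[str, List[str]]:
    """Render every OCL expression once, then recognize it with each strategy.

    Returns, for each strategy name, the recognized text of every expression,
    in the same order as the input list.
    """
    # Render once so that all strategies see exactly the same images.
    images = [render(expr) for expr in expressions]
    results: Dict[str, List[str]] = {}
    for name, recognize in strategies.items():
        # Strip surrounding whitespace: raw OCR output may end with a newline.
        results[name] = [recognize(img).strip() for img in images]
    return results


if __name__ == "__main__":
    # Toy stand-ins: the "image" is just the string, and the "noisy" OCR
    # strategy simulates a recognition error by dropping every '>'.
    exprs = ["self.name <> ''", "self.items->size() > 0"]
    out = run_experiment(
        exprs,
        render=lambda s: s,
        strategies={"perfect": lambda img: img,
                    "noisy": lambda img: img.replace(">", "")},
    )
    for name, recognized in out.items():
        print(name, recognized)
```

Rendering once and reusing the images keeps the comparison fair: any difference between strategies comes from recognition, not from the encoding step.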

To evaluate the toolkit and the quality of each strategy, we load the recognized expressions into the USE tool and measure how many of them are still valid after recognition.
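The evaluation metric above, the share of recognized expressions that remain valid, can be sketched as a ratio per strategy. This is a hypothetical sketch: `validity_rate` and `compare_strategies` are illustrative names, and the validity check is passed in as a function because the actual check is performed by USE, whose interface is not reproduced here. The `balanced_parens` checker is only a toy stand-in so the example runs; it is far weaker than USE's parsing and type checking.

```python
from typing import Callable, Dict, List


def validity_rate(recognized: List[str], is_valid: Callable[[str], bool]) -> float:
    """Fraction of recognized expressions accepted by the checker (0.0 if none)."""
    if not recognized:
        return 0.0
    return sum(1 for expr in recognized if is_valid(expr)) / len(recognized)


def compare_strategies(
    results: Dict[str, List[str]], is_valid: Callable[[str], bool]
) -> Dict[str, float]:
    """Validity rate for each strategy's list of recognized expressions."""
    return {name: validity_rate(exprs, is_valid) for name, exprs in results.items()}


def balanced_parens(expr: str) -> bool:
    """Toy stand-in checker (NOT USE): only checks that parentheses balance."""
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' closed something that was never opened
                return False
    return depth == 0


if __name__ == "__main__":
    recognized = {
        "default": ["self.items->size( > 0", "self.name <> ''"],
        "custom": ["self.items->size() > 0", "self.name <> ''"],
    }
    print(compare_strategies(recognized, balanced_parens))
```

Reporting a rate rather than a raw count makes strategies comparable even if they were run on datasets of different sizes.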

Notes

This work was partially funded by the Spanish Research Project TIN2016-75944-R.

Files

img2DSL-SLE20-artifact.zip (10.9 MB)
md5: 825dea1b9d29dc252685940adc9b2188

Additional details

Related works

Is supplement to
Conference paper: 10.1145/3426425.3426937 (DOI)