Published November 28, 2021 | Version preprint
Conference paper Open

Open-source OCR engine integration with Greek dictionary

  • 1. Department of Computer, Informatics and Telecommunications Engineering, International Hellenic University
  • 2. Athena Research Centre

Description

The aim of this study is to evaluate an open-source OCR engine (Tesseract OCR version 4.0) after integrating a Greek dictionary of more than 500,000 words. To this end, an open-access dictionary was first used and then enriched with words found in Greek restaurant menus. Training was applied to Tesseract's embedded LSTM deep learning model before the new Greek dictionary was integrated. OCR performance was evaluated with combinations of dictionaries on a total of 98 images of Greek catering menus. The results show a slight but stable improvement in OCR performance after training and integration of the new Greek dictionary.
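The description does not say which metric was used to measure OCR performance. As an illustration only, the sketch below assumes two common choices, character error rate (CER) and word error rate (WER), both computed from the edit distance between a ground-truth transcription and the OCR output. The function names `levenshtein`, `cer` and `wer` are hypothetical and are not taken from the paper.

```python
import unicodedata


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _normalize(text):
    # NFC keeps accented Greek letters as single code points, so that
    # "ά" counts as one character rather than a letter plus a combining mark.
    return unicodedata.normalize("NFC", text)


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    ref, hyp = _normalize(reference), _normalize(hypothesis)
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by number of reference words."""
    ref = _normalize(reference).split()
    hyp = _normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)
```

Under these assumptions, comparing dictionary combinations would amount to running the engine once per configuration on the same menu images and averaging `cer` or `wer` over them. A missing accent mark, a frequent OCR error in Greek, counts as one character substitution.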

Files

2021.Open-source OCR engine integration with Greek dictionary.pdf

Files (482.5 kB)