Open-source OCR engine integration with Greek dictionary
- 1. Department of Computer, Informatics and Telecommunications Engineering, International Hellenic University
- 2. Athena Research Centre
Description
The aim of this study is to evaluate an open-source OCR engine (Tesseract OCR version 4.0) after integrating a Greek dictionary of more than 500,000 words. To this end, an open-access dictionary was first used and then enriched with words that appear in Greek restaurant menus. Tesseract's embedded LSTM deep learning model was trained before the new Greek dictionary was integrated. OCR performance was evaluated with combinations of dictionaries on a total of 98 images of Greek catering menus. The results show a slight but stable improvement in OCR performance after training and integrating the new Greek dictionary.
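The description does not say how the menu words were merged into the open-access dictionary. Below is a minimal sketch of one possible approach, written under stated assumptions rather than taken from the authors' method. It assumes plain UTF-8 inputs, treats any run of Unicode letters as a word, and applies NFC normalization so that an accented Greek letter compares equal whether it arrives precomposed or as a base letter plus a combining accent. The function names `build_wordlist`, `extract_menu_words`, and `write_wordlist` and the `min_len` threshold are hypothetical. The output is a plain list with one word per line, which is the input format expected by Tesseract's `wordlist2dawg` utility. The step that compiles the list into a DAWG and packs it into a `.traineddata` file is not shown.

```python
import re
import unicodedata

# Hypothetical tokenizer: [^\W\d_] matches word characters that are neither
# digits nor underscores, i.e. Unicode letters (Greek, Latin, ...).
WORD_RE = re.compile(r"[^\W\d_]+")


def nfc(text: str) -> str:
    """NFC-normalize so accented Greek letters use precomposed code points."""
    return unicodedata.normalize("NFC", text)


def extract_menu_words(menu_text: str, min_len: int = 2) -> set[str]:
    """Collect candidate words from raw menu text (prices, digits, symbols dropped)."""
    return {w for w in WORD_RE.findall(nfc(menu_text)) if len(w) >= min_len}


def build_wordlist(base_words, menu_texts, min_len: int = 2) -> list[str]:
    """Union of a base dictionary and menu vocabulary, deduplicated and sorted.

    Case is kept as-is (assumption): menus often use all-caps headings,
    so upper- and lowercase forms may both be useful dictionary entries.
    """
    words: set[str] = set()
    for w in base_words:
        w = nfc(w.strip())
        if w:
            words.add(w)
    for text in menu_texts:
        words |= extract_menu_words(text, min_len)
    return sorted(words)  # code-point order; any stable order works for a word list


def write_wordlist(words, path: str) -> None:
    """Write one word per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(words) + "\n")
```

For example, `build_wordlist(["μουσακάς", "σαλάτα"], ["Χωριάτικη σαλάτα 8,50€", "Μουσακάς με κιμά"])` returns `["Μουσακάς", "Χωριάτικη", "κιμά", "με", "μουσακάς", "σαλάτα"]`. The price `8,50€` is discarded, and the duplicate `σαλάτα` appears only once.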
Files
Name | Size | MD5
---|---|---
2021.Open-source OCR engine integration with Greek dictionary.pdf | 482.5 kB | fdbcc2ed373fbb1e1415cc3f72164067