Conference paper Open Access

Open-source OCR engine integration with Greek dictionary

Tsimpiris, Alkiviadis; Varsamis, Dimitris; Strouthopoulos, Charalampos; Pavlidis, George

This study evaluates an open-source OCR engine (Tesseract OCR ver. 4.0) after integrating a Greek dictionary of more than 500,000 words. To this end, an open-access dictionary was used as a starting point and enriched with words found in Greek restaurant menus. Training was applied to Tesseract's embedded LSTM deep learning model before the new Greek dictionary was integrated. OCR performance was evaluated with combinations of dictionaries on a total of 98 images of Greek catering menus. The results show a slight but stable improvement in OCR performance after training and integration of the new Greek dictionary.
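
The abstract does not state which metric was used to measure OCR performance. As an illustration only, the sketch below assumes the two measures commonly used for OCR output, character error rate (CER) and word error rate (WER), both based on Levenshtein edit distance against a ground-truth transcription. The function names (`edit_distance`, `error_rate`, `evaluate`) are hypothetical and are not taken from the paper or from Tesseract.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis):
    """Edit distance normalised by the length of the reference."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def evaluate(pairs):
    """Mean CER and WER over (ground_truth, ocr_output) text pairs.

    Assumption: one pair per menu image, with the ground truth
    transcribed by hand.
    """
    pairs = list(pairs)
    cer = sum(error_rate(gt, out) for gt, out in pairs) / len(pairs)
    wer = sum(error_rate(gt.split(), out.split()) for gt, out in pairs) / len(pairs)
    return cer, wer
```

Under these assumptions, comparing dictionary configurations would amount to running `evaluate` once per configuration on the same set of image transcriptions and comparing the resulting mean error rates.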

Files (482.5 kB)
Name: 2021.Open-source OCR engine integration with Greek dictionary.pdf
Size: 482.5 kB
md5:fdbcc2ed373fbb1e1415cc3f72164067