10.5281/zenodo.5887016
https://zenodo.org/records/5887016
oai:zenodo.org:5887016
Tsimpiris, Alkiviadis
Alkiviadis
Tsimpiris
Department of Computer, Informatics and Telecommunications Engineering, International Hellenic University
Varsamis, Dimitris
Dimitris
Varsamis
Department of Computer, Informatics and Telecommunications Engineering, International Hellenic University
Strouthopoulos, Charalampos
Charalampos
Strouthopoulos
Department of Computer, Informatics and Telecommunications Engineering, International Hellenic University
Pavlidis, George
George
Pavlidis
Athena Research Centre
Open-source OCR engine integration with Greek dictionary
Zenodo
2021
OCR
Tesseract
Greek
dictionary
2021-11-28
eng
10.5281/zenodo.5887015
https://zenodo.org/communities/gre-taste
https://zenodo.org/communities/athena-rc-institute-for-language-and-speech-processing
preprint
Creative Commons Attribution 4.0 International
This study evaluates an open-source OCR engine (Tesseract OCR version 4.0) after integrating a Greek dictionary of more than 500,000 words. To this end, an open-access dictionary was first used and then enriched with words found in Greek restaurant menus. Training was applied to Tesseract's embedded LSTM deep learning model before the new Greek dictionary was integrated. OCR performance was evaluated with combinations of dictionaries on a total of 98 images of Greek catering menus. The results show a slight but stable improvement in OCR performance after training and integration of the new Greek dictionary.
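
The abstract does not state which metric was used to measure OCR performance. As a minimal sketch only, and assuming a character error rate (edit distance between the OCR output and a ground-truth transcription, divided by the transcription length), such a comparison could look like the following. The function names and the sample strings are hypothetical and are not taken from the study.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Assumed metric: edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Illustrative strings only: a ground-truth menu word and an OCR output
    # that dropped the accent on one letter.
    truth = "σαλάτα"
    ocr_output = "σαλατα"
    print(character_error_rate(truth, ocr_output))
```

Averaging this value over all images for each dictionary combination would give one number per configuration to compare; lower is better under this assumed metric.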