Published January 1, 2022 | Version v1
Journal article | Open access

Tesseract OCR Evaluation on Greek Food Menus Datasets

  • 1. International Hellenic University
  • 2. ATHENA - Research and Innovation Centre in Information, Communication and Knowledge Technologies

Description

This article presents a procedure for improving optical character recognition (OCR) through image preprocessing of Greek food menu images. To this end, a range of well-known and more sophisticated image preprocessing techniques was used. The performance of the Tesseract OCR engine was studied for selected binarization, thresholding, noise and morphological filtering methods applied to the menu images before they were fed to OCR. The output text of each image is compared with its reference text (ground text), and the values of the evaluation indices indicate the most appropriate preprocessing method. Datasets of Greek food menu images with their respective ground text files were generated for the first time in this study, owing to the lack of such datasets in any language. OCR outputs and ground texts were evaluated using error rate and accuracy at the character and word levels. The results of applying OCR to Greek menu images showed high accuracy for photos scanned at high resolution and for menus with distinct, clearly visible fonts.
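The character- and word-level error rates mentioned above are commonly computed as the Levenshtein (edit) distance between the OCR output and the ground text, divided by the length of the ground text. The sketch below is a minimal, stdlib-only illustration under that assumption; it is not the authors' evaluation code. The function names `levenshtein`, `error_rate` and `evaluate` are hypothetical, and taking accuracy as `1 - error rate` is one common convention that may differ from the one used in the article. Unicode NFC normalization is applied because Greek accented letters (e.g. ά) can be encoded either precomposed or as a base letter plus a combining accent.

```python
import unicodedata


def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # ref unit missing from OCR output
                            curr[j - 1] + 1,      # extra unit inserted by OCR
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by the reference length (CER or WER)."""
    if not ref_units:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref_units, hyp_units) / len(ref_units)


def evaluate(ground_text, ocr_text):
    """Hypothetical helper: character/word error rates and accuracies (accuracy = 1 - error rate)."""
    # Normalize so precomposed and decomposed Greek accents compare equal.
    ground = unicodedata.normalize("NFC", ground_text)
    ocr = unicodedata.normalize("NFC", ocr_text)
    cer = error_rate(list(ground), list(ocr))
    wer = error_rate(ground.split(), ocr.split())
    return {"CER": cer, "WER": wer,
            "char_accuracy": 1 - cer, "word_accuracy": 1 - wer}


if __name__ == "__main__":
    # Illustrative strings only, not taken from the article's datasets:
    # the OCR output drops the accent on two letters.
    print(evaluate("Χωριάτικη σαλάτα 8.50", "Χωριατικη σαλατα 8.50"))
```

In this example the two missing accents count as two character substitutions out of 21 reference characters, while at word level they make two of the three words wrong, which shows why word-level accuracy is typically much lower than character-level accuracy on the same output.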

Files

varsamisIJCO1-2022.pdf (1.3 MB)
md5:b46ecc1d39dac9d2cbbdd2b4d41ad816