Tesseract OCR Evaluation on Greek Food Menus Datasets

doi:10.12988/ijco.2022.9829

Published January 1, 2022 | Version v1

Journal article Open

1. International Hellenic University
2. ATHENA - Research and Innovation Centre in Information, Communication and Knowledge Technologies

This article presents a procedure for improving optical character recognition (OCR) through image preprocessing of Greek food menu images. To this end, both well-known and more sophisticated image preprocessing techniques were used. The performance of the Tesseract OCR engine was studied for selected binarization, thresholding, noise filtering, and morphological filtering methods applied to menu images before they were passed to the OCR engine. The output text is compared to the reference text of each image (ground text), and the values of the evaluation indices indicate the appropriate preprocessing method. Datasets of Greek food menu images with their respective ground text files were generated for the first time in this study, owing to the lack of alternative datasets in any language. OCR outputs and ground texts were evaluated using error rate and accuracy at the character and word levels. The results of applying OCR to Greek menu images showed high accuracy values for photos with high scanning resolution and for menus with distinct and visible fonts.
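The character-level and word-level evaluation mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formulas used in the article, so this assumes the common definitions: error rate is the Levenshtein edit distance between the OCR output and the ground text divided by the length of the ground text, and accuracy is one minus the error rate. The function names (`edit_distance`, `cer`, `wer`, `accuracy`) and the NFC normalization step are illustrative choices, not taken from the article.

```python
import unicodedata


def normalize(text):
    # Assumption: compose accented Greek letters (e.g. alpha + tonos) into
    # single code points so that each counts as one character.
    return unicodedata.normalize("NFC", text)


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance divided by the number of reference units."""
    if len(ref_units) == 0:
        raise ValueError("reference (ground text) is empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def cer(ground_text, ocr_text):
    """Character error rate of the OCR output against the ground text."""
    return error_rate(normalize(ground_text), normalize(ocr_text))


def wer(ground_text, ocr_text):
    """Word error rate, with words split on whitespace."""
    return error_rate(normalize(ground_text).split(),
                      normalize(ocr_text).split())


def accuracy(rate):
    """Accuracy as 1 - error rate, clipped at 0 because the rate can exceed 1."""
    return max(0.0, 1.0 - rate)


if __name__ == "__main__":
    ground = "χωριάτικη σαλάτα 7.50"  # hypothetical menu line
    ocr = "χωριάτικη σαλατα 7.50"     # hypothetical OCR output, one accent lost
    print("CER:", cer(ground, ocr), "character accuracy:", accuracy(cer(ground, ocr)))
    print("WER:", wer(ground, ocr), "word accuracy:", accuracy(wer(ground, ocr)))
```

In this hypothetical example a single missing accent is one substituted character out of 21, but it makes one of the three words wrong, which shows why the two levels can differ considerably for the same OCR output.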

Files (1.3 MB)

Name	Size	MD5
varsamisIJCO1-2022.pdf	1.3 MB	b46ecc1d39dac9d2cbbdd2b4d41ad816
