Journal article Open Access

Tesseract OCR Evaluation on Greek Food Menus Datasets

Tsimpiris, Alkiviadis; Varsamis, Dimitrios; Pavlidis, George

This article presents a procedure for improving optical character recognition (OCR) through image preprocessing of Greek food menu images. To this end, both well-known and more sophisticated image preprocessing techniques were used. The performance of the Tesseract OCR engine was studied for selected binarization, thresholding, noise and morphological filtering methods applied to menu images before they were fed to the OCR engine. The output text is compared to the reference text of each image (ground text), and the values of the evaluation indices indicate the appropriate preprocessing method. Datasets of Greek food menu images with their respective ground text files were generated for the first time in this study, owing to the lack of alternative datasets in any language. OCR outputs and ground texts were evaluated using error rate and accuracy at the character and word levels. Applying OCR to Greek menu images yielded high accuracy for photos with high scanning resolution and for menus with distinct, clearly visible fonts.
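The abstract does not state the exact formulas behind the error rate and accuracy indices. As a minimal sketch, assuming the common definition based on Levenshtein edit distance (error rate = edit distance divided by the length of the ground text, accuracy = 1 minus error rate), the comparison of an OCR output with its ground text could look like the following. The function names `edit_distance`, `error_rate`, and `evaluate` are hypothetical and are not taken from the article.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete an item of the reference
                curr[j - 1] + 1,     # insert an item of the hypothesis
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by reference length (assumed definition)."""
    if len(ref) == 0:
        return 0.0 if len(hyp) == 0 else 1.0
    return edit_distance(ref, hyp) / len(ref)


def evaluate(ground_text, ocr_text):
    """Character-level and word-level error rate and accuracy."""
    cer = error_rate(ground_text, ocr_text)
    wer = error_rate(ground_text.split(), ocr_text.split())
    return {
        "cer": cer,
        "wer": wer,
        # Error rates can exceed 1 when the OCR output is much longer than
        # the ground text, so accuracy is clamped at 0 in this sketch.
        "char_accuracy": max(0.0, 1.0 - cer),
        "word_accuracy": max(0.0, 1.0 - wer),
    }


if __name__ == "__main__":
    # Toy strings for illustration only, not data from the study.
    print(evaluate("greek salad", "greek salat"))
```

Under this assumed definition, a single wrong character costs far more at the word level than at the character level, which is one reason to report both indices side by side when comparing preprocessing methods.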
