Quality Measurement for Optical Character Recognition without ground truth data

doi:10.5281/zenodo.3722922

Published March 9, 2020 | Version v1

Other Open

Weltevrede, Mike

This document records most of the research I did for the National Library of the Netherlands (Koninklijke Bibliotheek) as part of a project for my Master's thesis. Although I terminated the project because it did not align with my study program, the research conducted so far is still useful to consider.

The purpose of the project was to measure the quality of documents processed with OCR by ABBYY FineReader, independently of both ABBYY's own reports and ground truth data, since ground truth will not be available for many documents in the future.

Files

Name	Size
notes.pdf md5:d35dbe7f9aeaa16f87c94a710e691f54	221.8 kB

Additional details

Feng, M.-L., & Tan, Y.-P. (2004). Contrast adaptive binarization of low quality document images. IEICE Electronics Express, 1(16), 501–506.
Kulp, S., & Kontostathis, A. (2007). On retrieving legal files: Shortening documents and weeding out garbage. In TREC.
Rennie, J. D., & Srebro, N. (2005). Loss functions for preference levels: Regression with discrete ordered labels. In Proceedings of the IJCAI multidisciplinary workshop on advances in preference handling (Vol. 1).
Wudtke, R., Ringlstetter, C., & Schulz, K. U. (2011). Recognizing garbage in OCR output on historical documents. In Proceedings of the 2011 joint workshop on multilingual OCR and analytics for noisy unstructured text data (pp. 1–6).

	All versions	This version
Views	51	51
Downloads	57	57
Data volume	14.0 MB	14.0 MB
