Published November 22, 2018 | Version 1.0
Poster | Open Access

Ocromore - Combining multiple OCR-engine results to improve character recognition accuracy

  • UB-Mannheim

Description

One of the goals of the Aktienführer-Datenarchiv project is to process data from the Aktienführer and store it in structured form in a database. The Aktienführer is a reference work, published annually as a printed book between 1956 and 1999, that contains data on companies listed on German stock exchanges. High character recognition accuracy is crucial for recognizing the document structure and for further analysis of the OCR output. To optimize OCR quality, "Ocromore" was developed: a toolset for combining the outputs of multiple OCR engines. The best combined result is achieved with a word-wise, character-confidence-based multiple sequence alignment (MSA) approach. Our results show a character accuracy increase of 0.49% and an error reduction of 33% compared to the best single result.
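The description names the combination technique but does not spell out the procedure. The Python sketch below shows one way a word-wise, character-confidence-based combination could work. It is not Ocromore's code, and several assumptions are made. The outputs of the different engines are assumed to be already matched word by word (segmentation and matching are not shown). Each character is assumed to carry a confidence in [0, 1]. A full multiple sequence alignment is approximated by a star alignment: every hypothesis is aligned to a pivot (the hypothesis with the highest mean confidence) with Needleman-Wunsch. Each pivot column is then decided by a confidence-weighted vote. The function names `align_to_pivot` and `combine_word` are hypothetical.

```python
from collections import defaultdict


def align_to_pivot(pivot, other):
    """Needleman-Wunsch global alignment of `other` onto `pivot` (both str).

    Returns one entry per pivot character: the aligned index into `other`,
    or None if that pivot character is aligned to a gap. Characters of
    `other` that fall into pivot gaps are dropped (a simplification of MSA).
    """
    n, m = len(pivot), len(other)
    # score[i][j] = best score for aligning pivot[:i] with other[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -i
    for j in range(1, m + 1):
        score[0][j] = -j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 1 if pivot[i - 1] == other[j - 1] else -1
            score[i][j] = max(score[i - 1][j - 1] + match,
                              score[i - 1][j] - 1,
                              score[i][j - 1] - 1)
    # Trace back from the bottom-right cell to recover the mapping.
    mapping = [None] * n
    i, j = n, m
    while i > 0 and j > 0:
        match = 1 if pivot[i - 1] == other[j - 1] else -1
        if score[i][j] == score[i - 1][j - 1] + match:
            mapping[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - 1:
            i -= 1  # pivot character aligned to a gap
        else:
            j -= 1  # extra character in `other`, dropped
    return mapping


def combine_word(hypotheses):
    """Combine one word's OCR hypotheses, given as (text, confidences) pairs."""
    def mean_conf(hyp):
        return sum(hyp[1]) / len(hyp[1]) if hyp[1] else 0.0

    pivot_text, _ = max(hypotheses, key=mean_conf)
    # votes[k] maps a candidate character ("" = delete) to summed confidence.
    votes = [defaultdict(float) for _ in pivot_text]
    for hyp in hypotheses:
        text, conf = hyp
        for k, idx in enumerate(align_to_pivot(pivot_text, text)):
            if idx is None:
                # A gap has no OCR confidence; assume the engine's mean.
                votes[k][""] += mean_conf(hyp)
            else:
                votes[k][text[idx]] += conf[idx]
    # Keep the highest-weighted candidate in every pivot column.
    return "".join(max(col.items(), key=lambda kv: kv[1])[0] for col in votes)


if __name__ == "__main__":
    hyps = [
        ("Aktienführer", [0.90] * 12),  # engine A
        ("Aktienfuhrer", [0.85] * 12),  # engine B
        ("Akticnführer", [0.95] * 12),  # engine C (becomes the pivot)
    ]
    print(combine_word(hyps))  # → Aktienführer
```

In the example, the pivot's misread "c" is outvoted by two engines reading "e", and the "ü" that two engines agree on wins over "u". Because this simplified star alignment discards characters that appear only outside the pivot, a real MSA that keeps such columns may behave differently.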

Files

Ocromore.pdf (7.7 MB)
md5:7baa1a8e448c7a6704e50ccade565d58