Published May 10, 2021 | Version 1.0
Dataset | Open Access

Handwritten Text Recognition Test Set: Minutes of the Swiss Federal Council (1848-1903)

Affiliation: Digital Humanities, Walter Benjamin Kolleg, Universität Bern

Description

This dataset is a test set for evaluating the capabilities of Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) engines.
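HTR test sets are commonly scored with the character error rate (CER): the edit distance between an engine's output and the ground-truth transcription, divided by the length of the ground truth. This record does not prescribe a metric, so the following is only a minimal sketch under that assumption. It uses plain Levenshtein distance with no Unicode normalisation or whitespace handling, and the function names `levenshtein`, `cer` and `corpus_cer` are illustrative, not part of the dataset.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of a single line: edit distance / reference length."""
    if not reference:
        raise ValueError("reference transcription is empty")
    return levenshtein(reference, hypothesis) / len(reference)


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled CER over many (reference, hypothesis) line pairs."""
    errors = total = 0
    for reference, hypothesis in pairs:
        errors += levenshtein(reference, hypothesis)
        total += len(reference)
    if total == 0:
        raise ValueError("no reference characters")
    return errors / total
```

For example, `cer("Bundesrath", "Bundesrat")` is 0.1 (one deletion over ten characters). `corpus_cer` pools errors over all lines, so longer lines carry more weight than they would if per-line `cer` values were simply averaged.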

The dataset consists of extracts from the minutes of the Swiss Federal Council. The individual lines were randomly selected from about 150,000 pages of handwritten minutes.

For each line, the Swiss Federal Archives (Schweizerisches Bundesarchiv) provide an image file [images.tar.gz]. Please cite the images as follows: Excerpts of BAR E1004.1#1000/9#1-215. The images are in the public domain.

A PageXML file [page.zip] accompanies each image file and records the transcription and the coordinates of the line.
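A PageXML file can be read with Python's standard `xml.etree.ElementTree`. The sketch below assumes the files follow the PAGE 2013-or-later layout, where a `TextLine` holds a `Coords` element whose `points` attribute is a polygon written as `x1,y1 x2,y2 ...` and a `TextEquiv/Unicode` element with the transcription, and where `Page/@imageFilename` names the image. Tags are matched by local name so the exact PAGE namespace version does not matter. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

Point = Tuple[int, int]


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def first_child(elem: ET.Element, name: str) -> Optional[ET.Element]:
    """Return the first *direct* child with the given local name, or None."""
    for child in elem:
        if local_name(child.tag) == name:
            return child
    return None


def read_page_lines(xml_bytes: bytes) -> Tuple[Optional[str], List[Tuple[str, List[Point]]]]:
    """Return (imageFilename, [(transcription, polygon), ...]) for one PageXML document."""
    root = ET.fromstring(xml_bytes)
    image_filename = None
    lines = []
    for elem in root.iter():
        tag = local_name(elem.tag)
        if tag == "Page":
            image_filename = elem.get("imageFilename")
        elif tag == "TextLine":
            # Polygon stored as "x1,y1 x2,y2 ..." in the Coords/@points attribute.
            coords = first_child(elem, "Coords")
            points: List[Point] = []
            if coords is not None:
                for pair in (coords.get("points") or "").split():
                    x, y = pair.split(",")
                    points.append((int(x), int(y)))
            # Line-level transcription: TextLine/TextEquiv/Unicode. Only direct
            # children are inspected, so Word-level TextEquiv elements are ignored.
            text = ""
            equiv = first_child(elem, "TextEquiv")
            unicode_el = first_child(equiv, "Unicode") if equiv is not None else None
            if unicode_el is not None and unicode_el.text:
                text = unicode_el.text
            lines.append((text, points))
    return image_filename, lines
```

The function takes bytes so the XML declaration's encoding is honoured; the bytes can come straight from `zipfile.ZipFile("page.zip").read(member)` without unpacking the archive. Older PAGE files that store coordinates as `<Point x="..." y="..."/>` children instead of a `points` attribute would need an extra branch.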

For the PageXML format, see Pletschacher, S., & Antonacopoulos, A. (2010). The PAGE (Page Analysis and Ground-Truth Elements) Format Framework. In 2010 20th International Conference on Pattern Recognition (pp. 257–260). https://doi.org/10.1109/ICPR.2010.72

Files


Total size: 220.9 MB

Name           Size      MD5 checksum
images.tar.gz  216.2 MB  2c0f881c06480594d95f6f686e758c20
page.zip       4.7 MB    4df0dfcd78065fe9bc687ec3f02c9219
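The MD5 checksums let you confirm that a download is complete and uncorrupted. A minimal sketch using Python's standard `hashlib`, reading in chunks so the 216.2 MB archive does not have to fit in memory. The pairing of each checksum with a file name is an assumption inferred from the listing (the 216.2 MB file taken to be images.tar.gz, the previewable 4.7 MB file taken to be page.zip), so check it against the record; `EXPECTED_MD5`, `md5_of` and `verify_downloads` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union

# ASSUMPTION: checksum-to-filename pairing inferred from the file listing
# (216.2 MB archive = images.tar.gz, 4.7 MB previewable archive = page.zip).
EXPECTED_MD5 = {
    "images.tar.gz": "2c0f881c06480594d95f6f686e758c20",
    "page.zip": "4df0dfcd78065fe9bc687ec3f02c9219",
}


def md5_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks to keep memory use small."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Union[str, Path] = ".") -> Dict[str, bool]:
    """Map each expected file name to True if it exists and its MD5 matches."""
    base = Path(directory)
    return {
        name: (base / name).is_file() and md5_of(base / name) == expected
        for name, expected in EXPECTED_MD5.items()
    }
```

After downloading both archives into one folder, `verify_downloads("path/to/folder")` should report `True` for each file; a `False` means the file is missing, truncated, or paired with the wrong checksum.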