READ dataset Bozen

Sánchez, Joan Andreu; Romero, Verónica; Toselli, Alejandro H.; Vidal, Enrique

doi:10.5281/zenodo.218236

Published December 22, 2016 | Version v1

Dataset | Open access

1. Pattern Recognition and Human Language Technologies

This dataset arises from the READ project (Horizon 2020).

The dataset consists of a subset of documents from the Ratsprotokolle collection, which comprises the minutes of council meetings held from 1470 to 1805 (about 30,000 pages) and will be used in the READ project. The documents are written in Early Modern German. The number of writers is unknown. The handwriting in this collection is complex enough to challenge handwritten text recognition (HTR) software.

The training set comprises 400 pages; most pages consist of a single text block that poses many difficulties for line detection and extraction. The ground truth for this set is provided in PAGE format, annotated at line level.
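Since the ground truth is annotated at line level in PAGE files, a minimal Python sketch of reading line transcriptions from such a file may be useful. It assumes the usual PAGE XML layout (TextLine elements carrying a TextEquiv/Unicode child); the schema version, the element ids and the SAMPLE document below are illustrative assumptions and are not taken from the archive. Tags are matched by local name so that the sketch does not depend on one particular PAGE namespace version.

```python
import xml.etree.ElementTree as ET

# Illustrative PAGE-like document (hypothetical ids, text and namespace version;
# not taken from the dataset).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="r1_l1">
        <Coords points="0,0 10,0 10,5 0,5"/>
        <TextEquiv><Unicode>erste Zeile</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1_l2">
        <Coords points="0,6 10,6 10,11 0,11"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_text_lines(page_xml):
    """Return (line id, transcription) pairs from PAGE XML given as str or bytes."""
    root = ET.fromstring(page_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        text = ""
        # Only direct children are inspected, so TextEquiv elements nested
        # inside word-level annotations (if any) are ignored.
        for child in element:
            if local_name(child.tag) == "TextEquiv":
                for grandchild in child:
                    if local_name(grandchild.tag) == "Unicode":
                        text = grandchild.text or ""
        lines.append((element.get("id"), text))
    return lines
```

For a file on disk, the bytes can be passed directly, for example read_text_lines(open(path, "rb").read()), where path is whatever location the PAGE file was extracted to. Lines without a transcription are returned with an empty string.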

Files (493.2 MB)

Name	Size	MD5
PublicData.tgz	493.2 MB	3e7f116ab365098426005fa35b889832
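The MD5 value published for PublicData.tgz can be used to check that a download completed intact. The following is a minimal sketch using Python's standard hashlib module; the function names and the local file path are assumptions chosen for illustration.

```python
import hashlib

# MD5 checksum published for PublicData.tgz on the record page.
EXPECTED_MD5 = "3e7f116ab365098426005fa35b889832"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare the file's MD5 digest with the expected value, ignoring case."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved as PublicData.tgz in the working directory, archive_is_intact("PublicData.tgz") should return True for an uncorrupted download. Reading in chunks keeps memory use low for the 493.2 MB file.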


DOI: 10.5281/zenodo.218236

Resource type: Dataset

Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: December 22, 2016
Modified: January 24, 2020
