ICFHR 2020 Competition on Image Retrieval for Historical Handwritten Fragments (HisFrag20) Dataset

Seuret, Mathias; Nicolaou, Anguelos; Stutzmann, Dominique; Maier, Andreas; Christlein, Vincent

doi:10.5281/zenodo.3893807

Published June 14, 2020 | Version 1.0

Dataset Open

1. Friedrich-Alexander-Universität Erlangen-Nürnberg
2. Institut de Recherche et d'Histoire des Textes

This competition investigates the performance of large-scale retrieval of historical document fragments based on writer recognition. The analysis of historical fragments is a difficult challenge that is commonly solved by trained humanists.
We focus on the task of automatic image retrieval to simulate common scenarios of humanities research, such as fragment or writer retrieval. To this end, we created a large dataset consisting of more than 120 000 fragments.
The goal is then to find similar patches of the same page or manuscript.

Training-set: contains ~100 000 fragments, using Historical-IR19 as the base dataset. All fragments should contain some text, although some fragments are quite small.

Test-set: contains about 20 000 new fragments

Naming-convention: WID_PID_FID.jpg, where WID = writer ID, PID = page ID, FID = fragment ID
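
The naming convention above can be turned into retrieval labels with a few lines of Python. The following is a minimal sketch, not part of the official competition tooling: the names FragmentId, parse_fragment_name, same_writer and same_page are hypothetical, the example file names are made up, and the IDs are kept as strings because the record does not say whether they are numeric. It also assumes that a page ID is only meaningful together with its writer ID.

```python
from pathlib import PurePath
from typing import NamedTuple


class FragmentId(NamedTuple):
    """Hypothetical container for the three parts of WID_PID_FID.jpg."""
    writer_id: str
    page_id: str
    fragment_id: str


def parse_fragment_name(filename: str) -> FragmentId:
    """Split a file name of the form WID_PID_FID.jpg into its three IDs."""
    stem = PurePath(filename).stem  # drops the directory and the .jpg suffix
    parts = stem.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected WID_PID_FID.jpg, got {filename!r}")
    return FragmentId(*parts)


def same_writer(a: str, b: str) -> bool:
    """True if both fragments carry the same writer ID."""
    return parse_fragment_name(a).writer_id == parse_fragment_name(b).writer_id


def same_page(a: str, b: str) -> bool:
    """True if both fragments carry the same writer ID and page ID.

    Assumption: page IDs may repeat across writers, so both are compared.
    """
    fa, fb = parse_fragment_name(a), parse_fragment_name(b)
    return (fa.writer_id, fa.page_id) == (fb.writer_id, fb.page_id)
```

For example, under these assumptions same_writer("12_3_45.jpg", "12_7_2.jpg") is true while same_page on the same pair is false, which gives the two relevance criteria (writer retrieval and page retrieval) directly from the file names.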

For more information visit: https://lme.tf.fau.de/research/competitions/hisfragir20/

Files (4.7 GB)

Name	MD5	Size
hisfrag20_test.zip	56ea22f6424cadb2208c1bfd171d8a8a	1.4 GB
hisfrag20_train.zip	a6c3a9a2c2f170605fcf153792dd78e8	3.2 GB
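
The MD5 checksums listed for the two archives can be used to check a download before unpacking it. The sketch below uses only Python's standard hashlib module; the helper names md5_of_file and verify_download are hypothetical, and it assumes the archives were saved under their original file names.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "hisfrag20_test.zip": "56ea22f6424cadb2208c1bfd171d8a8a",
    "hisfrag20_train.zip": "a6c3a9a2c2f170605fcf153792dd78e8",
}


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path) -> bool:
    """Compare a downloaded archive against the checksum listed for its name."""
    name = Path(path).name
    if name not in EXPECTED_MD5:
        raise KeyError(f"no checksum listed for {name!r}")
    return md5_of_file(path) == EXPECTED_MD5[name]
```

Calling verify_download("hisfrag20_train.zip") returns True only if the local file matches the listed checksum; a False result usually points to an incomplete or corrupted download.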

