Published June 14, 2020 | Version 1.0
Dataset Open

ICFHR 2020 Competition on Image Retrieval for Historical Handwritten Fragments (HisFrag20) Dataset

  • 1. Friedrich-Alexander-Universität Erlangen-Nürnberg
  • 2. Institut de Recherche et d'Histoire des Textes

Description

This competition investigates the performance of large-scale retrieval of historical document fragments based on writer recognition. The analysis of historical fragments is a difficult challenge commonly solved by trained humanists.
We focus on the task of automatic image retrieval to simulate common scenarios of humanities research, such as fragment or writer retrieval. To this end, we created a large dataset consisting of more than 120 000 fragments.
The goal is then to find similar patches of the same page or manuscript.

Training-set: contains ~100 000 fragments, using Historical-IR19 as the base dataset. All fragments should contain some text, although some fragments are quite small.

Test-set: contains about 20 000 new fragments.

Naming-convention: WID_PID_FID.jpg, where WID = writer id, PID = page id, FID = fragment id
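
The naming convention can be unpacked with a few lines of Python. The following is a minimal sketch, not part of the official competition tooling: the names `FragmentId`, `parse_fragment_name` and `group_by_writer` are hypothetical, and the ids are kept as strings because the description does not state whether they are purely numeric.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, NamedTuple


class FragmentId(NamedTuple):
    """Ids encoded in a file name of the form WID_PID_FID.jpg."""
    writer_id: str
    page_id: str
    fragment_id: str


def parse_fragment_name(path: str) -> FragmentId:
    """Split a fragment file name into writer, page and fragment id."""
    stem = Path(path).stem  # drop the directory part and the .jpg extension
    parts = stem.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected WID_PID_FID.jpg, got: {path!r}")
    return FragmentId(*parts)


def group_by_writer(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Collect fragment paths per writer id, e.g. to build relevance lists."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for p in paths:
        groups[parse_fragment_name(p).writer_id].append(p)
    return dict(groups)
```

Grouping by `writer_id` gives the set of fragments that count as relevant for a writer retrieval query; grouping by the pair of writer id and page id would do the same for page-level retrieval.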

For more information visit: https://lme.tf.fau.de/research/competitions/hisfragir20/

Files

hisfrag20_test.zip

Files (4.7 GB)

Checksum                               Size
md5:56ea22f6424cadb2208c1bfd171d8a8a   1.4 GB
md5:a6c3a9a2c2f170605fcf153792dd78e8   3.2 GB
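
The MD5 checksums listed above can be used to check a download for corruption. The sketch below uses only the Python standard library; the function names `md5_of_file` and `verify_md5` and the archive path in the usage note are assumptions for illustration, since the listing does not state which checksum belongs to which file.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> bool:
    """Compare a file's MD5 with an expected value, with or without 'md5:' prefix."""
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected.lower()
```

A call such as `verify_md5("downloaded_archive.zip", "md5:56ea22f6424cadb2208c1bfd171d8a8a")` returns `True` only if the local file matches that checksum; the path here is a placeholder for wherever the archive was saved.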