Published June 3, 2026 | Version v1
Dataset Open

Handwritten Document Dataset Splits

  • 1. Luleå University of Technology
  • 2. Luleå University of Technology

Description

NorHAND-mini is a randomly sampled subset of the original NorHAND dataset created for controlled experiments in handwritten text recognition.

The subset contains:

- Training: 350 pages (10,490 lines)

- Validation: 50 pages (1,576 lines)

- Test: 50 pages (1,491 lines)

Total:

- 450 pages

- 13,557 text lines

The split was generated by random sampling at the page level, so every line of a given page falls in the same partition; this preserves writer and page integrity. The exact train/validation/test identifiers used in our experiments are provided to ensure full reproducibility.
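The page-level sampling described above can be illustrated with a minimal sketch. It is not the authors' script: the function names, the `page_NNN` identifier format, and the seed are all hypothetical, and the identifiers shipped in this release remain the authoritative split. The sketch only shows the general idea of shuffling whole pages and letting each line inherit the partition of its page.

```python
import random


def split_pages(page_ids, n_val, n_test, seed=0):
    """Shuffle unique page IDs and cut them into train/val/test lists.

    The seed is a placeholder for illustration, not the one used for NorHAND-mini.
    """
    pages = sorted(set(page_ids))  # fixed starting order so the seed fully determines the result
    rng = random.Random(seed)
    rng.shuffle(pages)
    test = pages[:n_test]
    val = pages[n_test:n_test + n_val]
    train = pages[n_test + n_val:]
    return train, val, test


def assign_lines(lines, train, val, test):
    """Map each (page_id, line_id) pair to the partition of its page."""
    split_of = {page: "train" for page in train}
    split_of.update({page: "val" for page in val})
    split_of.update({page: "test" for page in test})
    return {(page, line): split_of[page] for page, line in lines}


# Toy data: 450 hypothetical page IDs, mirroring the 350/50/50 page counts.
page_ids = [f"page_{i:03d}" for i in range(450)]
train, val, test = split_pages(page_ids, n_val=50, n_test=50, seed=0)
```

Because lines are never sampled individually, no page can contribute lines to more than one partition.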

This release contains only the split definitions and sample identifiers. Original images and annotations remain subject to the licensing and distribution terms of the NorHAND dataset.

Files

Name: splits.zip
Size: 2.6 kB
Checksum: md5:3c1657318b04fbf3aea2add7db5a3340
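A downloaded copy of the archive can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names and the local path `splits.zip` are assumptions for illustration.

```python
import hashlib

# Checksum as listed on this record for splits.zip.
EXPECTED_MD5 = "3c1657318b04fbf3aea2add7db5a3340"


def md5_of_file(path, chunk_size=8192):
    """Return the hexadecimal MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected
```

For example, `verify("splits.zip")` would return `True` for an intact download, assuming the archive was saved under that name in the working directory.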