Handwritten Document Dataset Splits

Barney, Elisa; Liwicki, Marcus

doi:10.5281/zenodo.20524615

Published June 3, 2026 | Version v1

Dataset Open

1. Luleå University of Technology
2. Luleå University of Technology

NorHAND-mini is a randomly sampled subset of the original NorHAND dataset created for controlled experiments in handwritten text recognition.

The subset contains:

- Training: 350 pages (10,490 lines)

- Validation: 50 pages (1,576 lines)

- Test: 50 pages (1,491 lines)

Total:

- 450 pages

- 13,557 text lines

The split was generated using random page-level sampling to preserve writer and page integrity. The exact train/validation/test identifiers used in our experiments are provided to ensure full reproducibility.
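As an illustration of what random page-level sampling means in practice, the following is a minimal sketch, not the authors' actual procedure. The function name `split_pages`, the seed value, and the page identifier format are hypothetical; the record does not state how the sampling was implemented or seeded. Because whole pages are assigned to a split, all lines from one page end up in the same split.

```python
import random


def split_pages(page_ids, n_train=350, n_val=50, n_test=50, seed=0):
    """Randomly assign whole pages to train/validation/test splits.

    Hypothetical helper: the default sizes match the page counts listed in
    this record, but the seed and sampling details are assumptions.
    """
    if len(set(page_ids)) != len(page_ids):
        raise ValueError("page identifiers must be unique")
    total = n_train + n_val + n_test
    if total > len(page_ids):
        raise ValueError("not enough pages for the requested split sizes")

    # Sort first so the result depends only on the seed, not on input order.
    shuffled = sorted(page_ids)
    random.Random(seed).shuffle(shuffled)

    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:total],
    }


# Example with made-up page identifiers.
example_pages = [f"page_{i:04d}" for i in range(1000)]
example_split = split_pages(example_pages)
```

To reproduce the reported experiments, use the identifiers shipped in this release rather than regenerating a split, since a fresh random draw will not match them.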

This release contains only the split definitions and sample identifiers. Original images and annotations remain subject to the licensing and distribution terms of the NorHAND dataset.

Files

Name	Size	Checksum
splits.zip	2.6 kB	md5:3c1657318b04fbf3aea2add7db5a3340
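The MD5 checksum listed for splits.zip can be used to confirm that a download is intact. The sketch below assumes the archive has been saved locally; the helper names `md5_of_file` and `verify_download` and the local path in the usage comment are hypothetical.

```python
import hashlib

# Checksum as listed for splits.zip in this record.
EXPECTED_MD5 = "3c1657318b04fbf3aea2add7db5a3340"


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


# Usage (assumes the archive was saved as "splits.zip" in the working directory):
# if not verify_download("splits.zip"):
#     raise SystemExit("Checksum mismatch: download again")
```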

DOI: 10.5281/zenodo.20524615

Resource type: Dataset

Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: June 3, 2026
Modified: June 3, 2026
