Published June 25, 2025 | Version 2.0.0
Dataset Open

Handwriting Adaptation Dataset

  • 1. Brno University of Technology

Description

27 manuscripts in various European languages and scripts.

 
More information, together with experiments on adapting a general model (trained on a large handwriting dataset) by fine-tuning, can be found in two papers: Finetuning Is a Surprisingly Effective Domain Adaptation Baseline in Handwriting Recognition (CTC-based models; only the first 19 writers are used) and Practical Fine-Tuning of Autoregressive Models on Limited Handwritten Texts (Transformer-based models).
 
There are two directories in the dataset archive: data and runs.

data contains images of text lines and their respective transcriptions. The images come in two crop modes, orig and wide; the crop mode indicates how much space was left around the baseline during cropping. Transcriptions are in the format ID TRANS, where ID corresponds to the name of the respective text line image and TRANS is the transcription.
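As a rough illustration of the ID TRANS format described above, the following Python sketch maps each line ID to its transcription. It assumes that the ID and the transcription are separated by whitespace, that the ID itself contains no spaces, and that the files are UTF-8 encoded; none of this is confirmed by the dataset description. The function name parse_transcriptions and the sample IDs are hypothetical.

```python
from typing import Dict, Iterable


def parse_transcriptions(lines: Iterable[str]) -> Dict[str, str]:
    """Map each text line ID to its transcription.

    Assumption (not stated in the dataset description): the ID is the
    first whitespace-delimited token and everything after it is the
    transcription.
    """
    result: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines
        # Split only once so that spaces inside the transcription survive.
        parts = line.split(maxsplit=1)
        line_id = parts[0]
        text = parts[1] if len(parts) > 1 else ""
        result[line_id] = text
    return result


# Hypothetical sample lines; real IDs in the archive may look different.
sample = [
    "writer01_line0001 the quick brown fox\n",
    "writer01_line0002 jumps over the lazy dog\n",
]
transcriptions = parse_transcriptions(sample)
```

To read a real transcription file, the same function can be fed an open file handle, for example `parse_transcriptions(open(path, encoding="utf-8"))`, where `path` points to a transcription file inside the data directory.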

runs contains the partitions for the fine-tuning runs; more information is given in the referenced papers (Section 5 and Section 4, respectively).

Files

handwritting_adaptation_dataset_v2.zip

Files (3.5 GB)

Name: handwritting_adaptation_dataset_v2.zip
Size: 3.5 GB
md5:8750af4fc23d1fcad73d64869b343520
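The MD5 checksum listed above can be used to check that the archive downloaded completely. The sketch below is one possible way to do this with Python's standard hashlib module; the function names md5_of_file and verify_archive are illustrative, and the local path passed to them is whatever location the archive was saved to.

```python
import hashlib

# Checksum as listed for handwritting_adaptation_dataset_v2.zip.
EXPECTED_MD5 = "8750af4fc23d1fcad73d64869b343520"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use low for a multi-gigabyte archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive("handwritting_adaptation_dataset_v2.zip")` should return True for an intact download saved under that name in the current directory.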

Additional details

Funding

Ministry of Culture
NAKI III project semANT - Semantic Document Exploration DH23P03OVV060