Handwriting Adaptation Dataset

Kohút, Jan

doi:10.5281/zenodo.15737665

Published June 25, 2025 | Version 2.0.0

Dataset | Open access

Kohút, Jan¹

1. Brno University of Technology

The dataset contains 27 manuscripts in various European languages and scripts.

More information, together with fine-tuning adaptation experiments on a general model trained on a large handwriting dataset, can be found in two papers: Finetuning Is a Surprisingly Effective Domain Adaptation Baseline in Handwriting Recognition (CTC-based models; only the first 19 writers are used) and Practical Fine-Tuning of Autoregressive Models on Limited Handwritten Texts (Transformer-based models).

There are two directories in the dataset archive: data and runs.

data contains images of text lines and their respective transcriptions. The images come in two crop modes, orig and wide; the crop mode indicates how much space was left around the baseline during cropping. Transcriptions are in the format ID TRANS, where ID corresponds to the name of the respective text line image and TRANS is the transcription.
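As an illustration of the ID TRANS format, here is a minimal Python sketch for reading a transcription file. It assumes one entry per line, UTF-8 encoding, and whitespace between the ID and the transcription; none of these details are confirmed by the record, and the function names are hypothetical.

```python
def parse_transcription_line(line: str) -> tuple[str, str]:
    """Split one 'ID TRANS' line into (ID, TRANS).

    Assumption: the ID contains no whitespace and is separated from the
    transcription by whitespace; the dataset page does not state this.
    """
    # Split on the first whitespace run only, so spaces inside TRANS survive.
    parts = line.rstrip("\r\n").split(maxsplit=1)
    if not parts:
        raise ValueError("empty transcription line")
    line_id = parts[0]
    # A line with an ID but no text is treated as an empty transcription.
    text = parts[1] if len(parts) > 1 else ""
    return line_id, text


def load_transcriptions(path: str) -> dict[str, str]:
    """Read a transcription file into a {ID: TRANS} dictionary."""
    transcriptions: dict[str, str] = {}
    # Assumption: the file is UTF-8 encoded.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                line_id, text = parse_transcription_line(line)
                transcriptions[line_id] = text
    return transcriptions
```

The returned dictionary maps each ID to its transcription, so the matching image can be looked up by ID in the orig or wide crop directory, whatever the actual file layout turns out to be.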

runs contains the partitions used for the fine-tuning runs; see the referenced papers for details (Section 5 and Section 4, respectively).

Files

Files (3.5 GB)

Name	Size	MD5
handwritting_adaptation_dataset_v2.zip	3.5 GB	8750af4fc23d1fcad73d64869b343520
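The MD5 checksum listed above can be used to check that the archive downloaded intact. The following is a minimal sketch using Python's standard hashlib module; the function names and the local file path are assumptions, while the expected digest is the one given in the file listing.

```python
import hashlib

# Digest copied from the file listing on this page.
EXPECTED_MD5 = "8750af4fc23d1fcad73d64869b343520"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so the 3.5 GB archive never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file at `path` matches the expected digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive("handwritting_adaptation_dataset_v2.zip")` would return True for an uncorrupted download, assuming the archive sits in the current working directory.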

Additional details

Funding

Ministry of Culture
NAKI III project semANT - Semantic Document Exploration (DH23P03OVV060)
