Published June 25, 2025 | Version 2.0.0
Dataset
Open
Handwriting Adaptation Dataset
Description
The dataset contains 27 manuscripts in various European languages and scripts.
More information, together with fine-tuning experiments that adapt a general model trained on a large handwriting dataset, can be found in two papers: Finetuning Is a Surprisingly Effective Domain Adaptation Baseline in Handwriting Recognition (CTC-based models; only the first 19 writers are used) and Practical Fine-Tuning of Autoregressive Models on Limited Handwritten Texts (Transformer-based models).
There are two directories in the dataset archive: data and runs.
data contains images of text lines and their respective transcriptions. The images come in two crop modes, orig and wide; the crop mode indicates how much space was left around the baseline during cropping. Transcriptions are in the format ID TRANS, where ID corresponds to the name of the respective text line image and TRANS is the transcription.
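The ID TRANS format described above can be read with a few lines of Python. This is a minimal sketch, not the authors' loader: it assumes one record per line, that the ID contains no spaces, and that a single space separates it from the transcription. The function names are hypothetical.

```python
def parse_transcription_line(line):
    """Split one 'ID TRANS' record into (line_id, transcription).

    Assumption: the ID is everything before the first space and the
    transcription is everything after it, spaces included.
    """
    line = line.rstrip("\r\n")
    line_id, _, text = line.partition(" ")
    return line_id, text


def parse_transcriptions(lines):
    """Build a dict mapping text line image IDs to transcriptions."""
    records = {}
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        line_id, text = parse_transcription_line(raw)
        records[line_id] = text
    return records
```

To read from disk, pass an open file handle, for example `parse_transcriptions(open(path, encoding="utf-8"))`, where `path` points to a transcription file in data. The UTF-8 encoding is also an assumption.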
runs contains the partitions used for the fine-tuning runs; see the referenced papers for details (Section 5 and Section 4, respectively).
Files
Name | Size | Checksum |
---|---|---|
handwritting_adaptation_dataset_v2.zip | 3.5 GB | md5:8750af4fc23d1fcad73d64869b343520 |
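The published MD5 checksum can be used to confirm that the 3.5 GB archive downloaded intact. The following is a minimal sketch using Python's standard `hashlib`; the function names and the chunk size are illustrative choices, not part of the dataset.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path, expected_hex):
    """Compare a file's MD5 digest with an expected hex string, ignoring case."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For an intact download, `md5_matches("handwritting_adaptation_dataset_v2.zip", "8750af4fc23d1fcad73d64869b343520")` should return `True`, assuming the archive sits in the current directory under its original name.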
Additional details
Funding
- Ministry of Culture
- NAKI III project semANT - Semantic Document Exploration DH23P03OVV060