Published March 3, 2026 | Version v1.0
Dataset Open

ClozeFormHandwriting Dataset

Description

ClozeFormHandwriting v1.0

This is the first public release of ClozeFormHandwriting, a dataset of handwritten single-word English responses to cloze-form (fill-in-the-blank) questions, collected from classroom answer sheets.

What's included

  • 3,000 cropped handwritten word images (correct responses) from classroom answer sheets
  • Ground-truth transcriptions encoded in filenames: <GroupID>_<Word>.png
  • Split logs:
    • log_train.txt
    • log_test.txt
  • Documentation and usage instructions in README.md
  • License and citation files (LICENSE, CITATION.cff)
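
Because the transcription is stored in the filename, labels can be recovered without a separate annotation file. The following is a minimal sketch, not the loader from the repository README. It assumes that the word itself contains no underscore (so the last underscore separates the group ID from the word) and that images sit in a single directory; the function names parse_label and load_labels are hypothetical.

```python
from pathlib import Path


def parse_label(filename):
    """Split '<GroupID>_<Word>.png' into (group_id, word).

    Assumption: the word contains no underscore, so the LAST underscore
    in the file stem is treated as the separator.
    """
    stem = Path(filename).stem  # drops the directory and the '.png' suffix
    group_id, sep, word = stem.rpartition("_")
    if not sep or not group_id or not word:
        raise ValueError(f"unexpected filename format: {filename!r}")
    return group_id, word


def load_labels(image_dir):
    """Return a sorted list of (path, group_id, word) for every PNG in image_dir."""
    records = []
    for path in sorted(Path(image_dir).glob("*.png")):
        group_id, word = parse_label(path.name)
        records.append((path, group_id, word))
    return records
```

For example, parse_label("G01_apple.png") returns ("G01", "apple"); the filename here is illustrative, not taken from the dataset. The format of log_train.txt and log_test.txt is not described on this page, so consult README.md before using them to build the splits.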

Notes

  • This release focuses on word-level recognition of low-legibility handwriting.
  • The dataset contains variation in stroke width, slant, spacing, and legibility typical of real classroom writing.
  • Please refer to the repository README for dataset structure and loading examples.

Ethics & privacy

According to the original dataset documentation, the student responses were collected under institutional permissions for research use and were fully anonymized prior to release. The released dataset contains no personally identifiable information (PII), and no writer identifiers are provided.

License

CC BY 4.0

Citation

If you use this dataset, please cite the dataset and the associated paper as described in README.md / CITATION.cff.

Notes

If you use this dataset, please cite the paper below.

@inproceedings{chandola2025far,
  title={How far are we from Automatic Grading of Handwritten Cloze Form Questions?},
  author={Chandola, Shrey and Ravikiran, Manikandan and Saluja, Rohit},
  booktitle={International Conference on Artificial Intelligence in Education},
  pages={336--343},
  year={2025},
  organization={Springer}
}

Files

Shrey0900/ClozeFormHandwriting-v1.0.zip (49.7 MB)
md5:f3e02944936a1a2abd2b963dbcb655f7
