There is a newer version of the record available.

Published March 27, 2022 | Version v1
Dataset | Open access


  • 1. Ben-Gurion University of the Negev, Beer-Sheva, Israel
  • 2. Shamoon College of Engineering, Beer Sheva, 84100, Israel


The VML-HP-ext collection contains 715 page images excerpted from 171 different manuscripts and covering 14 medieval Hebrew writing styles. Each page image is accompanied by its hard and soft ground-truth (GT) labels.
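This record does not specify how the hard and soft labels are encoded. A common convention, assumed here for illustration only, is that a hard label is a one-hot vector over the 14 styles and a soft label is a probability distribution that spreads its mass over several styles. The sketch below builds both kinds of vector and scores a prediction with cross-entropy, which accepts either kind as the target. `hard_label`, `soft_label`, and `cross_entropy` are hypothetical helpers and are not part of the dataset's files.

```python
import math
from typing import Dict, List, Sequence

NUM_STYLES = 14  # VML-HP-ext covers 14 medieval Hebrew writing styles


def hard_label(style_index: int, num_classes: int = NUM_STYLES) -> List[float]:
    """One-hot vector: all probability mass on a single style (assumed encoding)."""
    if not 0 <= style_index < num_classes:
        raise ValueError(f"style index {style_index} out of range")
    return [1.0 if i == style_index else 0.0 for i in range(num_classes)]


def soft_label(weights: Dict[int, float], num_classes: int = NUM_STYLES) -> List[float]:
    """Normalize non-negative per-style weights into a probability distribution (assumed encoding)."""
    total = sum(weights.values())
    if total <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative with a positive sum")
    vec = [0.0] * num_classes
    for idx, w in weights.items():
        if not 0 <= idx < num_classes:
            raise ValueError(f"style index {idx} out of range")
        vec[idx] = w / total
    return vec


def cross_entropy(target: Sequence[float], predicted: Sequence[float], eps: float = 1e-12) -> float:
    """H(target, predicted) = -sum_i target_i * log(predicted_i); works for hard and soft targets."""
    if len(target) != len(predicted):
        raise ValueError("target and prediction lengths differ")
    # Clamp predictions away from zero so that log() is always defined.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


# Hypothetical prediction that splits its belief between styles 0 and 1.
prediction = [0.7, 0.3] + [0.0] * (NUM_STYLES - 2)
print(cross_entropy(hard_label(0), prediction))
print(cross_entropy(soft_label({0: 0.7, 1: 0.3}), prediction))
```

With a hard target, any probability the model assigns to style 1 counts as error; with a soft target that itself puts weight on style 1, the same prediction is the best achievable one.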

We also provide the official split of VML-HP-ext into training, typical test, and blind test sets.
The typical test set contains unseen pages from the manuscripts in the training set. Although the training and typical test sets are disjoint at the page level, they share the same manuscripts, so results on the typical test set do not show how well a model generalizes to new manuscripts. We therefore also provide the blind test set, which consists of manuscripts that do not appear in the training set. The blind test set imitates a real-life scenario in which scholars want to obtain a classification for a previously unseen document.
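The properties of the split described above can be checked mechanically. The following sketch assumes that each page carries a hypothetical `page_id` and a hypothetical `manuscript_id`; the record does not describe the actual file layout or naming scheme, so both fields and the `split_violations` helper are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass(frozen=True)
class Page:
    page_id: str        # hypothetical unique identifier of a page image
    manuscript_id: str  # hypothetical identifier of the manuscript it was excerpted from


def manuscripts(pages: Sequence[Page]) -> Set[str]:
    """Set of manuscripts represented in a collection of pages."""
    return {p.manuscript_id for p in pages}


def split_violations(train: Sequence[Page], typical_test: Sequence[Page],
                     blind_test: Sequence[Page]) -> List[str]:
    """Return the split properties that do not hold; an empty list means the split is consistent."""
    problems = []
    train_pages = {p.page_id for p in train}
    train_mss = manuscripts(train)
    # Typical test: unseen pages, so no page may also occur in the training set...
    if train_pages & {p.page_id for p in typical_test}:
        problems.append("typical test shares pages with the training set")
    # ...but drawn only from manuscripts that the training set already contains.
    if not manuscripts(typical_test) <= train_mss:
        problems.append("typical test contains manuscripts absent from the training set")
    # Blind test: manuscripts that never appear in the training set.
    if manuscripts(blind_test) & train_mss:
        problems.append("blind test shares manuscripts with the training set")
    return problems


train = [Page("p1", "m1"), Page("p2", "m1"), Page("p3", "m2")]
print(split_violations(train, [Page("p4", "m1")], [Page("p5", "m3")]))  # → []
```

The typical-test check uses a subset test rather than set equality, because a small typical test set need not contain pages from every training manuscript.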


For more details, please refer to [1]. When using this dataset in research work, please cite [1].

[1] A. Droby, D. Vasyutinsky Shapira, I. Rabaev, B. Kurar Barakat, and J. El-Sana. Hard and Soft Labeling for Hebrew Paleography: A Case Study. Accepted to the 15th IAPR International Workshop on Document Analysis Systems (


Files (2.1 GB)
