Dataset Open Access


Ahmad Droby; Daria Vasyutinsky Shapira; Irina Rabaev; Berat Kurar Barakat; Jihad El-Sana

The VML-HP-ext collection contains 715 page images excerpted from 171 different manuscripts covering 14 medieval Hebrew writing styles, accompanied by their hard and soft ground-truth (GT) labels.

We also provide the official split of VML-HP-ext into training, typical test, and blind test sets.
The typical test set includes unseen pages of the manuscripts from the training set. While the training and typical test sets are disjoint at the page level, they share the same set of manuscripts. Therefore, we also provide the blind test set, which consists of manuscripts that do not appear in the training set. The blind test set imitates a real-life scenario, where scholars would like to obtain a classification for a previously unseen document.
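
The relationship between the three sets can be checked programmatically. The following is a minimal sketch, assuming each page is represented as a (manuscript ID, page ID) pair; the identifiers, the `check_splits` helper, and the toy data are hypothetical and do not reflect the actual file layout of the dataset.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical page record: (manuscript_id, page_id).
Page = Tuple[str, str]


def manuscripts(pages: Iterable[Page]) -> Set[str]:
    """Return the set of manuscript IDs that occur in a collection of pages."""
    return {manuscript_id for manuscript_id, _ in pages}


def check_splits(train: Iterable[Page],
                 typical: Iterable[Page],
                 blind: Iterable[Page]) -> Dict[str, bool]:
    """Check the split properties described above on page-level records."""
    train, typical, blind = list(train), list(typical), list(blind)
    return {
        # Typical test pages never appear in the training set.
        "typical_pages_unseen": set(train).isdisjoint(typical),
        # Typical test pages come from manuscripts seen during training.
        "typical_manuscripts_seen": manuscripts(typical) <= manuscripts(train),
        # Blind test manuscripts never appear in the training set.
        "blind_manuscripts_unseen": manuscripts(blind).isdisjoint(manuscripts(train)),
    }


# Toy data for illustration only (not taken from the dataset).
train_pages = [("ms_A", "p1"), ("ms_A", "p2"), ("ms_B", "p1")]
typical_pages = [("ms_A", "p3"), ("ms_B", "p2")]
blind_pages = [("ms_C", "p1")]

report = check_splits(train_pages, typical_pages, blind_pages)
print(report)
```

A check of this kind is useful when building custom splits: splitting only at the page level would leave manuscripts shared between training and test data, which is exactly what the blind test set is designed to avoid.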

In this version, several labels were corrected. 

For more details, please refer to [1]. When using this dataset in research work, please cite [1].

[1] A. Droby, D. Vasyutinsky Shapira, I. Rabaev, B. Kurar Barakat, and J. El-Sana. Hard and Soft Labeling for Hebrew Paleography: A Case Study. Accepted to the 15th IAPR International Workshop on Document Analysis Systems (
Files (2.2 GB)