Published March 27, 2022 | Version v2
Dataset Open

VMP_HP-ext_2

  • 1. Ben-Gurion University of the Negev, Beer-Sheva, Israel
  • 2. Shamoon College of Engineering, Beer Sheva, 84100, Israel

Description

The VML-HP-ext collection contains 715 page images excerpted from 171 different manuscripts covering 14 medieval Hebrew writing styles, accompanied by their hard and soft ground-truth (GT) labels.

We also provide the official split of VML-HP-ext into training, typical test, and blind test sets.
The typical test set contains pages that do not appear in the training set but are taken from the same manuscripts. The training and typical test sets are therefore disjoint at the page level, yet they share the same set of manuscripts. For this reason, we also provide the blind test set, which consists of manuscripts that do not appear in the training set. The blind test set imitates a real-life scenario, in which scholars would like to obtain a classification for a previously unseen document.
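The relationship between the three sets can be expressed as a small consistency check. The following is a minimal sketch, not part of the dataset tooling: it assumes each page can be identified by a (manuscript ID, page ID) pair, and the function name `check_split` and the identifiers in the usage note are hypothetical.

```python
def check_split(train, typical_test, blind_test):
    """Check the split properties described above.

    Each argument is a list of (manuscript_id, page_id) pairs.
    This pair representation is an assumption made for illustration;
    it is not the dataset's actual file layout.
    """
    train_pages = set(train)
    train_manuscripts = {ms for ms, _ in train}
    typical_manuscripts = {ms for ms, _ in typical_test}
    blind_manuscripts = {ms for ms, _ in blind_test}

    return {
        # Typical test pages never appear in the training set.
        "typical_pages_unseen": train_pages.isdisjoint(typical_test),
        # Every typical test manuscript also occurs in training.
        "typical_manuscripts_seen": typical_manuscripts <= train_manuscripts,
        # No blind test manuscript occurs in training.
        "blind_manuscripts_unseen": blind_manuscripts.isdisjoint(train_manuscripts),
    }
```

For example, with the hypothetical inputs `train = [("ms_a", 1), ("ms_b", 1)]`, `typical_test = [("ms_a", 2)]`, and `blind_test = [("ms_c", 1)]`, all three checks hold. Moving a page of `ms_a` into the blind test set would make the last check fail.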

In this version, several labels were corrected. 

Notes

For more details, please refer to [1]. When using this dataset in research work, please cite [1].

[1] A. Droby, D. Vasyutinsky Shapira, I. Rabaev, B. Kurar Barakat, and J. El-Sana. Hard and Soft Labeling for Hebrew Paleography: A Case Study. Accepted to the 15th IAPR International Workshop on Document Analysis Systems (https://das2022.univ-lr.fr/index.php/).

Files

VMP_HP_ext_v2.zip (2.2 GB)
md5:c9114f5e91db7793ba602db7ed53c2c9
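
The MD5 checksum listed for the archive can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module. The function names `md5_of_file` and `verify_archive` are illustrative, and the local file path is an assumption about where the archive was saved.

```python
import hashlib

# Checksum as listed on this record for VMP_HP_ext_v2.zip.
EXPECTED_MD5 = "c9114f5e91db7793ba602db7ed53c2c9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved to the current directory, `verify_archive("VMP_HP_ext_v2.zip")` returns `True` when the download matches the listed checksum.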