Published September 1, 2016
| Version v1.0
Dataset
Open
MPS Data set with images of medieval charters for handwriting-style based dating of manuscripts
- 1. University of Groningen
- 2. University of Amsterdam
Description
The MPS benchmark data set for handwritten manuscript dating

This data set was collected by Petros Samara for the Dutch NWO project Medieval Paleographical Scale (MPS).
Project website: http://application02.target.rug.nl/monk/Projects/MPS/

Copyright (c) Huygensinstituut, Den Haag, 2016; University of Groningen, 2016. All rights reserved.

Organisation of the data:

Each .tar.gz file contains a number of NetPBM images. This format was chosen for its simplicity; it also leaves no doubt about whether lossy compression occurred anywhere in the processing chain. The file names have the format 'MPS<year>_<seqnr>.ppm', for example 'MPS1300_0056.ppm'. Note: the archives contain no separate directory, so the files are extracted in place. Because the names are unique, all archives can be extracted into one single (destination) directory without conflicts.

The actual image type can be gray scale (.pgm) or color (.ppm), reported as '8-bit DirectClass' by ImageMagick's 'identify' tool. The images were cropped out of larger photographs to remove irrelevant elements such as a Kodak color calibrator and non-text content such as the supporting surface (table) background, seals (emblems), ribbons, etc.

No effort has been made to balance the number of samples over the years: the frequencies of occurrence found in the archives are used. There is evidently less data for the years before 1375 A.D., while some periods provide ample data for historical reasons (e.g., 1450 A.D.). It would have been a pity if the scarce years had determined and limited the size of this data set, and any selection criteria for data reduction, whether random or systematic, would have been arbitrary. In any case, these images were used in our publications, so that the results of future attempts at manuscript dating can be compared with earlier results.
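The naming scheme described above can be turned into year labels with a few lines of Python. This is a minimal sketch, not part of the data set: the helper names `parse_mps_name` and `samples_per_year` are hypothetical, and accepting a `.pgm` extension alongside `.ppm` is an assumption based on the remark that gray-scale images may occur.

```python
import re
from collections import Counter

# Pattern taken from the description: 'MPS<year>_<seqnr>.ppm'.
# Accepting '.pgm' as well is an assumption (gray-scale images are mentioned).
NAME_RE = re.compile(r"^MPS(\d+)_(\d+)\.(ppm|pgm)$")


def parse_mps_name(filename):
    """Return (year, seqnr) for a name such as 'MPS1300_0056.ppm', else None."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def samples_per_year(filenames):
    """Count how many images carry each year label, skipping unrelated names."""
    parsed = (parse_mps_name(name) for name in filenames)
    return Counter(item[0] for item in parsed if item is not None)
```

Applied to a directory listing of the extracted images, `samples_per_year` gives a quick view of how unevenly the years are represented.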
The performance reached with our algorithms is on the order of an MAE (mean absolute error) of 10 years.

If you have any questions, please contact us:
Sheng He (heshengxgd@gmail.com)
Petros Samara (petros.samara@huygens.knaw.nl)
Jan Burgers (jan.burgers@huygens.knaw.nl)
Lambert Schomaker (L.Schomaker@ai.rug.nl)

Please cite our papers if you use this data set:
[1] Sheng He, Petros Samara, Jan Burgers, Lambert Schomaker. Image-based historical manuscript dating using contour and stroke fragments. Pattern Recognition (PR), Vol. 59, pp. 159-171, 2016.
[2] Sheng He, Petros Samara, Jan Burgers, Lambert Schomaker. Towards style-based dating of historical documents. International Conference on Frontiers in Handwriting Recognition (ICFHR), Crete, Greece, 2014.
[3] Sheng He, Petros Samara, Jan Burgers, Lambert Schomaker. Multiple-Label Guided Clustering Algorithm for Historical Document Dating and Localization. IEEE Trans. on Image Processing, Vol. 25(11), Nov. 2016. http://ieeexplore.ieee.org/document/7551181/
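For comparison with the MAE quoted in the description, the measure can be computed as the mean of the absolute differences between predicted and true years. The sketch below is illustrative only; the function name and the example years are made up, and it is not the evaluation code used in the cited papers.

```python
def mean_absolute_error(true_years, predicted_years):
    """Mean of |true - predicted| over all samples, in years."""
    if len(true_years) != len(predicted_years) or not true_years:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(t - p) for t, p in zip(true_years, predicted_years))
    return total / len(true_years)


# Hypothetical example: three charters with made-up predictions.
example_mae = mean_absolute_error([1300, 1350, 1400], [1310, 1345, 1415])
```

In this made-up example the individual errors are 10, 5 and 15 years, so `example_mae` is 10.0.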
The data were collected thanks to Dutch NWO grant project 380-50-006.
Notes
Files
(19.0 GB)
Name | Size |
---|---|
md5:333c871fa3b4a15c05394f22850cb68f | 19.0 GB |
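With a download of this size, it can be worth checking the file against the MD5 checksum in the listing. The following is a minimal sketch using Python's standard `hashlib`; the function names are hypothetical, and the path of the downloaded file is left to the caller because the file name is not shown in the listing.

```python
import hashlib

# Checksum copied from the file listing.
EXPECTED_MD5 = "333c871fa3b4a15c05394f22850cb68f"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected=EXPECTED_MD5):
    """Return True if the file at 'path' has the expected MD5 checksum."""
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use constant, which matters for a 19.0 GB file.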
Additional details
References
- Sheng He, Petros Samara, Jan Burgers, Lambert Schomaker. Image-based historical manuscript dating using contour and stroke fragments. Pattern Recognition (PR), Vol. 59, pp. 159-171, 2016.
- Sheng He, Petros Samara, Jan Burgers, Lambert Schomaker. Towards style-based dating of historical documents. International Conference on Frontiers in Handwriting Recognition (ICFHR), Crete, Greece, 2014.
- Sheng He, Petros Samara, Jan Burgers, Lambert Schomaker. Multiple-Label Guided Clustering Algorithm for Historical Document Dating and Localization. IEEE Trans. on Image Processing, Vol. 25(11), Nov. 2016. http://ieeexplore.ieee.org/document/7551181/