Published September 1, 2016 | Version v1.0
Dataset Open

MPS Data set with images of medieval charters for handwriting-style based dating of manuscripts

  • 1. University of Groningen
  • 2. University of Amsterdam

Description

The MPS benchmark data set for handwritten manuscript dating
____________________________________________________________

This data set was collected for the Dutch NWO project:

Medieval Paleographical Scale (MPS)

by Petros Samara

Project website: http://application02.target.rug.nl/monk/Projects/MPS/

Copyright (c) Huygensinstituut, Den Haag, 2016
              University of Groningen, 2016.
              All rights reserved.
              
Organisation of the data: Each .tar.gz file contains a number of NetPBM
images. This format was chosen for its simplicity and because it is
uncompressed, so no lossy compression can enter the processing chain. The file
names have the format 'MPS<year>_<seqnr>.ppm', for example, 'MPS1300_0056.ppm'.
Note: the files are not in a separate directory; they will be extracted in place.
However, because the names are unique, they can all be extracted into one
single current (destination) directory without conflicts.
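
The naming scheme can be parsed mechanically. The following is a minimal
Python sketch, not part of the original distribution. The function name
parse_mps_name is hypothetical, and it assumes that <year> and <seqnr> are
plain decimal digits and that a gray-scale file may carry a .pgm extension.

```python
import re
from pathlib import Path

# Pattern for 'MPS<year>_<seqnr>' followed by .ppm or .pgm.
# Assumption: year and sequence number consist of decimal digits only.
MPS_NAME = re.compile(r"^MPS(\d+)_(\d+)\.(ppm|pgm)$")


def parse_mps_name(filename):
    """Return (year, seqnr) for an MPS image file name, or None if it does not match."""
    match = MPS_NAME.match(Path(filename).name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_mps_name("MPS1300_0056.ppm"))  # (1300, 56)
```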

The actual type of the image can be gray scale (.pgm) or color (.ppm),
in '8-bit DirectClass' according to ImageMagick's 'identify' tool.
The images were cropped out of larger photographs to remove irrelevant
elements such as a Kodak color calibrator and non-text content such as supporting
surface (table) backgrounds, seals (emblems), ribbons, etc.
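
Whether a given file is gray scale or color can be checked from its NetPBM
magic number rather than from its extension. The sketch below is an
illustration only. The function name netpbm_kind is hypothetical, and the
demo writes a tiny made-up 1x1 image to a temporary directory instead of
reading a file from the data set.

```python
import os
import tempfile

# NetPBM magic numbers: P2/P5 are graymaps (PGM), P3/P6 are pixmaps (PPM).
NETPBM_KINDS = {b"P2": "gray", b"P5": "gray", b"P3": "color", b"P6": "color"}


def netpbm_kind(path):
    """Classify a NetPBM file as 'gray', 'color' or 'unknown' from its first two bytes."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    return NETPBM_KINDS.get(magic, "unknown")


# Demo: a made-up 1x1 binary graymap stored under a .ppm name.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.ppm")
    with open(sample, "wb") as handle:
        handle.write(b"P5\n1 1\n255\n\x00")
    demo_kind = netpbm_kind(sample)

print(demo_kind)  # gray
```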

No effort has been made to obtain a balanced set of samples over years:
the given frequencies of occurrence in archives are used.
There is evidently less data for the years before 1375 A.D., while some periods
provide us with ample data for historical reasons (e.g., 1450 A.D.). It
would have been a pity if the scarce years had determined and limited the 
size of this data set. Selection criteria for data reduction, whether random 
or systematic, would have been arbitrary. In any case, these are the images
used in our publications, so that the performance results of
future attempts at manuscript dating can be compared with earlier results.
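
Because the set is not balanced, it may help to inspect the number of samples
per year before training or evaluating a dating method. The following Python
sketch counts samples from the file names. The function name samples_per_year
is hypothetical, and apart from 'MPS1300_0056.ppm' the names in the example
list are made up for illustration.

```python
import re
from collections import Counter

# Assumption: names follow 'MPS<year>_<seqnr>.ppm' (or .pgm) with decimal digits.
YEAR_IN_NAME = re.compile(r"^MPS(\d+)_\d+\.p[gp]m$")


def samples_per_year(filenames):
    """Return a dict mapping year to number of matching files, sorted by year."""
    counts = Counter()
    for name in filenames:
        match = YEAR_IN_NAME.match(name)
        if match:
            counts[int(match.group(1))] += 1
    return dict(sorted(counts.items()))


# Illustrative names only; non-matching names are ignored.
example = ["MPS1450_0001.ppm", "MPS1300_0056.ppm", "MPS1300_0057.ppm", "notes.txt"]
print(samples_per_year(example))  # {1300: 2, 1450: 1}
```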

The performance reached using our algorithms is on
the order of an MAE (mean absolute error) of 10 years.
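
For comparison with this figure, the MAE can be computed as the mean of the
absolute differences between true and estimated years, assuming MAE has its
usual meaning of mean absolute error. The Python sketch below uses made-up
years, not results from the publications.

```python
def mean_absolute_error(true_years, predicted_years):
    """Mean of |true - predicted| over all samples, in years."""
    if len(true_years) != len(predicted_years) or not true_years:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_years, predicted_years))
    return total / len(true_years)


# Made-up example values: the absolute errors are 10, 5, 15 and 10 years.
true_years = [1300, 1350, 1400, 1450]
predicted_years = [1310, 1345, 1385, 1460]
print(mean_absolute_error(true_years, predicted_years))  # 10.0
```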

If you have any questions, please contact us:

Sheng He (heshengxgd@gmail.com)    
Petros Samara (petros.samara@huygens.knaw.nl)
Jan Burgers (jan.burgers@huygens.knaw.nl)
Lambert Schomaker (L.Schomaker@ai.rug.nl)         


Please cite our papers if you use this data set:

[1] Sheng He, Petros Samara, Jan Burgers, Lambert Schomaker.
	Image-based historical manuscript dating using contour and stroke fragments.
	Pattern Recognition (PR), Vol. 59, pp. 159-171, 2016.

[2] Sheng He, Petros Samara, Jan Burgers, Lambert Schomaker.
	Towards style-based dating of historical documents.
	International Conference on Frontiers in Handwriting Recognition (ICFHR), Crete, Greece, 2014.

[3] Sheng He, Petros Samara, Jan Burgers, Lambert Schomaker.
	Multiple-Label Guided Clustering Algorithm for Historical Document Dating and Localization.
	IEEE Trans. on Image Processing, Vol. 25(11), Nov. 2016.
	http://ieeexplore.ieee.org/document/7551181/

The data were collected thanks to Dutch NWO grant project 380-50-006.

Notes

S.He@rug.nl L.R.B. Schomaker@rug.nl

Files

Files (19.0 GB)

md5:333c871fa3b4a15c05394f22850cb68f
19.0 GB
