Published August 15, 2019 | Version 1.0.0
Dataset Open

Dataset of Pages from Early Printed Books with Multiple Font Groups

Description

This dataset consists of photographs, at various resolutions, of 35,623 pages from printed books dating from the 15th to the 18th century. Experts assigned each page between one and five labels corresponding to the font groups used in its text. The labels are drawn from ten font groups (Antiqua, Bastarda, Fraktur, Gotico Antiqua, Greek, Hebrew, Italic, Rotunda, Schwabacher, and Textura), plus two extra classes: one for non-textual content and one for fonts not in this list.

To make downloading easier over slow or unreliable Internet connections, the dataset has been split into several zip files. All zip files must be extracted into the same folder. The CSV files containing the labels should ideally be placed in its parent folder.
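The extraction step described above can be scripted. The following is a minimal sketch using only the Python standard library; the function name and the folder arguments are illustrative assumptions, not part of the dataset.

```python
import zipfile
from pathlib import Path


def extract_all(archive_dir, target_dir):
    """Extract every .zip archive found in archive_dir into target_dir.

    Both folder names are chosen by the caller (hypothetical layout);
    the dataset only requires that all archives share one target folder.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    # Sort so that the archives are processed in a reproducible order.
    for archive in sorted(Path(archive_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(archive.name)
    return extracted
```

For example, `extract_all("downloads", "fontgroups/pages")` would unpack all archives into `fontgroups/pages`, after which the two CSV files could be copied into `fontgroups`. These folder names are placeholders.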

The labels are provided in two CSV files: one for training and tuning font group recognition methods, and one for evaluation. Where several pages come from the same book, special care has been taken to place all of them in the same subset.
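Anyone who carves a further tuning subset out of the training labels may want to preserve the same property, keeping all pages of a book together. The sketch below shows one way to do this. It assumes each page can be paired with a book identifier; how to obtain that identifier is not described here, so the `(page_name, book_id)` input layout is hypothetical.

```python
import random
from collections import defaultdict


def split_by_book(pages, holdout_fraction=0.2, seed=0):
    """Split pages into two lists so that no book appears in both.

    pages: iterable of (page_name, book_id) pairs (hypothetical layout).
    """
    by_book = defaultdict(list)
    for page_name, book_id in pages:
        by_book[book_id].append(page_name)

    # Shuffle the books, not the pages, so each book stays in one piece.
    books = sorted(by_book)
    random.Random(seed).shuffle(books)

    # Hold out at least one book whenever there are two or more books.
    n_holdout = max(1, round(len(books) * holdout_fraction)) if len(books) > 1 else 0
    holdout_books = set(books[:n_holdout])

    train = [p for b in books if b not in holdout_books for p in by_book[b]]
    holdout = [p for b in books if b in holdout_books for p in by_book[b]]
    return train, holdout
```

Because the split is made at book level, the resulting page counts only approximate the requested fraction when books differ in size.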

The paper presenting this dataset in detail is "Dataset of Pages from Early Printed Books with Multiple Font Groups", accepted at the 5th International Workshop on Historical Document Imaging and Processing, Sydney, Australia.

We would like to thank the British Library (London), Bayerische Staatsbibliothek München, Staatsbibliothek zu Berlin, Universitätsbibliothek Erlangen, Universitätsbibliothek Heidelberg, Staats- und Universitätsbibliothek Göttingen, Stadt- und Universitätsbibliothek Köln, Württembergische Landesbibliothek Stuttgart and Herzog August Bibliothek Wolfenbüttel for the data they provided and kindly allowed us to use in this public dataset.

Files

fontgroupsdataset-a.zip

Files (44.2 GB)

MD5 checksum                             Size
md5:2fcd1cf7f4e766625ab5aaae6f10eb3e     6.3 GB
md5:fe597b957dbbc29e5940bce21ac08f8c     6.2 GB
md5:98d7afd41328b30afb9125708cd8bb0f     6.3 GB
md5:3c86b0ae51fb458ad1daea3d243668f5     6.4 GB
md5:203b8555c92b8bd2cdc4e596b2a76ef1     6.3 GB
md5:697eb0182aca2acc839abe1008e999ba     6.5 GB
md5:0dc8dbd31c9942d85289fce41776eb56     6.3 GB
md5:bf98a2d56bdcb7e5c09de4c92727cc18     292.0 kB
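
The file listing above gives an MD5 checksum for each file, which is useful for detecting an incomplete or corrupted download before extraction. The following is a minimal sketch of such a check using Python's standard `hashlib` module; the helper names are illustrative, and which checksum belongs to which file has to be taken from the download page, since the listing here does not show the file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks avoids loading a multi-gigabyte archive into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected.strip().lower()
```

A file for which `matches_checksum` returns `False` should be downloaded again before extraction.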

Additional details

Related works

Is documented by
10.1145/3352631.3352640 (DOI)

References

  • Dataset of Pages from Early Printed Books with Multiple Font Groups