Dataset of Pages from Early Printed Books with Multiple Font Groups
Creators
- 1. Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg
- 2. Gutenberg-Institut für Weltliteratur und schriftorientierte Medien Abteilung Buchwissenschaft
Contributors
Data collectors:
Data managers:
Description
This dataset consists of photographs, at various resolutions, of 35,623 pages of printed books dating from the 15th to the 18th century. Experts assigned each page one to five labels corresponding to the font groups used in the text. The label set comprises ten font groups (Antiqua, Bastarda, Fraktur, Gotico Antiqua, Greek, Hebrew, Italic, Rotunda, Schwabacher, and Textura) and two extra classes: one for non-textual content and one for fonts not in this list.
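Because a page can carry several labels at once, font group recognition on this dataset is a multi-label problem. The sketch below shows one common way to turn a page's labels into a 12-element 0/1 vector. The ten font group names come from the description above; the two extra class names (`not_a_font`, `other_font`) are placeholders chosen for illustration and may not match the strings used in the CSV files.

```python
# The ten font groups named in the description, plus two extra classes.
FONT_GROUPS = [
    "Antiqua", "Bastarda", "Fraktur", "Gotico Antiqua", "Greek",
    "Hebrew", "Italic", "Rotunda", "Schwabacher", "Textura",
]
# Hypothetical names for the two extra classes (assumption, not from the dataset).
EXTRA_CLASSES = ["not_a_font", "other_font"]
CLASSES = FONT_GROUPS + EXTRA_CLASSES
CLASS_INDEX = {name: i for i, name in enumerate(CLASSES)}


def encode_labels(labels):
    """Return a 12-element 0/1 vector for a page's one to five labels."""
    if not 1 <= len(labels) <= 5:
        raise ValueError("a page carries between one and five labels")
    vector = [0] * len(CLASSES)
    for label in labels:
        vector[CLASS_INDEX[label]] = 1  # KeyError on an unknown label name
    return vector


print(encode_labels(["Fraktur", "Antiqua"]))
```

A vector of this shape can be used directly as the target of a classifier with one independent output per class.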
To make downloading easier over slow or unreliable Internet connections, the dataset has been split into several zip files. All zip files must be extracted into the same folder. The CSV files containing the labels should ideally be placed in the parent of that folder.
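The extraction step can be scripted. The following is a minimal sketch using Python's standard `zipfile` module; the function name and the folder names in the usage note are illustrative assumptions, not part of the dataset.

```python
import zipfile
from pathlib import Path


def extract_archives(zip_dir, target_dir):
    """Extract every .zip file found in zip_dir into the single folder target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)  # all archives share the same destination
        extracted.append(archive.name)
    return extracted
```

For example, `extract_archives("downloads", "fontgroups/images")` would unpack all archives from a hypothetical `downloads` folder into `fontgroups/images`, and the CSV files would then go into `fontgroups`.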
The labels are provided in two CSV files: one for training and tuning font group recognition methods, and one for evaluation. Where several pages come from the same book, special care has been taken to place all of them in the same subset.
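Users who carve a validation set out of the training CSV may want to preserve the same property, so that no book contributes pages to both subsets. The sketch below illustrates such a group-aware split. It assumes that a book identifier is available for each page; how (or whether) the CSV files expose one is not stated here, so the `(page, book_id)` input format is hypothetical.

```python
import random
from collections import defaultdict


def split_by_book(pages, validation_fraction=0.1, seed=0):
    """Split (page, book_id) pairs so that no book spans both subsets.

    validation_fraction is a fraction of books, not of pages.
    """
    by_book = defaultdict(list)
    for page, book_id in pages:
        by_book[book_id].append(page)
    books = sorted(by_book)
    random.Random(seed).shuffle(books)  # reproducible shuffle of whole books
    n_val = max(1, round(len(books) * validation_fraction))
    val_books = set(books[:n_val])
    train = [p for b in books if b not in val_books for p in by_book[b]]
    val = [p for b in books if b in val_books for p in by_book[b]]
    return train, val
```

Splitting at the level of books rather than pages avoids overly optimistic validation scores caused by near-identical pages from one book appearing on both sides.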
The paper presenting this dataset in detail is "Dataset of Pages from Early Printed Books with Multiple Font Groups", accepted at the 5th International Workshop on Historical Document Imaging and Processing, Sydney, Australia.
We would like to thank the British Library (London), Bayerische Staatsbibliothek München, Staatsbibliothek zu Berlin, Universitätsbibliothek Erlangen, Universitätsbibliothek Heidelberg, Staats- und Universitätsbibliothek Göttingen, Stadt- und Universitätsbibliothek Köln, Württembergische Landesbibliothek Stuttgart and Herzog August Bibliothek Wolfenbüttel for the data they sent us and kindly allowed us to use for this public dataset.
Files
fontgroupsdataset-a.zip
Total size: 44.2 GB
MD5 checksum | Size |
---|---|
md5:2fcd1cf7f4e766625ab5aaae6f10eb3e | 6.3 GB |
md5:fe597b957dbbc29e5940bce21ac08f8c | 6.2 GB |
md5:98d7afd41328b30afb9125708cd8bb0f | 6.3 GB |
md5:3c86b0ae51fb458ad1daea3d243668f5 | 6.4 GB |
md5:203b8555c92b8bd2cdc4e596b2a76ef1 | 6.3 GB |
md5:697eb0182aca2acc839abe1008e999ba | 6.5 GB |
md5:0dc8dbd31c9942d85289fce41776eb56 | 6.3 GB |
md5:bf98a2d56bdcb7e5c09de4c92727cc18 | 292.0 kB |
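Since the archives are large, it can be worth checking each download against its listed MD5 checksum before extracting. The following is a minimal sketch using Python's standard `hashlib` module; the function names are illustrative, and the pairing of each file with its checksum has to be taken from the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so multi-gigabyte archives fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed_checksum.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A download that returns `False` here is most likely truncated or corrupted and should be fetched again.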
Additional details
Related works
- Is documented by: 10.1145/3352631.3352640 (DOI)
References
- Dataset of Pages from Early Printed Books with Multiple Font Groups