Bittmann, Janina
Duntze, Oliver
Hinrichsen, Lena
Hoppe, Leonie
Hosfeld, Maria
Lieneke, Lukas
Limbach, Saskia
Meier, Annette
Menz, Lennart
Schmidt, Christian
Stumm, Magdalena
Weichselbaumer, Nikolaus
Wiechmann, Eileen
Seuret, Mathias
Limbach, Saskia
Reske, Christoph
Weichselbaumer, Nikolaus
Seuret, Mathias
Limbach, Saskia
Weichselbaumer, Nikolaus
Maier, Andreas
Christlein, Vincent
2019-08-15
<p>This dataset consists of photographs, at various resolutions, of 35,623 pages from printed books dating from the 15th to the 18th century. Experts have assigned each page one to five labels corresponding to the font groups used in its text: Antiqua, Bastarda, Fraktur, Gotico Antiqua, Greek, Hebrew, Italic, Rotunda, Schwabacher, and Textura. Two extra classes cover non-textual content and fonts not present in this list.</p>
<p>To make the dataset easier to download over slow or unreliable Internet connections, it has been split into several zip files. All zip files must be extracted into the same folder. The CSV files containing the labels should ideally be placed in the parent of that folder.</p>
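<p>The extraction step can be sketched in Python as follows. This is only a minimal illustration using the standard <code>zipfile</code> module; the function name <code>extract_all</code> and the folder names in the usage note are hypothetical and not part of the dataset.</p>

```python
import zipfile
from pathlib import Path


def extract_all(archive_dir, target_dir):
    """Extract every .zip file found in archive_dir into one folder, target_dir.

    Returns the names of the archives that were extracted, in sorted order.
    """
    target = Path(target_dir)
    # Create the shared destination folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(archive_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # All archives are unpacked into the same folder.
            zf.extractall(target)
        extracted.append(archive.name)
    return extracted
```

<p>For example, <code>extract_all("downloads", "dataset/images")</code> would unpack all archives into <code>dataset/images</code>, with the label CSV files then placed in <code>dataset</code> (both folder names are assumptions).</p>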
<p>The labels are provided in two CSV files: one for training and tuning font group recognition methods, and one for evaluation. Where several pages come from the same book, special care has been taken to keep all of them in the same subset.</p>
<p>The paper presenting this dataset in detail is "Dataset of Pages from Early Printed Books with Multiple Font Groups", accepted at the 5th International Workshop on Historical Document Imaging and Processing, Sydney, Australia.</p>
<p>We would like to thank the British Library (London), Bayerische Staatsbibliothek München, Staatsbibliothek zu Berlin, Universitätsbibliothek Erlangen, Universitätsbibliothek Heidelberg, Staats- und Universitätsbibliothek Göttingen, Stadt- und Universitätsbibliothek Köln, Württembergische Landesbibliothek Stuttgart, and Herzog August Bibliothek Wolfenbüttel for the data they sent us and kindly allowed us to use in this public dataset.</p>
https://doi.org/10.5281/zenodo.3366686
oai:zenodo.org:3366686
eng
Zenodo
https://doi.org/10.1145/3352631.3352640
https://zenodo.org/communities/iapr-tc11
https://doi.org/10.5281/zenodo.3366685
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial Share Alike 4.0 International
https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
HIP'19, 5th International Workshop on Historical Document Imaging and Processing, Sydney, Australia, 20-21 September 2019
digital humanities
historical documents
document analysis
incunabula
type
typography
fonts
Antiqua
Italic
Textura
Rotunda
Gotico-Antiqua
Bastarda
Schwabacher
Fraktur
Dataset of Pages from Early Printed Books with Multiple Font Groups
info:eu-repo/semantics/other