Published April 18, 2019 | Version 0.1.0
Dataset | Open Access

Handwritten Species Names Data

Creators

  • 1. Leiden Institute of Advanced Computer Science

Description

This dataset accompanies a paper presented at the 41st European Conference on Information Retrieval (ECIR), 14–18 April 2019, in Cologne. DOI: https://doi.org/10.1007/978-3-030-15712-8_43

Data summary: Word images from 240 field notes in a natural history collection have been segmented and semantically annotated. This work was carried out in the context of the project "Making Sense of Illustrated Handwritten Archives" (http://www.makingsenseproject.org/). From a field book on mammals, field notes by four different writers were selected to cover different handwriting styles and structures. The segmented word images were obtained through a nichesourcing effort, with the help of a group of domain-expert labellers and the handwriting recognition system MONK, developed by Lambert Schomaker. The word images were subsequently annotated by hand with four classes: Genus (0), Species (1), Author (2) and Other (3).
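The four annotation classes map integer labels to names. A minimal sketch of decoding and counting them, assuming the labels are read as plain Python integers from the `type` field (the helper names `label_name` and `class_counts` are hypothetical, not part of the dataset):

```python
from collections import Counter

# Class labels as listed in the data summary (stored in the `type` field).
CLASS_NAMES = {0: "Genus", 1: "Species", 2: "Author", 3: "Other"}


def label_name(label: int) -> str:
    """Return the class name for an integer label; reject unknown labels."""
    try:
        return CLASS_NAMES[label]
    except KeyError:
        raise ValueError(f"unknown class label: {label!r}") from None


def class_counts(labels) -> Counter:
    """Count how often each class name occurs in a sequence of labels."""
    return Counter(label_name(label) for label in labels)
```

For example, `class_counts([0, 1, 1, 3])["Species"]` gives 2. Raising on an unknown label surfaces unexpected values early instead of silently miscounting them.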

Dataframe fields:

  • rel_xc: relative x coordinate of the centroid
  • rel_yc: relative y coordinate of the centroid
  • page: identifier of the field book page
  • rel_x1: relative left x coordinate of the bounding box
  • rel_x2: relative right x coordinate of the bounding box
  • rel_y1: relative upper y coordinate of the bounding box
  • rel_y2: relative lower y coordinate of the bounding box
  • type: class label (0–3)
  • image: pixels of the word image
  • height: height of the word image
  • width: width of the word image
  • size: height × width of the word image
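The relative bounding-box fields can be mapped back to pixel positions on a page, and the documented relation size = height × width can be checked per record. The sketch below ASSUMES the `rel_*` values are fractions (0 to 1) of the page width and height. The page pixel dimensions are not among the listed fields, so `page_width` and `page_height` are hypothetical inputs, as are the `WordRecord` type and helper names.

```python
from dataclasses import dataclass


@dataclass
class WordRecord:
    # A subset of the dataframe fields listed above.
    rel_x1: float
    rel_x2: float
    rel_y1: float
    rel_y2: float
    height: int
    width: int
    size: int


def to_pixel_box(rec: WordRecord, page_width: int, page_height: int):
    """Convert relative bounding-box coordinates to (x1, y1, x2, y2) pixels.

    ASSUMPTION: rel_* values are fractions of the page size; page_width and
    page_height are hypothetical inputs, not fields of the dataset.
    """
    x1 = round(rec.rel_x1 * page_width)
    x2 = round(rec.rel_x2 * page_width)
    y1 = round(rec.rel_y1 * page_height)
    y2 = round(rec.rel_y2 * page_height)
    return x1, y1, x2, y2


def size_is_consistent(rec: WordRecord) -> bool:
    """Check the documented relation size == height * width."""
    return rec.size == rec.height * rec.width
```

Usage: `to_pixel_box(WordRecord(0.1, 0.3, 0.2, 0.25, 40, 160, 6400), 2000, 3000)` yields `(200, 600, 600, 750)`. Rounding to integers keeps the result usable as pixel indices despite floating-point error.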

 

Files

Files (219.9 MB)

  • md5:f8f5928b5fbfbf4c49c0e318a1ea1cb8 (219.9 MB)
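After downloading, the file can be checked against the listed MD5 checksum. A minimal sketch using Python's standard `hashlib` module. The file name is not given on this record, so the path must be supplied by the user; `md5_of_file` and `verify_download` are hypothetical helper names.

```python
import hashlib

# Checksum listed for the data file on this record.
EXPECTED_MD5 = "f8f5928b5fbfbf4c49c0e318a1ea1cb8"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use constant even for the 219.9 MB file; call `verify_download("path/to/downloaded/file")` with the actual download location.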