Handwritten Species Names Data
Description
This dataset accompanies a paper presented at the 41st European Conference on Information Retrieval, 14–18 April 2019, in Cologne. DOI: https://doi.org/10.1007/978-3-030-15712-8_43
Data summary: Word images from 240 field notes in a natural history collection have been segmented and semantically annotated. The work was carried out in the context of the project "Making Sense of Illustrated Handwritten Archives" (http://www.makingsenseproject.org/). Field notes by four different writers were selected from a field book on mammals, to account for different handwriting styles and structures. The segmented word images were obtained through a nichesourcing effort, with the help of a group of domain expert labellers and the handwriting recognition system MONK, developed by Lambert Schomaker. The word images were subsequently annotated manually using four classes: Genus (0), Species (1), Author (2) and Other (3).
Dataframe fields: rel_xc (relative x coordinate of the centroid), rel_yc (relative y coordinate of the centroid), page (identifier of the field book page), rel_x1 (relative left x coordinate of the bounding box), rel_x2 (relative right x coordinate of the bounding box), rel_y1 (relative upper y coordinate of the bounding box), rel_y2 (relative lower y coordinate of the bounding box), type (class label, 0-3), image (pixels of the word image), height (height of the word image), size (height * width of the word image), width (width of the word image).
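The field list can be illustrated with a minimal sketch of how one record might be interpreted and sanity-checked. It assumes each row is available as a plain Python dict keyed by the field names; the description does not specify the file format or how to load it, so the sample record is synthetic and the names `CLASS_NAMES`, `label_name` and `check_record` are hypothetical helpers, not part of the dataset.

```python
# Class labels as listed in the data summary.
CLASS_NAMES = {0: "Genus", 1: "Species", 2: "Author", 3: "Other"}


def label_name(type_id):
    """Map the numeric `type` field to its class name."""
    return CLASS_NAMES[type_id]


def check_record(rec):
    """Return a list of consistency problems for one record (empty if none)."""
    problems = []
    # `type` is documented as a class label in the range 0-3.
    if rec["type"] not in CLASS_NAMES:
        problems.append("type outside 0-3")
    # `size` is documented as height * width of the word image.
    if rec["size"] != rec["height"] * rec["width"]:
        problems.append("size != height * width")
    # Assumption: the left bounding-box edge does not exceed the right one.
    if rec["rel_x1"] > rec["rel_x2"]:
        problems.append("rel_x1 > rel_x2")
    return problems


# Synthetic record for illustration only; the values are not from the dataset.
sample = {
    "page": "example-page",
    "type": 1,
    "rel_x1": 0.10, "rel_x2": 0.25,
    "rel_y1": 0.40, "rel_y2": 0.45,
    "rel_xc": 0.175, "rel_yc": 0.425,
    "height": 2, "width": 3, "size": 6,
    "image": [[0, 255, 0], [255, 0, 255]],  # assumed row-major pixel layout
}

print(label_name(sample["type"]), check_record(sample))
```

A check of this kind may be useful after loading the real file, since it only relies on relationships stated in the field descriptions, apart from the bounding-box ordering, which is an assumption.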
Files
Checksum | Size |
---|---|
md5:f8f5928b5fbfbf4c49c0e318a1ea1cb8 | 219.9 MB |