Handwritten Species Names Data

Lise Stork

doi:10.5281/zenodo.2545573

Published April 18, 2019 | Version 0.1.0

Dataset Open

Lise Stork¹

1. Leiden Institute of Advanced Computer Science

This dataset accompanies a paper presented at the 41st European Conference on Information Retrieval, 14th–18th April 2019, in Cologne. DOI: https://doi.org/10.1007/978-3-030-15712-8_43

Data summary: Word images from 240 field notes in a natural history collection were segmented and semantically annotated as part of the project "Making Sense of Illustrated Handwritten Archives" (http://www.makingsenseproject.org/). Field notes by four different writers were selected from a field book on mammals to account for different handwriting styles and structures. The segmented word images were obtained through a nichesourcing effort, with the help of a group of domain-expert labellers and the handwriting recognition system MONK, developed by Lambert Schomaker. The word images were subsequently annotated manually with one of four classes: Genus (0), Species (1), Author (2) and Other (3).

Dataframe fields: rel_xc (relative x coordinate of the centroid), rel_yc (relative y coordinate of the centroid), page (identifier of the field book page), rel_x1 (relative left x coordinate of the bounding box), rel_x2 (relative right x coordinate of the bounding box), rel_y1 (relative upper y coordinate of the bounding box), rel_y2 (relative lower y coordinate of the bounding box), type (class label, 0-3), image (pixels of the word image), height (height of the word image), size (height * width of the word image), width (width of the word image).
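As a rough illustration of how these fields might be used, the sketch below loads the file and summarises the class labels and bounding boxes. It assumes that asa-species-data.pkl unpickles to a pandas DataFrame with the columns listed above and that rel_y2 is numerically larger than rel_y1; neither is confirmed by this record. The function names load_word_images, class_counts and box_size are hypothetical helpers, not part of the dataset.

```python
import pandas as pd

# Class labels as listed in the data summary.
CLASS_NAMES = {0: "Genus", 1: "Species", 2: "Author", 3: "Other"}


def load_word_images(path="asa-species-data.pkl"):
    """Load the dataset; assumes the pickle holds a pandas DataFrame."""
    # Unpickling can run arbitrary code, so only load a file you trust.
    return pd.read_pickle(path)


def class_counts(df):
    """Count word images per class, keyed by class name."""
    names = df["type"].astype(int).map(CLASS_NAMES)
    return names.value_counts().to_dict()


def box_size(df):
    """Width and height of each bounding box, in relative coordinates.

    Assumes rel_x2 >= rel_x1 and rel_y2 >= rel_y1.
    """
    return pd.DataFrame({
        "rel_width": df["rel_x2"] - df["rel_x1"],
        "rel_height": df["rel_y2"] - df["rel_y1"],
    })
```

With the file downloaded, class_counts(load_word_images()) would give the number of word images per class, which is a quick way to check how the four labels are distributed.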

Files (219.9 MB)

Name	Size
asa-species-data.pkl (md5:f8f5928b5fbfbf4c49c0e318a1ea1cb8)	219.9 MB

