A benchmark dataset of herbarium specimen images with label data

To give developers of transcription methods a means of optimising their approaches, we have compiled a benchmark dataset of 1,800 herbarium specimen images with corresponding transcribed data. The images originate from nine different collections and include specimens that reflect the many obstacles transcription methods may encounter, such as differences in language, text format (printed or handwritten), specimen age and nomenclatural type status. We are making these specimens available under a Creative Commons Zero (CC0) waiver, with permanent online storage of the data, to minimise the obstacles to using these images for transcription training. The dataset may also be used wherever a defined and documented set of herbarium specimens is needed, for example for the extraction of morphological traits, handwriting recognition or colour analysis of specimens.


Curated by: MathiasDillen
Curation policy: Not specified
Created: November 7, 2018
Harvesting API: OAI-PMH Interface
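
As an illustration of what harvesting over OAI-PMH might look like, the following minimal sketch builds a standard `ListRecords` request and extracts record identifiers from a response. The `verb`, `metadataPrefix` and `set` parameters come from the OAI-PMH protocol itself; the base URL, the set name and the sample response are placeholders invented for this example, not values taken from this page, and would need to be replaced with the repository's real endpoint and this community's set.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumptions: both values below are hypothetical placeholders.
BASE_URL = "https://example.org/oai2d"   # replace with the repository's OAI-PMH endpoint
SET_SPEC = "example-community"           # replace with the set that selects this community

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL restricted to one set."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    })
    return f"{base_url}?{query}"


def record_identifiers(xml_text):
    """Return the identifier found in each record header of a response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Made-up response used only to show the parsing step without network access.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url(BASE_URL, SET_SPEC))
    print(record_identifiers(SAMPLE_RESPONSE))
```

A real harvester would fetch the URL, parse each page in the same way, and keep requesting with the `resumptionToken` returned by the server until none is left.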
