
Published July 2, 2019 | Version v1
Journal article | Open Access

Specimen Data Refinery: A landscape analysis on machine learning, computer vision and automated approaches to capture specimen metadata

  • 1. The Natural History Museum, London, United Kingdom
  • 2. Royal Botanic Garden Edinburgh, Edinburgh, United Kingdom

Description

Compared with traditional digitisation, capturing data from specimen images is the most viable way of enriching specimen metadata cheaply and quickly. Advances in machine learning and computer vision-based tools, and their increasing accessibility and affordability, are expanding the potential to take automated measurements and capture other data from the specimens themselves, as well as to transcribe label data.

More sophisticated segmentation of images allows us to find parts of interest: particular labels; individual specimens on a slide; or barcodes. Following segmentation, there is the potential to use colour analysis of specimens to perform condition checking, such as looking for bad cases of verdigris in pinned insects or discolouration of gum-chloral mountant. Automated measurements and landmark analysis of specimens can be used to create trait datasets, all of which will enrich our knowledge of specimens. Segmentation of labels can allow us to cluster similar labels based on their visual properties including colour, shape and patterns—this in turn can be used to make optical character recognition, handwriting recognition and manual transcription much more efficient. Atomising, validating and resolving label data will create structured label data that can be more easily stored, searched and linked to other datasets.
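To make the segmentation step concrete, the sketch below shows one simple way a label-finding stage might be prototyped with OpenCV: pale, roughly rectangular regions are located on a specimen image and cropped out so they can later be routed to OCR, handwriting recognition or manual transcription. This is an illustrative assumption only, not a method described in the article; the input file name, the threshold value and the minimum-area filter are all hypothetical.

```python
# Illustrative sketch (not from the article): locate label-like regions on a
# specimen image and crop them for later OCR or transcription.
import cv2

image = cv2.imread("herbarium_sheet.jpg")        # hypothetical input image
grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Assumption: labels are bright paper rectangles on a darker background,
# so a simple global threshold separates candidate regions.
_, mask = cv2.threshold(grey, 200, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

crops = []
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    if w * h > 10_000:                           # ignore small specks (arbitrary cut-off)
        crops.append(image[y:y + h, x:x + w])    # candidate label region

# Each crop could then be passed to an OCR engine, or clustered by colour,
# shape and pattern so that visually similar labels are transcribed together.
for i, crop in enumerate(crops):
    cv2.imwrite(f"label_{i}.png", crop)
```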

We present a landscape analysis of these approaches, summarising previous work, and outline our plan to build future tools and systems in the SYNTHESYS+ Project as part of the Specimen Data Refinery. This will cover sharing tools, reducing barriers to access, and integrating workflow engines into a software architecture that allows components to be re-used and re-purposed, records provenance data for repeatability, and conforms to the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles (Wilkinson et al. 2016).
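As a rough illustration of the provenance idea, and not the SYNTHESYS+ architecture itself, the sketch below wraps a processing step so that the tool name, version, parameters and input checksums are written out next to its result; the function names and the JSON layout are assumptions made for this example.

```python
# Minimal sketch (assumed, not the project's design): record provenance
# alongside a workflow step's output so the step can be repeated exactly.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def checksum(path: Path) -> str:
    """MD5 of a file, used to tie an output back to its exact inputs."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def run_step(tool: str, version: str, params: dict, inputs: list, output: Path) -> None:
    # ... the actual image-processing or transcription step would run here ...
    record = {
        "tool": tool,
        "version": version,
        "parameters": params,
        "inputs": {str(p): checksum(p) for p in inputs},
        "output": str(output),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Write the provenance record next to the output file.
    output.with_suffix(".provenance.json").write_text(json.dumps(record, indent=2))
```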

Files (80.9 kB)

BISS_article_37647.pdf
