Specimen Data Refinery: A landscape analysis on machine learning, computer vision and automated approaches to capture specimen metadata
Creators
1. The Natural History Museum, London, United Kingdom
2. Royal Botanic Garden Edinburgh, Edinburgh, United Kingdom
Description
Compared with traditional digitisation, capturing data from specimen images is the most viable way of enriching specimen metadata cheaply and quickly. Advances in machine learning and computer vision-based tools, together with their increasing accessibility and affordability, greatly expand the potential to take automated measurements and capture other data from the specimens themselves, as well as to transcribe label data.
More sophisticated segmentation of images allows us to find parts of interest: particular labels, individual specimens on a slide, or barcodes. Following segmentation, colour analysis of specimens can be used for condition checking, such as looking for severe cases of verdigris on pinned insects or discolouration of gum-chloral mountant. Automated measurements and landmark analysis of specimens can be used to create trait datasets, all of which enrich our knowledge of the specimens. Segmentation of labels allows us to cluster similar labels based on their visual properties, including colour, shape and patterns; this in turn can make optical character recognition, handwriting recognition and manual transcription much more efficient. Atomising, validating and resolving label data will produce structured records that can be more easily stored, searched and linked to other datasets.
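As an illustration of how such label clustering might work, the minimal sketch below groups pre-segmented label crops by a simple colour-histogram descriptor so that visually similar labels can be batched for OCR or manual transcription. This is not the Specimen Data Refinery implementation; the input directory, cluster count and feature choice are illustrative assumptions.

```python
# Minimal sketch: cluster segmented label images by coarse visual features
# (colour histogram + aspect ratio) so similar labels can be transcribed together.
import glob

import cv2
import numpy as np
from sklearn.cluster import KMeans


def label_features(path, bins=8):
    """Return a normalised 3D colour histogram plus aspect ratio for one label crop."""
    img = cv2.imread(path)
    hist = cv2.calcHist([img], [0, 1, 2], None, [bins] * 3,
                        [0, 256, 0, 256, 0, 256])
    hist = cv2.normalize(hist, hist).flatten()
    h, w = img.shape[:2]
    return np.append(hist, w / h)


paths = sorted(glob.glob("label_crops/*.png"))  # hypothetical directory of label crops
features = np.array([label_features(p) for p in paths])

# Group visually similar labels; each cluster can then be routed to the same
# OCR/handwriting model or transcription batch.
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)
for path, cluster_id in zip(paths, clusters):
    print(cluster_id, path)
```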
We present a landscape analysis of these approaches, summarising previous work, and outline our plan to build future tools and systems in the SYNTHESYS+ Project as part of the Specimen Data Refinery. This plan covers sharing tools, reducing barriers to access, and integrating workflow engines into a software architecture whose components can be re-used and re-purposed, with provenance data for repeatability, in conformance with the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles (Wilkinson et al. 2016).
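To make the repeatability goal concrete, the sketch below shows one way a workflow step could record provenance (tool name, version, parameters, input checksum and timestamp) alongside its output. This is an illustrative assumption about what such a record might contain, not the SYNTHESYS+ architecture; all names and the file layout are hypothetical.

```python
# Sketch: run one processing step and write a provenance record next to its output,
# so the step can later be repeated or audited.
import datetime
import hashlib
import json


def run_with_provenance(tool_name, tool_version, params, input_path, output_path, step):
    """Execute `step` and save a JSON provenance record beside the output file."""
    with open(input_path, "rb") as f:
        input_md5 = hashlib.md5(f.read()).hexdigest()

    # `step` is any callable implementing the actual processing (hypothetical).
    step(input_path, output_path, **params)

    record = {
        "tool": tool_name,
        "version": tool_version,
        "parameters": params,
        "input": {"path": input_path, "md5": input_md5},
        "output": output_path,
        "timestamp": datetime.datetime.utcnow().isoformat() + "Z",
    }
    with open(output_path + ".prov.json", "w") as f:
        json.dump(record, f, indent=2)
```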
Files (80.9 kB)

| Name | Size | MD5 |
|---|---|---|
| BISS_article_37647.pdf | 63.5 kB | md5:5821e46783b5a89986036589c3f3d919 |
| | 17.3 kB | md5:8604c71e710d4c332bcc537e8cb0e4eb |