Published July 31, 2019 | Version v1
Project deliverable | Open access

Updating data standards in transcription

  • 1. Agentschap Plantentuin Meise
  • 2. Natural History Museum London
  • 3. Royal Botanic Gardens, Kew
  • 4. Naturalis Biodiversity Center
  • 5. Finnish Museum of Natural History

Description

There are more than 1.2 billion biological specimens in the world’s museums and herbaria. These objects are a particularly important form of biological sample and observation. They underpin biological taxonomy, but the data they contain have many other uses in the biological and environmental sciences. Nevertheless, from the outset they are documented almost entirely on paper, either on labels attached to the specimens or in catalogues linked to them by catalogue numbers. To make the best use of these data and to improve the findability of the specimens, the data must be transcribed digitally and made to conform to standards, so that they are also interoperable and reusable. Through various digitization projects, the authors have experimented with transcription by volunteers, expert technicians, scientists, commercial transcription services and automated systems. We have also been consumers of specimen data for taxonomic, biogeographical and ecological research. In this paper we draw on our experiences to make specific recommendations for improving transcribed data. The paper is split into two sections. We first address issues of database implementation that are relevant to data transcription, namely versioning, annotation, unknown and incomplete data, and language. We then focus on data types that are particularly relevant to biological collection specimens, namely nomenclature, dates, geography, collector numbers and the unique identification of people. We make recommendations to standards organizations, software developers, data scientists and transcribers to improve these data, with the specific aim of improving interoperability between collection datasets.

Files (611.1 kB)

Deliverable D4.3 - ICEDIG_Updating data standards in transcription.pdf

Additional details

Funding

ICEDIG – Innovation and consolidation for large scale digitisation of natural heritage 777483
European Commission