Report on New Methods for Data Quality Assurance, Verification and Enrichment

doi:10.5281/zenodo.3364509

Published January 31, 2019 | Version v1

Project deliverable Open

1. Royal Botanic Gardens, Kew, United Kingdom
2. Meise Botanic Garden, Meise, Belgium
3. Picturae BV, Heiloo, Netherlands

The Distributed System of Scientific Collections (DiSSCo) will facilitate the production of tens of millions
of images of natural history specimens, along with their labels, each year. The labels of these
specimens contain valuable information for research studies, but transcribing them can be very
difficult and time consuming, because many labels are handwritten and hard to read. Whilst accurate label
transcription is only one step towards creating a specimen record fit for different research uses,
it is an extremely important one: returning to recheck label information for even a very small
proportion of specimens would be very time consuming. Once a specimen is transcribed correctly,
it becomes much easier to enhance the record with additional information from other sources (e.g.
literature or collector itineraries), to determine the point of collection from the textual information
on the label by a process known as georeferencing, or even to find inaccuracies within the label itself.
This document discusses and compares different approaches to the efficient, accurate transcription
of these labels. Using herbarium specimens as an example, it compares the quality of data transcribed
by in-house trained institute staff, by a commercial company to which the work was outsourced, and by
the general public through online crowdsourcing platforms. Key transcription data was assessed and
common errors in label transcription were identified. Reasons for these errors are discussed, along with
possible mechanisms to improve the accuracy of the transcriptions. The need for transcription
standards was identified and recommendations are made.

Files

Name	Size
Deliverable D4.2 ICEDIG - Data quality in transcription.pdf (md5:6719c5d4add15e51ef2b8f0c0caeee6c)	1.9 MB

Additional details

Funding: European Commission, grant 777483: ICEDIG – Innovation and consolidation for large scale digitisation of natural heritage
