Published September 10, 2024 | Version 1.0
Presentation | Open access

Presentation of paper "Prepare to Preserve: the Harvest Combine for Research Data at Stockholm University and the Systems Inventory"

  • 1. Stockholm University

Description

ANATOMICALTHEATRE_20240917_1400_PHILIPSON_JOAKIM_V1

Presentation (with demo) of a short paper on the so-called Harvest Combine developed at Stockholm University, given at iPRES 2024 on September 17. The abstract of the paper follows:

Although Stockholm University (SU) does not yet have a digital archive compliant with the requirements of the OAIS model, it is continually developing a "Harvest Combine" tool for harvesting and transforming metadata and data files from the data repositories used by SU researchers, such as Figshare, Datadryad and Zenodo. The metadata are collected from these repositories, enriched with metadata from other sources, and then transformed to conform to the Swedish National Archives' recent implementation (aka FGS 2.0) of the European Common Specification for Information Packages (E-ARK CSIP) and Specification for Submission Information Packages (E-ARK SIP), both version 2.1.0. The metadata records are then stored in SU's local (temporary) archive together with the associated data files, which are harvested at the same time. Part of the motivation for this preparatory digital preservation work is the perceived risk of entrusting the digital preservation of research data files produced by SU researchers to external repositories that we do not fully control locally. The end products of this harvesting and transformation processing are SIPs (Submission Information Packages), still awaiting future transformation into AIPs (Archival Information Packages) and eventually DIPs (Dissemination Information Packages). The associated data files are not transformed or converted in this first step towards long-term preservation and archiving, but we partly keep track of the ingested file formats by mapping file extensions to MIME types in a dedicated XML registry file, which is used in the transformation processing and updated whenever a "new" file format is encountered for the first time.
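The extension-to-MIME registry described in the abstract can be illustrated with a minimal Python sketch. The SU scripts themselves are written in Bash, XQuery and XSLT and the registry's actual schema is not given here, so the element and attribute names (`registry`, `format`, `ext`, `mime`) and the helper `lookup_mime` are hypothetical. Python's standard `mimetypes` module is used only as an assumed first guess for extensions not yet in the registry, with `application/octet-stream` as the fallback.

```python
import mimetypes
import xml.etree.ElementTree as ET
from pathlib import PurePath

# Hypothetical registry layout; the real SU registry schema may differ.
REGISTRY_XML = """<registry>
  <format ext="csv" mime="text/csv"/>
  <format ext="pdf" mime="application/pdf"/>
</registry>"""

FALLBACK_MIME = "application/octet-stream"


def lookup_mime(registry: ET.Element, filename: str) -> tuple[str, bool]:
    """Return (mime_type, is_new); unseen extensions are added to the registry."""
    # Normalise "results.CSV" -> "csv" so lookups are case-insensitive
    ext = PurePath(filename).suffix.lower().lstrip(".")
    for fmt in registry.iter("format"):
        if fmt.get("ext") == ext:
            return fmt.get("mime"), False
    # First encounter: take a provisional guess and record it in the registry
    guess, _encoding = mimetypes.guess_type(filename)
    mime = guess or FALLBACK_MIME
    ET.SubElement(registry, "format", ext=ext, mime=mime)
    return mime, True


registry = ET.fromstring(REGISTRY_XML)
print(lookup_mime(registry, "results.CSV"))  # ('text/csv', False)
print(lookup_mime(registry, "scan.xyzq"))    # unseen extension, added provisionally
```

Returning an `is_new` flag lets a harvesting run report which formats were seen for the first time, so that their provisional MIME types can be reviewed by hand before the updated tree is written back, for example with `ET.ElementTree(registry).write(...)`.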

The software scripts that we developed for this "Harvest Combine" are written in Bash (Unix shell), XQuery and XSLT. Metadata, data and scripts are processed locally using Git Bash (for Windows), BaseX and the Oxygen XML Editor, but the processing is essentially tool-agnostic. Metadata input sources include OAI-PMH feeds (DataCite or METS), repository-specific APIs (to obtain the necessary file metadata) and generic command-line scripts (e.g. for checksums).
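Two of the input sources named above can be sketched in Python, again only as an illustration under assumptions and not as the SU implementation: `list_record_ids` pages through an OAI-PMH `ListRecords` response by following `resumptionToken` elements, and `md5_of` computes a file checksum in chunks (MD5 is assumed here because this record's own file listing uses it). Both function names are hypothetical, the HTTP request is left to an injected `fetch` function so that no particular endpoint is assumed, and the metadata prefix depends on what the repository's OAI-PMH endpoint actually offers.

```python
import hashlib
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace


def list_record_ids(base_url: str, prefix: str,
                    fetch: Callable[[str], str]) -> Iterator[str]:
    """Yield record identifiers from ListRecords, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(params)))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.findtext(f"{OAI}ListRecords/{OAI}resumptionToken")
        if not token:  # missing or empty token marks the last page
            return
        # resumptionToken is an exclusive argument: only the verb may accompany it
        params = {"verb": "ListRecords", "resumptionToken": token}


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in chunks so large data files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Offline demo with one canned page (no resumptionToken, so a single request)
    page = ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
            '<record><header><identifier>oai:example:1</identifier></header></record>'
            '</ListRecords></OAI-PMH>')
    print(list(list_record_ids("https://example.org/oai", "oai_datacite",
                               lambda url: page)))
    # prints ['oai:example:1']
```

Against a live endpoint, `fetch` could be as simple as `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`; the base URL and a prefix such as `oai_datacite` would need to be checked against the repository's own `ListMetadataFormats` response.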

We have also recently created a cross-sectional Digital Preservation Group at Stockholm University Library, whose first task is to compile an inventory of external and internal source information systems (e.g. data repositories) together with their local destination archival or storage systems. This inventory is intended to serve as the basis for a digital preservation plan for future long-term preservation.

Keywords – harvesting, metadata, transformation, research data, repositories, long-term preservation

Files (83.3 MB)

MD5 checksum                            Size
md5:c363c0a2fe3363d07ebd358be28d3b88    152.9 kB
md5:b25a839abd79d101b6df29e3e83d5445    117.8 kB
md5:9efc5e722fcd6359c409fd3669676b34    134.0 kB
md5:cde45ef6e796ede5d64fd0e98c7776af    15.7 MB
md5:c7eb8680a08af03bff4b4e6432bddc86    22.1 MB
md5:02eee56287e0834693325315023d5be3    169.2 kB
md5:18de42bf34673b5eb868d2df827026c6    187.1 kB
md5:f25e5e6e1aec40abb1ef81e6a40e488c    166.2 kB
md5:48cdd5dbb23209e9e8ee3758aa9bd654    44.1 MB
md5:bfcc702ccd7a774d56d6d31d225a711e    133.9 kB
md5:81e7758e54d7876162e09f16942c2a4d    132.8 kB
md5:cc0b3a61d3fe8a86a32c629895d08647    142.5 kB
md5:156b3862bc3831d7f5ec4cf4baffa5fd    121.8 kB

Additional details

Dates

Available: 2024-09-11

Software

Repository URL: https://github.com/StockholmUniversityRDMteam/RDMtoolkit4suRe-use
Programming language: XSLT, XQuery
Development Status: Active