Presentation of paper "Prepare to Preserve: the Harvest Combine for Research Data at Stockholm University and the Systems Inventory"
Description
ANATOMICALTHEATRE_20240917_1400_PHILIPSON_JOAKIM_V1
Presentation (with demo), given at iPRES 2024 on September 17, of a short paper on the so-called Harvest Combine developed at Stockholm University. The abstract of the paper follows:
Although it does not yet have a digital archive compliant with the requirements of the OAIS model, Stockholm University (SU) is continually developing a "Harvest Combine" tool for harvesting and transforming metadata and data files from data repositories used by SU researchers, such as Figshare, Datadryad and Zenodo. The metadata are collected from these repositories, enriched with metadata from other sources, and then transformed to conform to the Swedish National Archives' recent implementation (FGS 2.0) of the E-ARK Common Specification for Information Packages (CSIP) and Specification for Submission Information Packages (SIP), both version 2.1.0. The metadata records are then stored in the local (temporary) SU archive together with the associated data files, which are harvested at the same time. Part of the motivation for this preparatory digital preservation work is the perceived risk of entrusting the digital preservation of research data produced by SU researchers to external repositories that we do not fully control locally. The end products of this harvesting and transformation are SIPs (Submission Information Packages), which still await future transformation into AIPs (Archival Information Packages) and eventually DIPs (Dissemination Information Packages). The associated data files are not transformed or converted in this first step towards long-term preservation and archiving, but we keep track of the ingested file formats, partly by mapping file extensions to MIME types in a dedicated XML registry file. This registry is used in the transformation processing and is updated whenever a "new" file format is encountered for the first time.
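To illustrate the extension-to-MIME-type registry described above, here is a minimal Python sketch. The registry layout (a `<registry>` root holding `<format extension="…" mimetype="…"/>` entries) and the function names `lookup_mime` and `register_if_new` are hypothetical: the abstract does not describe the actual schema of the SU registry file, and the Harvest Combine itself performs this step in XQuery/XSLT rather than Python.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePath
from typing import Optional

# Hypothetical registry layout, assumed for illustration only.
REGISTRY_XML = """<registry>
  <format extension="csv" mimetype="text/csv"/>
  <format extension="xlsx" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"/>
  <format extension="zip" mimetype="application/zip"/>
</registry>"""


def extension_of(filename: str) -> str:
    """Return the lower-cased final extension without the dot ('' if there is none)."""
    return PurePath(filename).suffix.lower().lstrip(".")


def lookup_mime(registry: ET.Element, filename: str) -> Optional[str]:
    """Return the MIME type registered for the file's extension, or None if it is unknown."""
    ext = extension_of(filename)
    for fmt in registry.iter("format"):
        if fmt.get("extension") == ext:
            return fmt.get("mimetype")
    return None


def register_if_new(registry: ET.Element, filename: str, mimetype: str) -> bool:
    """Add an entry for a first-seen extension; return True only if the registry changed."""
    ext = extension_of(filename)
    if not ext or lookup_mime(registry, filename) is not None:
        return False
    ET.SubElement(registry, "format", extension=ext, mimetype=mimetype)
    return True
```

For example, after `registry = ET.fromstring(REGISTRY_XML)`, `lookup_mime(registry, "Data.CSV")` gives `"text/csv"`, while a first encounter with `scan.tif` returns `None` until `register_if_new(registry, "scan.tif", "image/tiff")` adds it. How the MIME type for a new extension is chosen (manual review, a format identification tool) is left open here.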
The scripts we developed for the Harvest Combine are written in Bash (Unix shell), XQuery and XSLT. Metadata, data and scripts are processed locally using Git Bash (for Windows), BaseX and the Oxygen XML Editor, but the approach is essentially software-tool agnostic. Metadata input sources include OAI-PMH feeds (DataCite or METS), repository-specific APIs (to obtain the necessary file metadata) and generic command-line scripts (e.g. for checksums).
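The OAI-PMH input step can be pictured with the following Python sketch of a ListRecords harvest that follows resumption tokens until the repository signals the end of the list. The function names, the `fetch` callback and the default `metadataPrefix` value `oai_datacite` are assumptions; the prefix actually offered for DataCite metadata varies between repositories and can be checked with the ListMetadataFormats verb.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str,
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument, so follow-up
    requests carry only the verb and the token (no metadataPrefix).
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on one response page and the next token (None when done)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken element marks the last page.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token or None


def harvest_identifiers(base_url: str, fetch: Callable[[str], str],
                        metadata_prefix: str = "oai_datacite") -> List[str]:
    """Page through ListRecords; `fetch` is a hypothetical callback returning the response XML."""
    identifiers: List[str] = []
    token: Optional[str] = None
    while True:
        page_ids, token = parse_list_records(fetch(list_records_url(base_url, metadata_prefix, token)))
        identifiers.extend(page_ids)
        if token is None:
            return identifiers
```

In practice `fetch` could be `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`; passing it in as a parameter keeps the paging logic testable without network access. A real harvester would also keep the `<metadata>` payload of each record and handle deleted records and OAI-PMH error responses, which this sketch omits.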
Recently, we have also created a cross-sectional Digital Preservation Group at Stockholm University Library, whose first task is to compile an inventory of external and internal source information systems (e.g. data repositories), together with their local destination archival or storage systems. This inventory is intended to serve as the basis for a plan for future long-term digital preservation.
Keywords: harvesting, metadata, transformation, research data, repositories, long-term preservation
Files (83.3 MB)
dryad1origMD.JPG

| MD5 checksum | Size |
|---|---|
| c363c0a2fe3363d07ebd358be28d3b88 | 152.9 kB |
| b25a839abd79d101b6df29e3e83d5445 | 117.8 kB |
| 9efc5e722fcd6359c409fd3669676b34 | 134.0 kB |
| cde45ef6e796ede5d64fd0e98c7776af | 15.7 MB |
| c7eb8680a08af03bff4b4e6432bddc86 | 22.1 MB |
| 02eee56287e0834693325315023d5be3 | 169.2 kB |
| 18de42bf34673b5eb868d2df827026c6 | 187.1 kB |
| f25e5e6e1aec40abb1ef81e6a40e488c | 166.2 kB |
| 48cdd5dbb23209e9e8ee3758aa9bd654 | 44.1 MB |
| bfcc702ccd7a774d56d6d31d225a711e | 133.9 kB |
| 81e7758e54d7876162e09f16942c2a4d | 132.8 kB |
| cc0b3a61d3fe8a86a32c629895d08647 | 142.5 kB |
| 156b3862bc3831d7f5ec4cf4baffa5fd | 121.8 kB |
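The MD5 values listed above can be used to check that a downloaded copy of a file is intact. A minimal Python sketch using the standard `hashlib` module follows; the function names `md5_of_file` and `verify_md5` are illustrative and are not taken from the Harvest Combine repository.

```python
import hashlib


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, listed: str) -> bool:
    """Compare a file against a listed checksum, accepting an optional 'md5:' prefix."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Reading in fixed-size chunks keeps memory use constant regardless of file size, which matters little for the files here (at most 44.1 MB) but more for full research datasets.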
Additional details
Dates
- Available: 2024-09-11
Software
- Repository URL: https://github.com/StockholmUniversityRDMteam/RDMtoolkit4suRe-use
- Programming language: XSLT, XQuery
- Development status: Active