Identification of Reproducible Subsets for Data Citation, Sharing and Re-Use

Rauber, Andreas; Asmi, Ari; van Uytvanck, Dieter; Pröll, Stefan

doi:10.5281/zenodo.4048304

Published May 1, 2016 | Version v1

Journal article | Open access

1. TU Wien
2. University of Helsinki
3. CLARIN ERIC

Research data changes over time as new records are added, errors are corrected and obsolete records are deleted from a data set. Researchers rarely use an entire data set or data stream as it is, but rather create specific subsets tailored to their experiments. To keep such experiments reproducible and to share and cite the particular data used in a study, researchers need a means of identifying the exact version of a subset as it was used during a specific execution of a workflow, even if the data source is continuously evolving. In this paper we present 14 recommendations on how to adapt a data source to provide identifiable subsets for the long term, elaborated by the RDA Working Group on Dynamic Data Citation (WGDC). The proposed solution is based on versioned data, timestamping and a query-based subsetting mechanism. We provide a detailed discussion of the recommendations and the rationale behind them, and give examples of how to implement them.
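The three ingredients named in the abstract (versioned data, timestamping, and query-based subsetting) can be illustrated with a minimal sketch. This is not the authors' implementation: the table `obs`, the columns `valid_from` and `valid_to`, the integer logical timestamps, and the helpers `run_subset` and `fingerprint` are all assumptions made for illustration, and the SHA-256 checksum over the result rows is an added convenience for checking that a re-executed query returns the same subset.

```python
import hashlib
import sqlite3

# Hypothetical schema: every change creates a new row version instead of
# overwriting, so earlier states of the data stay queryable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE obs (id INTEGER, value REAL, "
    "valid_from INTEGER NOT NULL, valid_to INTEGER)"
)

def insert(rec_id, value, ts):
    """Add a new record version that is valid from logical time ts."""
    conn.execute("INSERT INTO obs VALUES (?, ?, ?, NULL)", (rec_id, value, ts))

def delete(rec_id, ts):
    """Mark the current version as ended at ts; nothing is physically removed."""
    conn.execute(
        "UPDATE obs SET valid_to = ? WHERE id = ? AND valid_to IS NULL",
        (ts, rec_id),
    )

def update(rec_id, value, ts):
    """Close the current version and open a new one at ts."""
    delete(rec_id, ts)
    insert(rec_id, value, ts)

def run_subset(min_value, as_of):
    """Run the subsetting query against the data as it was at time as_of."""
    return conn.execute(
        "SELECT id, value FROM obs "
        "WHERE value >= ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY id",  # fixed ordering keeps the result comparable
        (min_value, as_of, as_of),
    ).fetchall()

def fingerprint(rows):
    """Checksum of a result set (an assumption of this sketch)."""
    return hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()

# Initial load at logical time 1.
insert(1, 10.0, 1)
insert(2, 20.0, 1)
insert(3, 5.0, 1)

# A researcher creates a subset at time 2; we store the query parameters,
# the execution timestamp, and a checksum rather than a copy of the data.
cited = {"min_value": 8.0, "as_of": 2}
cited["hash"] = fingerprint(run_subset(cited["min_value"], cited["as_of"]))

# The data source keeps evolving afterwards.
update(1, 7.0, 3)
delete(2, 4)
insert(4, 30.0, 5)

# Re-executing the stored query with its timestamp recovers the cited subset,
# while the same query against the current state returns different rows.
resolved = run_subset(cited["min_value"], cited["as_of"])
current = run_subset(cited["min_value"], 6)
```

In this sketch the stored query and timestamp, not an exported copy of the rows, identify the subset; a production system would additionally need a persistent identifier for the stored query and real wall-clock timestamps.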

Files

Rauber_et_al_TCDLBul_volume12_issue1-May2016_RDA-Guidelines.pdf (299.7 kB, md5:c91271685623d761803a7421626bd091)

Resource type

Journal article

Publisher

Zenodo

Published in

Bulletin of the IEEE Technical Committee on Digital Libraries, 12(1), 2016.

Languages

English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: September 24, 2020
Modified: July 19, 2024
