Poster
Submitting a dataset to a data repository is the process of transferring a data object from the private domain to the shared or public domain (see the domain model of Treloar & Klump, 2019). The data provider's intention is to preserve (i.e., archive) the data and, in most cases, also to make them accessible to a broad audience (i.e., to publish them). Data repositories receiving the data must make a number of decisions on how to treat the submitted data in order to meet the data provider's expectations. For this reason, most repositories publish the terms and conditions under which they operate. However, these often do not cover all the necessary aspects. Moreover, compliance with these terms and conditions needs to be verified. In our experience, this verification relies on the individual expertise and experience of the data curation staff; in many cases, no formal and transparent process is in place.
The aim of this work is to provide data curators with a practical guide containing a catalogue of criteria to be verified when data are submitted to a repository. The catalogue also specifies information that must be collected from data providers at submission time, because it may not be obtainable later. Such information, for example the retention period of the data in the repository or the responsibility for data disposal, is typically not part of standard metadata.
Starting from an initial draft catalogue that the authors had designed for an institutional repository, we conducted a one-day workshop with data management support staff, data managers and data curators to discuss, complement and restructure the criteria. The resulting criteria were phrased as questions and grouped into seven categories: Appraisal, Compliance, Primary Data, Metadata, Preservation, Access and Curation. Each question was assigned to the data provider, the data manager/curator, or both.
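To make this structure concrete, the catalogue can be represented as a small data structure in which every question carries one of the seven categories and the party it is addressed to. The following is a minimal Python sketch under our own assumptions: the names `Category`, `Role`, `Question`, `CATALOGUE` and `questionnaire_for` are hypothetical, and apart from the two information requirements named earlier (retention period and responsibility for disposal), the example questions are invented placeholders rather than items from the actual catalogue. Their category assignments are illustrative as well.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The seven categories of the criteria catalogue."""
    APPRAISAL = "Appraisal"
    COMPLIANCE = "Compliance"
    PRIMARY_DATA = "Primary Data"
    METADATA = "Metadata"
    PRESERVATION = "Preservation"
    ACCESS = "Access"
    CURATION = "Curation"


class Role(Enum):
    """Who is expected to answer a question."""
    PROVIDER = "data provider"
    CURATOR = "data manager / curator"


@dataclass(frozen=True)
class Question:
    text: str
    category: Category
    asked_of: frozenset  # one or both Role members


# Hypothetical example entries; category assignments are illustrative only.
CATALOGUE = [
    Question("How long should the data be retained in the repository?",
             Category.PRESERVATION, frozenset({Role.PROVIDER})),
    Question("Who is responsible for disposing of the data?",
             Category.PRESERVATION, frozenset({Role.PROVIDER})),
    Question("Does the submission fit the repository's collection scope?",
             Category.APPRAISAL, frozenset({Role.CURATOR})),
    Question("Are the descriptive metadata complete?",
             Category.METADATA, frozenset({Role.PROVIDER, Role.CURATOR})),
]


def questionnaire_for(catalogue, role):
    """Group the questions addressed to `role` by category, keeping catalogue order."""
    grouped = {}
    for question in catalogue:
        if role in question.asked_of:
            grouped.setdefault(question.category, []).append(question.text)
    return grouped


# Print the questionnaire a data provider would receive at submission time.
for category, texts in questionnaire_for(CATALOGUE, Role.PROVIDER).items():
    print(category.value, texts)
```

Filtering by role in this way would let a repository hand the data provider only the questions addressed to them, while questions marked for both parties appear in both questionnaires.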
The catalogue is intended to serve as a basis for the appraisal and assessment of data submitted to a repository. Applying the criteria leads to a decision on whether the submitted data can be ingested as they are, ingested only after curation, or must be rejected. A repository that implements the catalogue and publicly states that the criteria will be applied to submitted data increases its transparency, effectiveness and efficiency. The repository can customize the catalogue according to its needs, e.g., by deleting, adding or weighting criteria. As a next step, the catalogue could be developed further, e.g., by adding the implications of certain answers for the data provider and the repository, by translating it into a decision tree, or by complementing it with a scoring system.
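The three possible outcomes, combined with customization by deleting or weighting criteria, can be sketched as a simple decision function. This is not part of the published catalogue, where a scoring system is mentioned only as a possible next step; the status values, the default weight of 1.0, the rejection threshold and the criterion identifiers below are all hypothetical choices. In this sketch each criterion is assessed as met, curable (fixable by curation before ingest) or unmet, and a weight of 0 effectively deletes a criterion.

```python
from __future__ import annotations

from enum import Enum


class Status(Enum):
    MET = "met"
    CURABLE = "curable"  # can be fixed by curation before ingest
    UNMET = "unmet"      # cannot be fixed by the repository


class Decision(Enum):
    INGEST_AS_IS = "ingest as it is"
    INGEST_AFTER_CURATION = "ingest after curation"
    REJECT = "reject"


def decide(results: dict[str, Status],
           weights: dict[str, float] | None = None,
           reject_threshold: float = 1.0) -> Decision:
    """Map per-criterion assessments to one of the three outcomes.

    Criteria missing from `weights` get weight 1.0; weight 0.0 ignores a
    criterion, which mimics deleting it from the catalogue.
    """
    weights = weights or {}

    def weight(criterion: str) -> float:
        return weights.get(criterion, 1.0)

    # Sum the weights of criteria that curation cannot fix.
    unmet_score = sum(weight(c) for c, s in results.items() if s is Status.UNMET)
    if unmet_score >= reject_threshold:
        return Decision.REJECT
    # Any remaining weighted, curable shortcoming requires curation first.
    if any(s is Status.CURABLE and weight(c) > 0 for c, s in results.items()):
        return Decision.INGEST_AFTER_CURATION
    return Decision.INGEST_AS_IS


# Hypothetical criterion identifiers, for illustration only.
submission = {
    "licence_declared": Status.MET,
    "metadata_complete": Status.CURABLE,
    "retention_period_given": Status.MET,
}
print(decide(submission).value)  # ingest after curation
```

With the default settings, a single unmet criterion leads to rejection; a repository could tolerate minor shortcomings by giving such criteria a weight below the threshold.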
This work was conducted as part of the eeFDM project, which was funded by the German Federal Ministry of Education and Research (BMBF).