Published September 8, 2023 | Version v2
Poster Open

Making Data Management feel easy: Integration of a Hyrax Data Repository into the Research Process

  • 1. IT.SERVICES, Ruhr University Bochum, Germany
  • 2. Biopsychology, SFB 1280 "Extinction Learning", Ruhr University Bochum, Germany
  • 3. Cognitive Psychology, Ruhr University Bochum, Germany
  • 4. University Library, Ruhr University Bochum, Germany

Description

For several years, national and international efforts have been underway to build modern data infrastructures and services for research and science, to enable data-driven innovation, to foster data literacy, and to promote a sustainable data culture. To best support excellent, data-driven research and open science, a scientific institution must create structures and workflows that enable its scientists to manage their data. Among other things, this includes storing research data securely and making them accessible and reusable. More importantly, data management requires a major shift in research culture for individuals; to succeed, it should ideally be integrated into workflows, tools, and services.

Making science reproducible and sustainable requires a technical infrastructure that integrates good scientific practice as early as possible in the research data lifecycle. As research methods evolve, new equipment is used, and innovative evaluation methods are established, research data management (RDM) infrastructures and their interfaces to research must also evolve continuously. However, further development of the infrastructures does not by itself guarantee their acceptance and use by researchers. The more closely RDM infrastructures are tailored to discipline-specific needs, the more efficiently researchers can use them. At the same time, the individual adaptations considerably increase the resources required. In this tension between generic and discipline-specific services, central RDM service providers must prioritize the further development of their RDM infrastructures. In doing so, it is important to remain in constant dialogue with researchers and thus build acceptance of and trust in the RDM infrastructures from the very beginning.

In its RDM guidelines, Ruhr University Bochum (RUB) committed in 2018 to providing and maintaining an appropriate infrastructure for research data management, thereby ensuring that digital research data are adequately stored and reliably accessible. IT and university library services sponsored by RUB were to develop such a sustainable and coordinated RDM infrastructure and services, along with use cases spanning the broad range of research disciplines and cultures on campus. In a first step, an evaluation by an external consultancy (Antleaf Ltd.) confirmed that no software on the market at the time covered all system requirements of the most complex and comprehensive use case, which focuses on the neurosciences. The evaluation revealed that, although some open-source solutions exist, all of them would require adaptations in infrastructure, metadata, and workflows. The consultancy concluded its work by drawing up requirements and a specification for a system. In a second step, a service provider was contracted through an internationally negotiated bidding process to design, implement, and support an RDM system for the institution, in close cooperation with RUB's central research data service (RDS) and a selected interdisciplinary use case with a focus on neuroscience. The negotiated tender additionally required the chosen software platform to be mature and globally deployed, and to have broad and active community support (including service providers), a wide range of functionality, and high flexibility.

Ultimately, the Hyrax open-source repository platform [1] of the Samvera community [2], which has already been used extensively in the RDM context [3], was chosen. Extensions to this platform were co-developed with researchers from the interdisciplinary, neuroscience-focused research cluster use case to support their research workflow. The implementation of complex data models, the organizational structure, and the requirements of the RDM policy allows low-barrier adoption by researchers in the lab where data are generated, early data sharing and reuse with collaborating research groups, data curation, preservation for 10 years, and publication through DOI workflows. The publication workflows are accompanied by a multi-step, tiered review workflow. The system interfaces with native S3-compliant storage to store data and metadata. Local researchers authenticate through the institution's central identity management using Shibboleth, while external collaborators worldwide authenticate via ORCID.
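To make the idea of a multi-step, tiered review workflow more concrete, the following minimal Python sketch models it as a small state machine. It is an illustration only and not the system's implementation: the state names, the action names, and the two review tiers are assumptions chosen for the example, and the description above does not specify how many tiers the real workflow has.

```python
from enum import Enum


class State(Enum):
    # Hypothetical workflow states; the real system may use different ones.
    DRAFT = "draft"
    CURATION_REVIEW = "curation_review"        # assumed first review tier
    PUBLICATION_REVIEW = "publication_review"  # assumed second review tier
    PUBLISHED = "published"


# Allowed transitions (assumed): each tier either approves the deposit
# or sends it back to the depositor for changes.
TRANSITIONS = {
    (State.DRAFT, "submit"): State.CURATION_REVIEW,
    (State.CURATION_REVIEW, "approve"): State.PUBLICATION_REVIEW,
    (State.CURATION_REVIEW, "request_changes"): State.DRAFT,
    (State.PUBLICATION_REVIEW, "approve"): State.PUBLISHED,
    (State.PUBLICATION_REVIEW, "request_changes"): State.DRAFT,
}


def advance(state: State, action: str) -> State:
    """Return the next state, or raise ValueError if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"action {action!r} is not allowed in state {state.value!r}"
        ) from None


def run(actions) -> State:
    """Apply a sequence of actions to a new deposit that starts as a draft."""
    state = State.DRAFT
    for action in actions:
        state = advance(state, action)
    return state
```

In this sketch a deposit only reaches the published state after every tier has approved it, and a request for changes at any tier returns it to the draft state.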

The system further supports the FAIR data principles [4] and promotes the implementation of RDM and open science policy requirements. The underlying software platform is flexible enough to implement discipline-specific requirements such as research workflows, data models, and community-driven metadata schemas that sustain excellent research projects. In addition, the system provides fine-grained role and rights management, down to the file level, which makes it possible to map the complex structure of collaborative research clusters that handle sensitive (pseudonymized) data in the research process. The system also supports IT, library, and RDM services in provisioning RDM resources, in the curation, review, and publication of data, and in consultancy, and it meets large RDM requirements from the beginning. The system is scheduled to be released into production as an institutional service by early 2024. For the general use case covering institution-wide requirements, a data model based on DataCite metadata was implemented. Future work will bring further developments and new use cases, e.g., from plasma physics and excellence initiatives.
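As a rough illustration of what a data model based on DataCite metadata implies for a deposit, the sketch below checks a record for the six mandatory properties of the DataCite Metadata Schema 4.x (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType). The dictionary keys, the function name, and the example record are hypothetical and chosen for this example; they are not taken from the system described here or from any DataCite client library.

```python
# Mandatory properties of the DataCite Metadata Schema 4.x, written as
# hypothetical dictionary keys for this sketch.
MANDATORY_FIELDS = (
    "identifier",        # the DOI
    "creators",
    "titles",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_mandatory(record: dict) -> list:
    """Return the mandatory fields that are absent or empty in the record."""
    return [field for field in MANDATORY_FIELDS if not record.get(field)]


# Example record with placeholder values (not a real dataset or DOI).
example = {
    "identifier": "10.1234/example-dataset",
    "creators": ["Doe, Jane"],
    "titles": ["Example dataset"],
    "publisher": "Example University",
    "publication_year": 2023,
    # "resource_type" is deliberately left out to show the check.
}

print(missing_mandatory(example))
```

A check like this could run before a deposit enters a publication workflow, so that incomplete records are returned to the depositor early.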

 

References:

1. Hyrax: a community-supported repository frontend. https://hyrax.samvera.org/ (accessed 2023-07-06)
2. Samvera – a vibrant and welcoming community developing repository software tools. https://samvera.org/ (accessed 2023-07-06)
3. M. Tanifuji, A. Matsuda, H. Yoshikawa, “Materials Data Platform - a FAIR System for Data-Driven Materials Science”, Proceedings of the 2019 8th International Congress on Advanced Applied Informatics (IIAI-AAI), 2019, doi: https://doi.org/10.1109/iiai-aai.2019.00206
4. M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, et al. “The FAIR Guiding Principles for scientific data management and stewardship”, Sci Data, 19, 6, March 2016, doi: https://doi.org/10.1038/sdata.2016.18

Notes

TO, NOCW, and MP are supported by the SFB 1280 INF project funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 316803389 – SFB 1280.

Files

MakingDataManagementFeelEasy_FrenzelJ_et_al_CoRDI_2023_v2.0.pdf
