Published August 4, 2025
Version v1
Conference paper
Open access
Implementing a hierarchical data model into a repository platform - A feasibility study
Authors/Creators
- 1. Ruhr University Bochum
- 2. Antleaf Ltd.
- 3. Cottage Labs LLP
Contributors
Editors:
- 1. Nationale Forschungsdateninfrastruktur (NFDI) e.V.
- 2. University of Amsterdam
Description
In CRC 1280 "Extinction Learning", researchers from biology, psychology, medicine, and computational neuroscience generate data with a variety of experimental methods (e.g. electrophysiology, electroencephalography, microscopy, functional magnetic resonance imaging), which are carried out with multiple human or animal individuals. Typically, these experiments run for several weeks or months before data generation is complete. The CRC 1280 developed a nested data model for these experiments that shares features with the Brain Imaging Data Structure (BIDS), which was developed at the same time, such as a hierarchical folder structure and an inheritance strategy for metadata assignment. Standardized vocabularies for the CRC metadata fields were introduced, and a mapping to the bibliographic standards Dublin Core and DataCite was implemented. Use of the common data model was enforced by the CRC's data management policy and accompanied by consulting and training measures. When designing a technical support infrastructure for the CRC, we aimed to bridge the gap between data deposit on a network drive, where data can be stored instantly and is displayed in a familiar folder structure, and a repository architecture, which stores data associated with a persistent identifier (PID) in a flat structure. For the CRC 1280, a clear presentation of the hierarchical data structure and the ability to upload individual files to each sub-folder at any time before publication were required to encourage frequent (ideally daily) use of the repository. At the same time, the software had to support internal data sharing, archiving of data for 10 years, and data publication within the same system. To make this possible, the Hyrax repository engine from the Samvera community was chosen because it provides a flexible toolbox that enables custom adaptations.
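The inheritance strategy for metadata assignment, combined with a repository that stores records in a flat structure, can be illustrated with a minimal sketch. It assumes, hypothetically, that each folder is stored as a flat record holding its PID, its parent's PID, and only the metadata fields set at that level; the effective metadata of a folder is then resolved by walking up the parent chain, with values set on deeper folders overriding inherited ones. The field names (`title`, `creator`, `method`, `individual`) and the crosswalk to Dublin Core element names are illustrative assumptions, not the CRC 1280 schema or the Hyrax implementation; a DataCite mapping would follow the same pattern.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FolderRecord:
    """A folder stored as a flat repository record (hypothetical structure)."""
    pid: str                 # persistent identifier of this folder
    parent: Optional[str]    # PID of the parent folder; None for the top level
    metadata: Dict[str, str] = field(default_factory=dict)  # fields set at this level only


def effective_metadata(pid: str, records: Dict[str, FolderRecord]) -> Dict[str, str]:
    """Resolve inherited metadata for one folder.

    Walks from the folder up to the top level, then applies the stored
    values from the top down, so values set on deeper folders override
    values inherited from their ancestors.
    """
    chain: List[FolderRecord] = []
    seen = set()
    current: Optional[str] = pid
    while current is not None:
        if current in seen:  # guard against accidental cycles in parent links
            raise ValueError(f"cycle in folder hierarchy at {current}")
        seen.add(current)
        record = records[current]
        chain.append(record)
        current = record.parent
    resolved: Dict[str, str] = {}
    for record in reversed(chain):  # top level first, requested folder last
        resolved.update(record.metadata)
    return resolved


# Hypothetical crosswalk from project field names to Dublin Core elements.
DC_CROSSWALK = {
    "title": "dc:title",
    "creator": "dc:creator",
    "method": "dc:subject",
    "date": "dc:date",
}


def to_dublin_core(metadata: Dict[str, str]) -> Dict[str, str]:
    """Rename mapped fields to Dublin Core; drop fields without a mapping."""
    return {DC_CROSSWALK[k]: v for k, v in metadata.items() if k in DC_CROSSWALK}


# Example: project -> experiment -> session, stored as flat records.
records = {
    "p1": FolderRecord("p1", None, {"title": "Project A", "creator": "Lab X"}),
    "e1": FolderRecord("e1", "p1", {"method": "EEG"}),
    "s1": FolderRecord("s1", "e1", {"title": "Session 1", "individual": "sub-01"}),
}
print(effective_metadata("s1", records))
# {'title': 'Session 1', 'creator': 'Lab X', 'method': 'EEG', 'individual': 'sub-01'}
print(to_dublin_core(effective_metadata("s1", records)))
# {'dc:title': 'Session 1', 'dc:creator': 'Lab X', 'dc:subject': 'EEG'}
```

Storing only the locally set fields and resolving the rest on demand keeps each record small and means a change at the project level propagates to every sub-folder without rewriting them.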
The implementation of the hierarchical data model was supported by an external service provider. It comprises a presentation of all sub-folders of the data structure together with their associated metadata, which are partially prepopulated by inheritance along the folder hierarchy. Furthermore, a faceted search within sub-folders and across digital objects was implemented, including a one-click download option for the filtered search result. These custom adaptations are intended to make the complex hierarchical structures underlying the CRC experiments easily navigable as FAIR digital objects and to make working with the data an intuitive matter of a few clicks. The implementation of the data model in the repository platform was completed in spring 2025; currently, a total of 18 TB of data comprising 3.9 million files in 40,800 folders has been ingested into the platform. We will summarize successes and pitfalls regarding the conceptual design, the technical implementation, and communication with the researchers. Finally, we will outline and discuss how to measure and evaluate the subsequent use of the platform and how to consolidate its future use.
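The faceted search within sub-folders and the filtered download can be sketched in a few lines, under stated assumptions: each file is indexed with its folder path and its resolved (inherited) metadata, a search is scoped to all files below one folder, facet filters are applied, facet values are counted over the remaining hits, and the hit list doubles as the manifest a one-click download would bundle. The `IndexedFile` structure, field names, and paths are hypothetical. In Hyrax, search and faceting are handled by Blacklight on top of Apache Solr; this in-memory sketch does not model that stack and only illustrates the logic.

```python
from collections import Counter
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


@dataclass
class IndexedFile:
    """One file in a hypothetical search index."""
    path: str               # folder path plus file name
    facets: Dict[str, str]  # resolved (inherited) metadata used as facets


def in_subtree(path: str, folder: str) -> bool:
    """True if the file lies anywhere below the given folder."""
    return PurePosixPath(folder) in PurePosixPath(path).parents


def faceted_search(
    files: List[IndexedFile], folder: str, filters: Dict[str, str]
) -> Tuple[List[IndexedFile], Dict[str, Counter]]:
    """Restrict to one sub-folder, apply facet filters, count facet values."""
    hits = [
        f for f in files
        if in_subtree(f.path, folder)
        and all(f.facets.get(k) == v for k, v in filters.items())
    ]
    counts: Dict[str, Counter] = {}
    for f in hits:
        for key, value in f.facets.items():
            counts.setdefault(key, Counter())[value] += 1
    return hits, counts


index = [
    IndexedFile("crc/projA/eeg/sub-01/run1.edf", {"method": "EEG", "species": "human"}),
    IndexedFile("crc/projA/eeg/sub-02/run1.edf", {"method": "EEG", "species": "human"}),
    IndexedFile("crc/projA/fmri/sub-01/bold.nii", {"method": "fMRI", "species": "human"}),
    IndexedFile("crc/projAB/ephys/rat-03/units.nwb", {"method": "electrophysiology", "species": "rat"}),
]

# Search inside crc/projA, filtered to EEG files.
hits, counts = faceted_search(index, "crc/projA", {"method": "EEG"})
download_manifest = [f.path for f in hits]  # files a one-click download would bundle
print(download_manifest)
print(counts)
```

Checking ancestry with `PurePosixPath.parents` rather than a plain string prefix keeps a search in `crc/projA` from also matching files in `crc/projAB`.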
Files
| Name | Size | Checksum |
|---|---|---|
| CoRDI_2025_paper_97.pdf | 119.6 kB | md5:7d11834fd0bbaa412613888c8ef16cd3 |