Creating Rich Metadata for Collaborative Research: Case Studies and Challenges
Authors/Creators
- 1. DFKI RIC
- 2. LUH IPeG
- 3. TUD
Contributors
Project leader (2):
Project member (6):
- 1. DFKI RIC
Description
This presentation, held at the NFDI4Ing Conference 2023, showcases workflow elements for creating rich metadata in collaborative research. These elements are part of a larger framework that is under active development and aims to give researchers, data managers, and the research data management (RDM) community practical guidance on managing metadata effectively along the data lifecycle. The framework is motivated and validated by two applied projects, DeeperSense and RoBivaL, which were conducted by the DFKI Robotics Innovation Center in coordination with the NFDI4Ing Task Area Golo.
The first part of the presentation focuses on these two case studies. Because their research goals and methods differ, the two projects have different data and metadata profiles, making them suitable test cases for a framework that is intended to be general enough to support a broad range of applications. The presentation addresses selected data management tasks in different data lifecycle phases. It argues that data managers who strive to produce FAIR metadata are intermediaries between their research team and potential data reusers, two groups with very different requirements and practices. It further makes the case that the consumer-facing part of FAIR metadata is very small compared to the invisible "precursor" metadata that needs to be generated and managed in order to reach a given quality level.
The second part of the presentation proposes a set of workflow-oriented metadata categories that emerged from data management practice. These categories are orthogonal to established categories that focus on purpose or semantics, such as structural, descriptive, or administrative metadata.
The proposed categories are organized along three dimensions:
Firstly, a workflow-oriented view on metadata must recognize that every data instance is the output of some procedure. Both procedure and output are sources of metadata. High-level examples of procedures are the stages of the data lifecycle, such as planning, collection, and analysis. Each stage comprises additional mid- and low-level procedures which create intermediate outputs.
The relevance of analyzing procedures for metadata creation is well established in the RDM community, as evidenced, for example, by the "processing step" class at the core of the Metadata4Ing (M4I) ontology. While the M4I ontology may be a suitable tool for communicating metadata to data consumers, this presentation argues that the production of (precursor) metadata in research teams requires additional tools.
Secondly, before data is created or transformed, both the procedure and its output are usually planned and specified, resulting in metadata that is not extracted from the data but injected into it. Examples of injected metadata are the address of a software repository (specifying a procedure) or a database schema (specifying an output). Examples of extracted metadata are performance metrics (of a procedure) or a file checksum (of an output).
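The distinction between injected and extracted metadata can be illustrated with a minimal Python sketch. It is not part of the presented framework or of the M4I ontology: the names `StepRecord` and `run_step`, the repository address, the schema, and all field names are hypothetical and chosen for illustration only. The sketch records what was specified before a procedure ran, then measures a runtime (a simple performance metric of the procedure) and a checksum (of the output) afterwards.

```python
import hashlib
import time
from dataclasses import dataclass, field


@dataclass
class StepRecord:
    """Metadata for one procedure and its output (hypothetical structure)."""
    # Injected metadata: planned and specified before the procedure runs.
    injected: dict
    # Extracted metadata: measured from the procedure or its output afterwards.
    extracted: dict = field(default_factory=dict)


def run_step(procedure, payload, injected):
    """Run a procedure on raw bytes and collect both kinds of metadata."""
    record = StepRecord(injected=dict(injected))
    start = time.perf_counter()
    output = procedure(payload)
    elapsed = time.perf_counter() - start
    # Extracted from the procedure: a simple performance metric.
    record.extracted["runtime_seconds"] = elapsed
    # Extracted from the output: checksum and size.
    record.extracted["output_sha256"] = hashlib.sha256(output).hexdigest()
    record.extracted["output_size_bytes"] = len(output)
    return output, record


# Assumed values: a placeholder repository address and a toy output schema.
planned = {
    "software_repository": "https://example.org/hypothetical/repo.git",
    "output_schema": {"columns": ["timestamp", "value"]},
}

# bytes.upper stands in for a real processing procedure.
output, record = run_step(bytes.upper, b"timestamp,value\n", planned)
```

After the call, `record.injected` holds only what was known before execution, while `record.extracted` holds only what could be measured afterwards. Keeping the two apart makes it possible to compare a specification against the actual result.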
Thirdly, metadata itself is also data, as noted by virtually every metadata definition. Consequently, although this is rarely recognized, metadata creation is recursive. The presentation gives examples of higher-order metadata ("meta-metadata") on multiple levels of recursion. It then discusses the different purposes of higher-order metadata for research teams, data reusers, and the RDM community.
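The recursive structure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an example taken from the presentation: the `Record` class, the `about` field, the `recursion_depth` helper, and all values are hypothetical. Each record may carry another record that describes it, so metadata about metadata is modeled with the same type as the data itself.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Record:
    """A piece of data that may itself be described by another record."""
    content: Any
    about: Optional["Record"] = None  # metadata describing this record


def recursion_depth(record):
    """Count how many levels of metadata are stacked on top of a record."""
    depth = 0
    while record.about is not None:
        depth += 1
        record = record.about
    return depth


# Hypothetical values, for illustration only.
meta_meta = Record({"schema_version": "hypothetical-1.0", "entered_by": "data manager"})
meta = Record({"quantity": "distance", "unit": "m"}, about=meta_meta)
data = Record([1.2, 1.4, 1.1], about=meta)
```

Here `recursion_depth(data)` evaluates to 2: the measurements are described by first-order metadata, which is in turn described by one level of meta-metadata.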
The presentation concludes with two questions. Given the multitude of challenges and requirements, the different demands of research teams and data reusers, and the amount of precursor metadata necessary for high-quality consumer metadata: How do we avoid "scope explosion"? And how is (meta)data management different from general knowledge management? These questions will be addressed in future research.
Files
| Name | Size | Checksum |
|---|---|---|
| backe-2023-creating_rich_metadata_for_collaborative_research.pdf | 5.0 MB | md5:b2958d4fbee5e528f2247e4cfd76fe12 |
Additional details
Related works
- Is supplemented by
- Dataset: 10.5281/zenodo.7728089 (DOI)
- Conference paper: 10.1109/OCEANS47191.2022.9977024 (DOI)
- Dataset: 10.5281/zenodo.8424933 (DOI)