Published September 27, 2023 | Version v1
Presentation Open

Creating Rich Metadata for Collaborative Research: Case Studies and Challenges

Description

This presentation, given at the NFDI4Ing Conference 2023, showcases workflow elements for creating rich metadata in collaborative research. These elements are part of a larger framework that is under active development and aims to give researchers, data managers, and the research data management (RDM) community practical guidance on managing metadata effectively along the data lifecycle. The framework is motivated and validated by two applied projects, DeeperSense and RoBivaL, which have been conducted by the DFKI Robotics Innovation Center in coordination with the NFDI4Ing Task Area Golo.

 

The first part of the presentation focuses on these two case studies. Because their research goals and methods differ, the two projects have different data and metadata profiles, which makes them suitable test cases for a framework that is intended to be general enough to support a broad range of applications. The presentation addresses selected data management tasks in different phases of the data lifecycle. It argues that data managers who strive to produce FAIR metadata are intermediaries between their research team and potential data reusers, two groups with very different requirements and practices. It further makes the case that the consumer-facing part of FAIR metadata is very small compared to the invisible "precursor" metadata that must be generated and managed to reach a given quality level.

 

The second part of the presentation proposes a set of workflow-oriented metadata categories that emerged from data management practice. These categories are orthogonal to established categories that focus on purpose or semantics, such as structural, descriptive, or administrative metadata.

 

The proposed categories are organized along three dimensions:

 

Firstly, a workflow-oriented view on metadata must recognize that every data instance is the output of some procedure. Both the procedure and its output are sources of metadata. High-level examples of procedures are the stages of the data lifecycle, such as planning, collection, and analysis. Each stage comprises additional mid- and low-level procedures that create intermediate outputs.
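
The nesting of procedures and outputs described above can be pictured as a small tree. The following Python sketch is only an illustration under stated assumptions: the class `Procedure`, its fields, and the example step and file names are hypothetical and do not come from the presentation or from any of the projects mentioned. It walks a lifecycle stage and lists every procedure and every output as a potential source of metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Procedure:
    """A hypothetical procedure node: it may produce outputs and contain sub-steps."""

    name: str
    outputs: list[str] = field(default_factory=list)
    steps: list["Procedure"] = field(default_factory=list)

    def metadata_sources(self, path: str = "") -> list[str]:
        """List this procedure, its outputs, and all nested steps as metadata sources."""
        here = f"{path}/{self.name}"
        sources = [f"procedure:{here}"]
        sources += [f"output:{here}/{name}" for name in self.outputs]
        for step in self.steps:
            sources += step.metadata_sources(here)
        return sources


# A high-level lifecycle stage with two lower-level procedures (names are made up).
collection = Procedure(
    name="collection",
    steps=[
        Procedure(name="calibrate_sensor", outputs=["calibration.json"]),
        Procedure(name="record_run", outputs=["run_001.csv"]),
    ],
)

sources = collection.metadata_sources()
for source in sources:
    print(source)
```

Even this toy stage yields five distinct sources, three procedures and two outputs, which hints at how quickly the number of metadata sources grows with the depth of a workflow.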

 

The relevance of analyzing procedures for metadata creation is well established in the RDM community, as evidenced, for example, by the "processing step" class at the core of the Metadata4Ing (M4I) ontology. While the M4I ontology may be a suitable tool for communicating metadata to data consumers, this presentation argues that the production of (precursor) metadata in research teams requires additional tools.

 

Secondly, before data is created or transformed, both the procedure and its output are usually planned and specified, resulting in metadata that is not extracted from the data but injected into it. Examples of injected metadata are the address of a software repository (specifying a procedure) or a database schema (specifying an output). Examples of extracted metadata are performance metrics (of a procedure) or a file checksum (of an output).
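
The four examples above (repository address, schema, performance metric, checksum) can be combined in one record. The Python sketch below is a minimal illustration and not the framework's implementation: the functions `serialize_rows` and `run_with_metadata`, all field names, and the repository URL are hypothetical. Injected values are known before the procedure runs; extracted values are measured or computed afterwards.

```python
import hashlib
import time


def serialize_rows(rows: list[tuple[float, float]]) -> bytes:
    """Hypothetical procedure: turn measurement rows into CSV bytes."""
    lines = ["depth,temperature"]
    lines += [f"{depth},{temperature}" for depth, temperature in rows]
    return ("\n".join(lines) + "\n").encode("utf-8")


def run_with_metadata(rows, repository_url: str, schema: dict):
    """Run the procedure and return its output together with a metadata record."""
    # Injected metadata: specified before the procedure is executed.
    injected = {
        "procedure": {"repository": repository_url},
        "output": {"schema": schema},
    }

    start = time.perf_counter()
    output = serialize_rows(rows)
    runtime = time.perf_counter() - start

    # Extracted metadata: derived from the executed procedure and its output.
    extracted = {
        "procedure": {"runtime_seconds": runtime},
        "output": {
            "sha256": hashlib.sha256(output).hexdigest(),
            "size_bytes": len(output),
        },
    }
    return output, {"injected": injected, "extracted": extracted}


output, record = run_with_metadata(
    rows=[(10.0, 4.2), (20.0, 3.9)],
    repository_url="https://example.org/hypothetical/repo.git",  # placeholder URL
    schema={"columns": ["depth", "temperature"]},
)
print(record["extracted"]["output"]["sha256"])
```

Keeping the two groups in separate branches of the record makes it easy to check later whether the extracted values are consistent with what was specified, for instance whether the output's columns match the injected schema.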

 

Thirdly, metadata itself is also data, as noted by virtually every metadata definition. Consequently, though this is rarely recognized, metadata creation is recursive. The presentation gives examples of higher-order metadata ("meta-metadata") on multiple levels of recursion. It then discusses the different purposes of higher-order metadata for research teams, data reusers, and the RDM community.
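
The recursion can be made concrete with a short Python sketch. It is an illustration only: the functions `describe` and `describe_recursively`, the field names, and the sample payload are hypothetical and are not the examples given in the presentation. Each metadata record is serialized and then treated as the data that the next, higher-order record describes.

```python
import hashlib
import json


def describe(data: bytes, order: int) -> dict:
    """Create a small metadata record about `data` (field names are hypothetical)."""
    return {
        "order": order,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def describe_recursively(data: bytes, depth: int) -> list[dict]:
    """Describe the data, then describe each resulting record, `depth` times in total."""
    records = []
    current = data
    for order in range(1, depth + 1):
        record = describe(current, order)
        records.append(record)
        # The serialized record becomes the "data" for the next level.
        current = json.dumps(record, sort_keys=True).encode("utf-8")
    return records


levels = describe_recursively(b"sonar ping 0042", depth=3)
for level in levels:
    print(level["order"], level["size_bytes"])
```

Here the first-order record describes the data, the second-order record describes the first-order record, and so on. The sketch stops at a fixed depth; in practice, deciding where to stop is itself a data management decision.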

 

The presentation concludes with two open questions. Given the multitude of challenges and requirements, the differing demands of research teams and data reusers, and the amount of precursor metadata necessary for high-quality consumer metadata: How do we avoid "scope explosion"? And how does (meta)data management differ from general knowledge management? These questions will be addressed in future research.

Notes

The author would like to thank the Federal Government and the Heads of Government of the Länder, as well as the Joint Science Conference (GWK), for their funding and support within the framework of the NFDI4Ing consortium. Funded by the German Research Foundation (DFG) - project number 442146713.

 

This work was supported by the project RoBivaL, funded by the German Federal Ministry for Economic Affairs and Climate Action (BMWK) under grant number 50RP2150.

 

This work was supported by the project DeeperSense, which received funding from the European Commission under the program H2020-ICT-2020-2 ICT-47-2020, project number 101016958.

 

The responsibility for the content of this presentation lies with the author.

Files

backe-2023-creating_rich_metadata_for_collaborative_research.pdf

Files (5.0 MB)

Additional details

Related works

Is supplemented by
Dataset: 10.5281/zenodo.7728089 (DOI)
Conference paper: 10.1109/OCEANS47191.2022.9977024 (DOI)
Dataset: 10.5281/zenodo.8424933 (DOI)

Funding

European Commission
DeeperSense - Deep-Learning for Multimodal Sensor Fusion 101016958