Coscine and Data Stewardship
Authors/Creators
- 1. IT Center RWTH Aachen University
Description
Coscine is a research data management platform for research projects, developed by the IT Center of RWTH Aachen University. It aims to help researchers make their data FAIR (Findable, Accessible, Interoperable, and Reusable), covering file storage, the description of those files with metadata, collaboration among all participating researchers, and the safe archiving of data for ten years in line with good scientific practice.
In this session, we will use three flash talks to present how data stewards support researchers in using Coscine in their day-to-day work. We will start with the basics: setting up custom application profiles that are used to annotate data with detailed, semantic metadata. Then, we will show how the Python SDK, which builds on Coscine's REST API, can be used not only to upload files but also to extract metadata from them and annotate them in an automated fashion. Lastly, we will outline efforts to connect electronic lab notebooks (ELNs) to Coscine, both as file storage and as an archiving solution.
Equipping Researchers with the Power of Metadata
In my role as a data steward, I help researchers set up custom application profiles to annotate data with detailed metadata, ensuring the findability and reproducibility of data according to the FAIR principles. Subject-specific combinations of metadata, known as metadata schemata, have become established in various disciplines. However, because such schemata often come with strict rules, Coscine has introduced the term metadata profile or application profile. All files within a resource in Coscine should be accompanied by metadata.
In some cases, users can select from existing application profiles, but if none are suitable, that’s where I step in. I work with researchers to discuss the necessary metadata fields, striving to use terms from established dictionaries or ontologies. For creating application profiles, we use the AIMS Generator tool, which not only helps create custom profiles easily but also checks them for syntactical correctness. After a researcher submits an application profile, my role is to review it for correctness, ensuring the proper use of ontology terms. Any changes or suggestions regarding the application profile are discussed with the user in GitLab.
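To make the review step more concrete, here is a minimal Python sketch of the kind of check an application profile implies for a metadata record. It is a simplified stand-in: the `Field` class, the `validate` function, the field names, and the `example.org` term are illustrative assumptions, and the sketch reproduces neither the format produced by the AIMS Generator tool nor Coscine's actual validation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Field:
    """One entry of a hypothetical, simplified application profile."""
    name: str              # label shown in the metadata form
    term: str              # IRI of the ontology term the field maps to
    datatype: type         # expected Python type of the value
    required: bool = True


# Illustrative profile; field names and the choice of terms are assumptions.
PROFILE = [
    Field("Title", "http://purl.org/dc/terms/title", str),
    Field("Creator", "http://purl.org/dc/terms/creator", str),
    Field("Created", "http://purl.org/dc/terms/created", date),
    Field("Temperature [K]", "http://example.org/terms/temperature",
          float, required=False),
]


def validate(record: dict, profile: list[Field]) -> list[str]:
    """Return human-readable problems; an empty list means the record fits."""
    problems = []
    for field in profile:
        if field.name not in record:
            if field.required:
                problems.append(f"missing required field: {field.name}")
            continue
        if not isinstance(record[field.name], field.datatype):
            problems.append(
                f"wrong type for {field.name}: "
                f"expected {field.datatype.__name__}"
            )
    known = {field.name for field in profile}
    for key in record:
        if key not in known:
            problems.append(f"field not in profile: {key}")
    return problems
```

Mapping each field to a term from an established vocabulary, as with the Dublin Core terms above, is what keeps the metadata interpretable outside the project that produced it.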
Automating Metadata to Coscine
As a data steward, part of my work is to create seamless workflows that make tools such as Coscine easy and efficient to use for managing research data. As shown by my colleagues, Coscine ensures that data meets the FAIR principles by requiring metadata. The metadata gives context to the data stored in Coscine so that it is reusable by other researchers in the scientific community. However, the number of metadata fields in an application profile that are needed to provide that context can be quite extensive. This is where my work as a data steward begins. For researchers to feel confident in their decision to use Coscine, we must make getting started easier and reduce the time researchers need to upload their data and metadata. I will present how providing this metadata can be simplified by using the Coscine SDK and Python to write scripts that extract the necessary metadata from research data files and insert it into the application profile, or metadata form, in the Coscine resource.
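The extraction step can be sketched in a few lines of Python. The example below assumes a hypothetical instrument file whose header consists of `# key: value` lines; the header keys, the form field names, and the `upload` callable are all assumptions made for illustration. In a real script, `upload` would wrap the corresponding call of the Coscine Python SDK; it is left as a parameter here rather than guessing at the SDK's signatures.

```python
from typing import Callable, TextIO

# Mapping from header keys in the (hypothetical) instrument file to the
# field names of the metadata form. Both sides are illustrative assumptions.
HEADER_TO_FIELD = {
    "operator": "Creator",
    "sample": "Title",
    "date": "Created",
    "temperature_K": "Temperature [K]",
}


def extract_metadata(stream: TextIO) -> dict[str, str]:
    """Read the leading '# key: value' lines and map them to form fields."""
    metadata = {}
    for line in stream:
        if not line.startswith("#"):
            break  # the header ends at the first data line
        key, sep, value = line[1:].partition(":")
        field = HEADER_TO_FIELD.get(key.strip())
        if sep and field:
            metadata[field] = value.strip()
    return metadata


def upload_with_metadata(
    name: str,
    stream: TextIO,
    upload: Callable[[str, str, dict], None],
) -> dict[str, str]:
    """Extract metadata from the file, then hand file and metadata to `upload`."""
    metadata = extract_metadata(stream)
    stream.seek(0)  # rewind so the full file content is uploaded
    upload(name, stream.read(), metadata)
    return metadata
```

Keeping the mapping in a single dictionary means that a change to the application profile only requires updating `HEADER_TO_FIELD`, not the parsing logic.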
Connecting Platforms: eLabFTW and Coscine
Coscine takes care of (meta)data storage, while electronic lab notebooks assist researchers in digitally documenting their work. One of the prime advantages of an ELN vs. an analog lab journal is the ability to link items, including the data belonging to an experiment. However, the database structure behind an ELN does not necessarily support (a) large amounts of data, (b) well-structured data deposition, or (c) easy archival using an institutional infrastructure solution.
Both the ELN eLabFTW and Coscine offer REST APIs. As reported at the DSgG 2022 [1], these can be used to build automated workflows which link not only the two platforms, but also measurement devices. However, such workflows have proven difficult to maintain, and a native solution on both the ELN side and the Coscine side is much desired. Here, we will report on how RDM staff and researchers came together to propose a solution to the developers on both ends. Depending on the current state, we will also report the results of this effort.
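The general shape of such a linking workflow can be sketched as follows. This is an illustration under stated assumptions, not the workflow reported in [1]: `Storage` and `Notebook` are hypothetical wrappers that a script would implement on top of the respective REST APIs, and their method names, like the `"ELN entry"` metadata field, do not correspond to any actual eLabFTW or Coscine endpoint.

```python
from typing import Protocol


class Storage(Protocol):
    """Hypothetical wrapper around the storage platform's REST API."""

    def upload(self, path: str, content: bytes, metadata: dict) -> str:
        """Store the file with its metadata and return a URL pointing to it."""
        ...


class Notebook(Protocol):
    """Hypothetical wrapper around the ELN's REST API."""

    def append_to_entry(self, entry_id: int, text: str) -> None:
        """Append a line of text to an existing notebook entry."""
        ...


def deposit_and_link(
    storage: Storage,
    notebook: Notebook,
    entry_id: int,
    path: str,
    content: bytes,
    metadata: dict,
) -> str:
    """Deposit a data file, then record where it lives in the ELN entry."""
    # Record the originating entry in the metadata so the link works both ways.
    enriched = {**metadata, "ELN entry": str(entry_id)}
    url = storage.upload(path, content, enriched)
    notebook.append_to_entry(entry_id, f"Data file {path}: {url}")
    return url
```

Every step in a script like this depends on two external APIs staying stable, which is one reason such glue code is hard to maintain.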
References
[1] Parks, N. A. (2024). Improving Data-Producing Workflows in SFB 985. Data Stewardship goes Germany, Braunschweig. Zenodo. https://doi.org/10.5281/zenodo.11066590
Files
datastwardshipCosine_dsgg2024.pdf