HealthData@EU pilot project - Milestone 6.1 Report on the landscape analysis of available metadata catalogues and the metadata standards in use
Description
Multiple studies have shown that one of the main barriers to the re-use of health data is the lack of findability. Without an inventory listing the datasets available for secondary use, researchers and policymakers cannot locate valuable health data. A metadata catalogue serves as such an inventory: it consists of metadata records, one per dataset, each describing the dataset without granting direct access to the data itself.
Recognizing this challenge, the European Health Data Space (EHDS) regulation emphasizes the need for a common metadata catalogue at the European level. This catalogue would facilitate the discovery and secondary use of health-related data across Europe. As part of the HealthData@EU pilot project, Work Package 6 (WP6) was responsible for developing a standardized metadata template and an online tool to support the creation of local, national, and European metadata catalogues that can interconnect with one another. The first step in designing this template was to analyze existing metadata standards and catalogues.
A study conducted by the Joint Action TEHDAS revealed the widespread use of various metadata standards, including:
- MIABIS (Minimum Information About BIobank Data Sharing)
- DCAT-AP (Data Catalog Vocabulary Application Profile)
- CESSDA CMM (Consortium of European Social Science Data Archives - Core Metadata Model)
- DDI (Data Documentation Initiative)
Among these, DCAT-AP is the standard used by data.europa.eu, which aggregates metadata from multiple sectors, including health, environment, education, justice, and transport. To ensure compatibility and enable harvesting of national health-related metadata catalogues by data.europa.eu, the DCAT-AP standard was chosen as the baseline for HealthData@EU.
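To make the idea of a catalogue as a set of per-dataset records concrete, the following is a minimal sketch of a catalogue with one record, written as a JSON-LD-style Python dictionary. It uses a handful of terms from the DCAT and Dublin Core Terms vocabularies (`dcat:Catalog`, `dcat:Dataset`, `dcat:dataset`, `dcat:keyword`, `dct:title`, `dct:description`). The helper names, the `example.org` identifier, and the dataset itself are hypothetical, and the sketch does not attempt to satisfy the full set of mandatory DCAT-AP properties.

```python
import json

# Namespace IRIs of the W3C DCAT vocabulary and Dublin Core Terms.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def make_dataset_record(identifier, title, description, keywords):
    """Build one metadata record; it describes a dataset but holds no data."""
    return {
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
    }


def make_catalogue(title, records):
    """Wrap the records in a catalogue that a harvester could read."""
    return {
        "@context": {"dcat": DCAT, "dct": DCT},
        "@type": "dcat:Catalog",
        "dct:title": title,
        "dcat:dataset": list(records),
    }


# Hypothetical dataset, used only for illustration.
record = make_dataset_record(
    "https://example.org/dataset/cancer-registry",
    "Example cancer registry",
    "Hypothetical registry used only for illustration.",
    ["cancer", "registry"],
)
catalogue = make_catalogue("Example national health catalogue", [record])
catalogue_json = json.dumps(catalogue, indent=2)
```

Serializing the catalogue to JSON, as in the last line, is one plausible way to expose it to a harvester. A real deployment would more likely rely on an RDF library and a validated DCAT-AP profile.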
Need for a Health-Specific DCAT-AP Extension
While DCAT-AP is widely used, it does not fully address the specific needs of health data. Existing metadata records describing health-related datasets lack granularity and essential properties, limiting findability and reusability.
Key missing elements include:
- Granularity of data (aggregated vs. individual-level data)
- Health topic classification (e.g., disease type)
- Demographic attributes (age range, sex)
- Data quality indicators (e.g., adherence to semantic standards)
To address these gaps, WP6 of the HealthData@EU pilot project has designed HealthDCAT-AP, an extension of DCAT-AP tailored for health-related datasets.
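The effect of these missing elements on findability can be illustrated with a small sketch. It models a record that carries the four kinds of information listed above and shows the kind of query that becomes possible once they are present. All field names, the `find_datasets` helper, and the sample records are hypothetical. They are not the actual HealthDCAT-AP property names, which are defined in the specification itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HealthDatasetRecord:
    """Hypothetical record carrying the health-specific elements."""
    title: str
    granularity: str                 # "aggregated" or "individual"
    health_topics: List[str]         # e.g. disease types
    min_age: Optional[int] = None    # lower bound of the covered age range
    max_age: Optional[int] = None    # upper bound of the covered age range
    sexes: List[str] = field(default_factory=list)
    semantic_standards: List[str] = field(default_factory=list)


def find_datasets(records, topic=None, granularity=None, age=None):
    """Return records matching every filter that was supplied.

    Records with an unknown age range are skipped when an age filter
    is given, since they cannot be shown to cover that age.
    """
    matches = []
    for rec in records:
        if topic is not None and topic not in rec.health_topics:
            continue
        if granularity is not None and rec.granularity != granularity:
            continue
        if age is not None:
            if rec.min_age is None or rec.max_age is None:
                continue
            if not (rec.min_age <= age <= rec.max_age):
                continue
        matches.append(rec)
    return matches


# Invented sample records, for illustration only.
records = [
    HealthDatasetRecord("Diabetes cohort", "individual", ["diabetes"], 18, 65),
    HealthDatasetRecord("Diabetes prevalence by region", "aggregated",
                        ["diabetes"], 0, 100),
    HealthDatasetRecord("Paediatric asthma registry", "individual",
                        ["asthma"], 0, 17),
]

individual_diabetes = find_datasets(records, topic="diabetes",
                                    granularity="individual")
covering_age_10 = find_datasets(records, age=10)
```

With only generic properties such as title and description, neither of the two queries at the end could be answered reliably. That is the findability gap the extension is meant to close.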
Methodology and Approach
To support the development of HealthDCAT-AP, WP6 launched a sandbox environment in January 2023. This online tool, based on FAIR Data Point technology, allowed metadata collection and validation through:
- A web-based interface for health data providers to create metadata records.
- A back-end server supporting metadata integration and interconnectivity.
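As an illustration of what validating a submitted record might involve, here is a minimal sketch that checks a record for required properties before it is accepted. The property names in `REQUIRED_PROPERTIES` and the `validate_record` helper are assumptions made for this example. They do not describe how the sandbox or FAIR Data Point software performs validation.

```python
from typing import Dict, List

# Hypothetical set of required properties; a real profile defines its own.
REQUIRED_PROPERTIES = ("title", "description", "publisher")


def validate_record(record: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in REQUIRED_PROPERTIES:
        value = record.get(name)
        # Treat absent values and blank strings as missing.
        if value is None or (isinstance(value, str) and not value.strip()):
            problems.append(f"missing required property: {name}")
    return problems


# Invented submissions, for illustration only.
complete = {
    "title": "Example registry",
    "description": "Hypothetical dataset.",
    "publisher": "Example institute",
}
incomplete = {"title": "Example registry", "description": "   "}

complete_problems = validate_record(complete)
incomplete_problems = validate_record(incomplete)
```

Returning a list of problems rather than a single pass/fail flag lets a web form show the data provider every missing property at once.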
Additionally, WP6 collected metadata information through three structured forms distributed to:
- HealthData@EU consortium nodes
- Use case leaders, who then shared them with their respective networks
The collected data was analyzed to assess catalogue structure, metadata properties, and health domain coverage.
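One simple way to assess metadata properties across catalogues is to compute, for each property, the share of catalogues that use it. The sketch below does this with the standard library. The catalogue names, the property names, and the `property_coverage` helper are invented for illustration and do not reproduce the report's data or method.

```python
from collections import Counter
from typing import Dict, Set


def property_coverage(catalogues: Dict[str, Set[str]]) -> Dict[str, float]:
    """Map each property to the fraction of catalogues that use it."""
    total = len(catalogues)
    if total == 0:
        return {}
    counts = Counter()
    for properties in catalogues.values():
        # set() guards against a property being listed twice.
        counts.update(set(properties))
    return {name: counts[name] / total for name in sorted(counts)}


# Invented survey answers: catalogue name -> properties it records.
survey = {
    "catalogue_a": {"title", "description", "age_range"},
    "catalogue_b": {"title", "description"},
    "catalogue_c": {"title"},
    "catalogue_d": {"title", "health_topic"},
}

coverage = property_coverage(survey)
# Properties used by fewer than half of the catalogues.
rarely_used = [name for name, share in coverage.items() if share < 0.5]
```

In this made-up example the generic properties are widespread while the health-specific ones appear in only a quarter of the catalogues, which is the kind of pattern such an analysis is designed to surface.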
Conclusion
The landscape analysis revealed significant gaps in existing metadata catalogues regarding health-related datasets. To enhance findability, interoperability, and reusability, WP6 has designed HealthDCAT-AP as a specialized solution for the EHDS framework.
This report provides a foundation for further development and implementation of metadata standards in health data governance, ensuring seamless data discovery across European platforms.
Files
Name | Size |
---|---|
HealthData@EU-Pilot_MS6.1_FIN.pdf (md5:0aa396ced48ed8a30006fa30dd397f3b) | 970.0 kB |
Additional details
Dates
- Accepted: 2023-03-30