The Role of DDI-CDI in EOSC: Possible Uses and Applications
Description
This report examines how the Data Documentation Initiative Cross-Domain Integration (DDI-CDI) specification could be applied to the data-sharing requirements faced by the European Open Science Cloud (EOSC). By analyzing real-world projects and implementations, and through discussion with those responsible for related metadata and infrastructure specifications, it sets out the role the DDI-CDI model could play in the overall EOSC system and makes recommendations for realizing the opportunities identified for its use.
The challenges faced by EOSC can be broken into two main areas:
- Problems of Scale: The volume of data is growing exponentially and comes from an ever wider range of sources. At the same time, the FAIR principles call for more metadata, especially to support the interoperability and reuse of data. Current manual approaches to creating metadata are proving unsustainable. Automating metadata collection - that is, harvesting metadata programmatically from the systems which produce, manage, disseminate, and use data - offers a possible solution, but the framework needed for such activities is not in place. Standard models and encodings for such metadata (a "lingua franca") must be established before metadata can be captured and exchanged at scale.
- Problems of Cross-Domain Use: For data to be shared across domain and institutional boundaries, it must be understood by its users at all levels. While increasing attention is paid to the semantic mapping of concepts across domains, such data sharing has other critical needs. Disparate data structures, reflecting the tools and formats used in specific domains, must be accommodated, and the means by which the data was collected and processed (its provenance) must be understood. The domain-specific models and vocabularies in use must be known and accessible in machine-actionable form. Reusable crosswalks between domains are needed. All of these requirements point to the need for more granular metadata, so that data can be successfully restructured for use outside its domain of origin. It should be possible to trace the path of a single observation as it is reused and further processed.
DDI-CDI will not address all of these concerns; no single standard or technology will provide a complete answer. It has, however, been designed to fill important gaps in the range of standards, models, and technologies needed to meet these challenges. Drawing on an intensive series of meetings, conference sessions, workshops, and other discussions with a range of groups, this report examines use cases and the emerging FAIR ecosystem to understand how DDI-CDI could be applied and the role it could play within a broader framework. The approach taken by EOSC, as described in the EOSC Interoperability Framework and in other activities, is then assessed to show specifically where DDI-CDI would fit. Recommendations for further work are made on that basis.
Specific implementation examples include: a data integration combining climate data, energy consumption data, and consumer questionnaire responses; an example, based on the Dataverse platform, of how a repository could facilitate automated capture of metadata; a data integration example from the European Social Survey Multi-Level application; and an exploration of processing, provenance, and cross-domain requirements as seen in the ALPHA Network and INSPIRE applications for integrating population and clinical data. The report also analyzes how DDI-CDI could be used in combination with DCAT, and examines the role DDI-CDI could play within the emerging FAIR ecosystem in relation to FAIR Implementation Profiles, FAIR Data Points, and FAIR Digital Objects. Finally, it considers how DDI-CDI could be integrated into the emerging EOSC infrastructure in light of the EOSC Interoperability Framework and the FAIRsFAIR vision of integrated metadata catalogues.
DDI-CDI offers a new type of specification which could help to realize the capture, interchange, and use of metadata throughout the EOSC data-sharing infrastructure in ways that are scalable and machine-actionable. It operates at the required level of granularity and would increase the usefulness of semantic mapping and of other approaches to making full use of data. Our recommendations identify several concrete areas where this application of the model should be explored further.
This work was supported by the EOSC Secretariat.
EOSCsecretariat.eu has received funding from the European Union's Horizon 2020 Programme, call H2020-INFRAEOSC-2018-4, Grant Agreement number 831644.
Files
EOSC_DDI-CDI_Project_Intro_UseCases_Recommendations_1_0_FINAL.pdf (4.7 MB)
md5:047ae54107bf4576535f6bfbb366b10a