FAIR Data Publishing with Apache Maven
Description
This is the slide deck for the paper "FAIR Data Publishing with Apache Maven", presented at the Damalos Workshop at the Extended Semantic Web Conference (ESWC) 2024.
Designing and managing a large number of data processing pipelines is a challenging task. Analogous to DevOps, the term DataOps was coined to capture all the practices, processes and technologies related to managing the life cycle of data artifacts, including the tracking of provenance. The solution space keeps growing as novel approaches and tools become available; however, with more than 100 workflow engines available, for instance, it is no longer feasible to assess them all. Semantic Web technology features many aspects relevant to DataOps, such as the interlinkability of resources, DCAT for building decentralized data catalogs, PROV-O for provenance descriptions, and VoID for describing statistics about the used classes and properties. Yet only a few approaches establish a coherent and holistic connection between these elements. In this work, we perform an in-depth analysis of the Apache Maven build system and its surrounding ecosystem to determine how they can be leveraged for automated data processing, publishing and RDF metadata generation with provenance tracking. We present three novel Maven plugins: for SPARQL and RML execution, for the creation of an RDF database file, and for uploading artifacts to a CKAN instance. Finally, we present a prototype architecture in which a Maven deployment of a geographic RDF dataset results in the automated generation of DCAT, PROV-O and VoID metadata, so that datasets can be browsed on a map and filtered, e.g., by the used classes and properties. All our resources are freely available as Open Source.
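To make the combination of DCAT, PROV-O and VoID more concrete, the following is a minimal sketch of the kind of metadata such a deployment could emit. It is not the output format of the plugins described in the paper: the function name `describe_artifact`, the URN-based dataset identifier, the `#build` activity identifier and the example values are all assumptions made for illustration. Only the vocabulary terms (`dcat:Dataset`, `prov:wasGeneratedBy`, `prov:used`, `void:classPartition`, `void:class`, `void:entities`) are taken from the respective W3C and VoID vocabularies.

```python
# Illustrative sketch only: NOT the output format of the paper's Maven plugins.
PREFIXES = """\
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix void: <http://rdfs.org/ns/void#> .
"""


def describe_artifact(group_id, artifact_id, version, sources, class_counts):
    """Return a Turtle document describing one deployed dataset artifact."""
    # Assumption: the dataset is identified by a URN built from its Maven
    # coordinates, and the build that produced it by the same URN + "#build".
    dataset = f"<urn:mvn:{group_id}:{artifact_id}:{version}>"
    build = f"<urn:mvn:{group_id}:{artifact_id}:{version}#build>"

    lines = [PREFIXES]
    # DCAT: the catalog entry for the dataset.
    lines.append(f"{dataset} a dcat:Dataset, void:Dataset ;")
    lines.append(f'    dcterms:identifier "{group_id}:{artifact_id}:{version}" ;')
    # VoID: one class partition per class, with its instance count.
    for cls, count in sorted(class_counts.items()):
        lines.append(
            f"    void:classPartition [ void:class <{cls}> ; void:entities {count} ] ;"
        )
    # PROV-O: link the dataset to the activity that generated it.
    lines.append(f"    prov:wasGeneratedBy {build} .")
    lines.append("")
    if sources:
        used = ", ".join(f"<{source}>" for source in sources)
        lines.append(f"{build} a prov:Activity ;")
        lines.append(f"    prov:used {used} .")
    else:
        lines.append(f"{build} a prov:Activity .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical coordinates, source file and class count.
    print(describe_artifact(
        "org.example", "my-geo-dataset", "1.0.0",
        ["https://example.org/input.csv"],
        {"http://www.opengis.net/ont/geosparql#Feature": 42},
    ))
```

Keeping all three vocabularies attached to a single dataset resource is what allows a catalog front end to filter by class (via the VoID partitions) and to trace a dataset back to its inputs (via the PROV-O activity).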
Files
Checksum | Size |
---|---|
md5:d1e1ce61aecb26ca83e2ffffa3633f1a | 5.2 MB |
Additional details
Identifiers
- URN: urn:mvn:org.aksw.conf.eswc._2024:fair-data-publishing-with-apache-maven:1.20240613.0:pptx
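The URN above appears to encode Maven coordinates. As a minimal sketch, it can be split into its parts as follows. The field order (group ID, artifact ID, version, type) is an assumption read off this one identifier, not a documented scheme, and the names `MavenCoordinates` and `parse_mvn_urn` are hypothetical.

```python
from typing import NamedTuple


class MavenCoordinates(NamedTuple):
    group_id: str
    artifact_id: str
    version: str
    type: str


def parse_mvn_urn(urn: str) -> MavenCoordinates:
    """Split a 'urn:mvn:' identifier into Maven-style coordinates.

    Assumption: the layout is urn:mvn:<groupId>:<artifactId>:<version>:<type>,
    as suggested by the identifier of this record.
    """
    parts = urn.split(":")
    if len(parts) != 6 or parts[0].lower() != "urn" or parts[1] != "mvn":
        raise ValueError(f"not a recognized urn:mvn identifier: {urn!r}")
    return MavenCoordinates(*parts[2:])


coords = parse_mvn_urn(
    "urn:mvn:org.aksw.conf.eswc._2024:"
    "fair-data-publishing-with-apache-maven:1.20240613.0:pptx"
)
print(coords.group_id)     # org.aksw.conf.eswc._2024
print(coords.artifact_id)  # fair-data-publishing-with-apache-maven
print(coords.version)      # 1.20240613.0
print(coords.type)         # pptx
```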
Software
- Repository URL: https://scaseco.github.io/maven4data/