EOSC-Life Common Provenance Model

Wittner, Rudolf; Mascia, Cecilia; Frexia, Francesca; Müller, Heimo; Geiger, Jörg; Exter, Katrina; Holub, Petr

The exchange of research data and physical specimens has become an issue of major importance for modern research. Many reports indicate problems with quality, trustworthiness and reproducibility of research results, mainly due to poor documentation of the data generation or the collection of specimens. The significant impact of flawed research results on health, economics and political decisions has frequently been stated. Consequently, professional societies and research initiatives call for improved and standardised documentation of the data and specimens used in research studies.

Provenance information documents the evolution of an object and can be used to assess its quality and reliability. This deliverable defines components of distributed provenance information to enable interlinking of provenance information generated in different organisations involved in the research process, such as biobanks, research centres, universities or analytical laboratories. The distributed provenance information model builds on an existing provenance information standard, W3C PROV, and follows a general provenance composition pattern. Both W3C PROV and the provenance composition pattern are described in this document.

Since the understanding of the term “provenance information” differs across domains and research communities, this deliverable first harmonises this understanding by providing a general explanation of how provenance information is generated and used.

In particular, this deliverable defines a connector, that is, a provenance component containing the technical information needed to traverse provenance information. The connector is subsequently added to the provenance information generated by different organisations. This deliverable also defines how to interpret identifiers of provenance structures in a distributed environment and how to include and interpret persistent identifiers of documented objects.
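The role of a connector can be illustrated with a minimal sketch. It is not the model defined in the deliverable or in the ISO document: the class names, the fields `object_pid` and `referenced_bundle`, the identifiers, and the in-memory dictionary standing in for provenance held by separate organisations are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Connector:
    """Hypothetical connector pointing to the preceding provenance bundle."""
    object_pid: str         # assumed: persistent identifier of the documented object
    referenced_bundle: str  # assumed: identifier of a bundle held by another organisation


@dataclass
class Bundle:
    """Hypothetical unit of provenance information produced by one organisation."""
    bundle_id: str
    organisation: str
    backward: Optional[Connector] = None  # None marks the start of the chain


def trace_back(start: str, bundles: Dict[str, Bundle]) -> List[str]:
    """Follow backward connectors from `start`; return the organisations visited."""
    visited: List[str] = []
    seen = set()
    current: Optional[str] = start
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at bundle {current!r}")
        seen.add(current)
        bundle = bundles[current]
        visited.append(bundle.organisation)
        # Move to the bundle referenced by the connector, if there is one.
        current = bundle.backward.referenced_bundle if bundle.backward else None
    return visited


# Illustrative data: a sample leaves a biobank, is analysed in a laboratory,
# and the resulting dataset is used at a university.
bundles = {
    "biobank:b1": Bundle("biobank:b1", "biobank"),
    "lab:b2": Bundle("lab:b2", "analytical laboratory",
                     Connector("pid:sample-1", "biobank:b1")),
    "uni:b3": Bundle("uni:b3", "university",
                     Connector("pid:dataset-1", "lab:b2")),
}

chain = trace_back("uni:b3", bundles)
print(chain)
```

In a real distributed setting each lookup in `bundles` would presumably be replaced by resolving the referenced identifier against the organisation that holds the provenance; the sketch only shows that a connector carries enough information to take the next step in the traversal.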

This deliverable deals with the common provenance model developed as part of a standardisation process in the International Organization for Standardization (ISO) technical committee ISO/TC 276 “Biotechnology”, where it is registered as project ISO 23494 in Working Group 5 “Data processing and Integration”. Because this work is copyrighted by ISO and cannot be published as a public deliverable, this text describes the essential design of the provenance model, and the actual ISO document is provided as a non-public supplement. This is in line with the work plan of EOSC-Life WP6 in order to support adoption of the standard in both academia and industry. The Common Provenance Model has been accepted as a Preliminary Work Item under ISO 23494 Part 2 and, at the time of submitting this deliverable, is being proposed for advancement to the next phase, the New Work Item.

