CWLProv: Interoperable Retrospective Provenance Capture and Computational Analysis Sharing

doi:10.5281/zenodo.1473157

Published October 28, 2018 | Version v3

Working paper Open

1. The University of Melbourne; Common Workflow Language project
2. The University of Manchester; Common Workflow Language project
3. The University of Melbourne
4. The University of Manchester
5. Common Workflow Language project

Background: The automation of data analysis in the form of scientific workflows has become a widely adopted practice in many fields of research. Computationally driven data-intensive experiments using workflows enable Automation, Scaling, Adaptation and Provenance support (ASAP). However, several challenges remain in the effective sharing, publication, understandability and reproducibility of such workflows, owing to the incomplete capture of provenance and the lack of interoperability between different technical (software) platforms.

Results: Based on best-practice recommendations identified from the literature on workflow sharing and publishing, we define four hierarchical levels of provenance that, when used with domain-specific information, collectively result in comprehensive and fully re-executable workflows. To realise these levels, we present CWLProv, a standards-based format for representing any workflow-based computational analysis and producing workflow output artefacts that satisfy the various levels of provenance. We utilise open source, community-driven standards: interoperable workflow definitions in Common Workflow Language (CWL), structured provenance representation using the W3C PROV model, and resource aggregation and sharing as workflow-centric Research Objects (RO) generated along with the final outputs of a given workflow enactment. We illustrate this approach through a practical demonstration of CWLProv applied to real-life genomic workflows developed by independent groups.

Conclusions: Our approach to workflow sharing and publication mitigates workflow decay. The underlying principles of the standards utilised by CWLProv enable semantically rich and executable Research Objects that capture computational workflows with retrospective provenance, such that any platform supporting CWL will be able to understand the analysis, re-use the methods for partial re-runs, or reproduce the analysis to validate the published findings.
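As an illustration of what "retrospective provenance" means in the abstract above, the following minimal sketch records two relations from the W3C PROV model, `used` and `wasGeneratedBy`, for a two-step run and then traces an output back to its inputs. It is not the CWLProv format or any CWL implementation: the `ProvenanceLog` class, the `content_id` helper, the checksum-based identifier scheme and the step names `run:align` and `run:call` are all hypothetical, chosen only to make the idea concrete.

```python
import hashlib
from dataclasses import dataclass, field


def content_id(data: bytes) -> str:
    """Hypothetical identifier scheme: name an entity by the SHA-256 of its bytes."""
    return "data:" + hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceLog:
    """Toy store for two PROV relations (an assumption, not the CWLProv layout)."""
    used: list = field(default_factory=list)        # (activity, entity) pairs
    generated: list = field(default_factory=list)   # (entity, activity) pairs

    def record_step(self, activity, inputs, outputs):
        """Record that `activity` used each input and generated each output."""
        for entity in inputs:
            self.used.append((activity, entity))
        for entity in outputs:
            self.generated.append((entity, activity))

    def sources_of(self, entity):
        """Return every entity that `entity` was transitively derived from."""
        found = set()
        frontier = [entity]
        while frontier:
            current = frontier.pop()
            for produced, activity in self.generated:
                if produced != current:
                    continue
                for user, consumed in self.used:
                    if user == activity and consumed not in found:
                        found.add(consumed)
                        frontier.append(consumed)
        return found


# Hypothetical two-step run: raw reads -> aligned reads -> variant calls.
log = ProvenanceLog()
reads = content_id(b"ACGT")
aligned = content_id(b"aligned ACGT")
variants = content_id(b"variants")
log.record_step("run:align", [reads], [aligned])
log.record_step("run:call", [aligned], [variants])

# The final output traces back to both the intermediate and the raw input.
lineage = log.sources_of(variants)
```

Identifying data by checksum rather than by file path is one way to keep such a record meaningful after files are moved or shared, which is the kind of portability the abstract attributes to Research Objects.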

Notes

In preparation for submission to GigaScience

Files

Name	Size
CWLProv.pdf (md5:edf7767414fad09d823cf6402f64c256)	6.1 MB

Additional details

Funding: European Commission, grant 675728, BioExcel – Centre of Excellence for Biomolecular Research

	All versions	This version
Views	3,623	485
Downloads	2,439	287
Data volume	10.6 GB	1.8 GB
