Journal article Open Access

Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv

Farah Zaib Khan; Stian Soiland-Reyes; Richard O. Sinnott; Andrew Lonie; Carole Goble; Michael R. Crusoe

Other contributors
Peter Amstutz; Pau Ruiz Safont; Pjotr Prins; Brad Chapman; Christopher Ball; Lon Blauvelt; Tomoya Tanjo; Alban Gaignard

Background

The automation of data analysis in the form of scientific workflows has become a widely adopted practice in many fields of research. Computationally driven data-intensive experiments using workflows enable automation, scaling, adaptation, and provenance support. However, there are still several challenges associated with the effective sharing, publication, and reproducibility of such workflows due to the incomplete capture of provenance and lack of interoperability between different technical (software) platforms.

Results

Based on best-practice recommendations identified in the literature on workflow design, sharing, and publishing, we define a hierarchical provenance framework to achieve uniformity in provenance and to support comprehensive, fully re-executable workflows equipped with domain-specific information. To realize this framework, we present CWLProv, a standards-based format for representing any workflow-based computational analysis so that the resulting workflow output artefacts satisfy the various levels of provenance. We use open source, community-driven standards: interoperable workflow definitions in the Common Workflow Language (CWL), structured provenance representation using the W3C PROV model, and resource aggregation and sharing as workflow-centric research objects generated along with the final outputs of a given workflow enactment. We demonstrate the utility of this approach through a practical implementation of CWLProv and an evaluation using real-life genomic workflows developed by independent groups.
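As a rough illustration of what structured retrospective provenance in the W3C PROV model looks like, the sketch below records one workflow step run as a PROV activity that used input entities and generated an output entity, serialized in a PROV-JSON-style layout. The helper name `prov_step_record`, the `ex:` namespace, and the step and file names are hypothetical. CWLProv's actual provenance profile carries considerably more detail (workflow and step identifiers, timestamps, checksums, agents), and this is not its output format.

```python
import json


def prov_step_record(step: str, inputs: list, outputs: list) -> dict:
    """Build a minimal PROV-JSON-style document for one workflow step run.

    Hypothetical helper: identifiers use an illustrative 'ex:' namespace,
    not the identifiers CWLProv itself generates.
    """
    doc = {
        "prefix": {"ex": "http://example.org/"},
        "entity": {},
        "activity": {f"ex:{step}": {}},
        "used": {},
        "wasGeneratedBy": {},
    }
    for i, name in enumerate(inputs):
        doc["entity"][f"ex:{name}"] = {}
        # PROV 'used': the step run consumed this input entity
        doc["used"][f"_:u{i}"] = {
            "prov:activity": f"ex:{step}",
            "prov:entity": f"ex:{name}",
        }
    for i, name in enumerate(outputs):
        doc["entity"][f"ex:{name}"] = {}
        # PROV 'wasGeneratedBy': this output entity was produced by the step run
        doc["wasGeneratedBy"][f"_:g{i}"] = {
            "prov:entity": f"ex:{name}",
            "prov:activity": f"ex:{step}",
        }
    return doc


# Hypothetical alignment step: two inputs, one output
record = prov_step_record("align_run", ["reads_fastq", "reference_fa"], ["aligned_bam"])
print(json.dumps(record, indent=2))
```

Linking each activity to an agent (for example, the workflow engine that ran it) through PROV `wasAssociatedWith` would be the natural next step; it is omitted here to keep the sketch short.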

Conclusions

The underlying principles of the standards used by CWLProv enable semantically rich and executable research objects that capture computational workflows with retrospective provenance, such that any platform supporting CWL will be able to understand the analysis, reuse the methods for partial reruns, or reproduce the analysis to validate the published findings.

Published in GigaScience Volume 8, Issue 11, November 2019, giz095. Cite as: https://doi.org/10.1093/gigascience/giz095
Files (4.4 MB)
CWLProv.pdf (4.2 MB, md5:c97cb2595dde4252e12e58a4e50089ed)
Response_to_editor_2.pdf (62.6 kB, md5:30bebb5f74d5347511275989bbc17823)
Response_to_Reviewers.pdf (135.6 kB, md5:aaf4529963efcd0c554bc35c541ce024)