10.5281/zenodo.3528059
https://zenodo.org/records/3528059
oai:zenodo.org:3528059
Farah Zaib Khan
0000-0002-6337-3037
The University of Melbourne; Common Workflow Language project
Stian Soiland-Reyes
0000-0001-9842-9718
The University of Manchester; Common Workflow Language project
Richard O. Sinnott
0000-0001-5998-222X
The University of Melbourne
Andrew Lonie
0000-0002-2006-3856
The University of Melbourne
Carole Goble
0000-0003-1219-2137
The University of Manchester
Michael R. Crusoe
0000-0002-2961-9670
Common Workflow Language project
Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv
Zenodo
2019
Provenance
Common Workflow Language
CWL
Research Object
RO
BagIt
Interoperability
Scientific Workflows
Containers
Peter Amstutz
0000-0003-3566-7705
Curoverse; Common Workflow Language
Pau Ruiz Safont
0000-0002-1827-331X
EMBL-EBI
Pjotr Prins
0000-0002-8021-9162
Brad Chapman
0000-0002-3026-1856
Harvard School of Public Health
Christopher Ball
0000-0003-3523-5312
RTI International
Lon Blauvelt
0000-0001-8352-873X
University of California, Santa Cruz
Tomoya Tanjo
0000-0002-4421-9659
Alban Gaignard
0000-0002-3597-8557
2019-11-01
eng
10.1109/BigData.2016.7840618
10.5281/zenodo.592090
10.5281/zenodo.51314
https://zenodo.org/record/1304969
10.17632/xnwncxpw42.1
10.17632/6wtpgr3kbj.1
10.17632/97hj93mkfd.3
10.5281/zenodo.1471376
10.5281/zenodo.1471585
10.5281/zenodo.1471589
https://zenodo.org/record/2841641
https://zenodo.org/record/2632836
https://zenodo.org/record/2838898
10.5524/100625
https://github.com/stain/cwlprov-paper-gigascience
10.1093/gigascience/giz095
10.5281/zenodo.1208477
https://zenodo.org/communities/ro
https://zenodo.org/communities/linkeddata
https://zenodo.org/communities/eu
Creative Commons Attribution 4.0 International
Background
The automation of data analysis in the form of scientific workflows has become a widely adopted practice in many fields of research. Computationally driven data-intensive experiments using workflows enable automation, scaling, adaptation, and provenance support. However, there are still several challenges associated with the effective sharing, publication, and reproducibility of such workflows due to the incomplete capture of provenance and lack of interoperability between different technical (software) platforms.
Results
Based on best-practice recommendations identified from the literature on workflow design, sharing, and publishing, we define a hierarchical provenance framework to achieve uniformity in provenance and support comprehensive and fully re-executable workflows equipped with domain-specific information. To realize this framework, we present CWLProv, a standards-based format to represent any workflow-based computational analysis to produce workflow output artefacts that satisfy the various levels of provenance. We use open source, community-driven standards, interoperable workflow definitions in Common Workflow Language (CWL), structured provenance representation using the W3C PROV model, and resource aggregation and sharing as workflow-centric research objects generated along with the final outputs of a given workflow enactment. We demonstrate the utility of this approach through a practical implementation of CWLProv and evaluation using real-life genomic workflows developed by independent groups.
Conclusions
The underlying principles of the standards utilized by CWLProv enable semantically rich and executable research objects that capture computational workflows with retrospective provenance such that any platform supporting CWL will be able to understand the analysis, reuse the methods for partial reruns, or reproduce the analysis to validate the published findings.
Published in GigaScience Volume 8, Issue 11, November 2019, giz095.
Cite as: https://doi.org/10.1093/gigascience/giz095
European Commission
10.13039/501100000780
823830
BioExcel Centre of Excellence for Computational Biomolecular Research
European Commission
10.13039/501100000780
730976
Industrial Biotechnology Innovation and Synthetic Biology Accelerator
European Commission
10.13039/501100000780
675728
Centre of Excellence for Biomolecular Research