10.7490/f1000research.1115721.1
https://zenodo.org/records/1304969
oai:zenodo.org:1304969
Farah Zaib Khan
0000-0002-6337-3037
The University of Melbourne, Australia; Common Workflow Language project
Stian Soiland-Reyes
0000-0001-9842-9718
The University of Manchester; Common Workflow Language project
Richard O. Sinnott
0000-0001-5998-222X
The University of Melbourne, Australia
Andrew Lonie
0000-0002-2006-3856
The University of Melbourne, Australia
Michael R. Crusoe
0000-0002-2961-9670
Common Workflow Language project
CWLProv – Interoperable retrospective provenance capture and its challenges
Zenodo
2018
research object
scientific workflow
workflow, provenance, prov, cwl, interoperability, linked data
2018-06-27
eng
Poster
10.5281/zenodo.1215611
https://w3id.org/cwl/view/git/886df9de6713e06228d2560c40f451155a196383/tools/tRNA_selection.cwl
https://github.com/common-workflow-language/cwltool
https://f1000research.com/posters/7-916
https://slides.com/farahzkhan/cwlprov
https://github.com/common-workflow-language/cwltool/pull/676
10.3390/informatics5010011
10.1101/191783
10.6084/m9.figshare.3115156.v2
10.1016/j.future.2011.08.004
http://ceur-ws.org/Vol-903/paper-01.pdf
10.1016/j.websem.2015.01.003
10.1371/journal.pone.0080278
10.1016/j.future.2017.01.008
10.1186/2041-1480-5-41
10.1186/s12859-017-1747-0
https://tools.ietf.org/id/draft-kunze-bagit-16
10.1101/268755
10.1093/nar/gkx967
10.17061/phrp2541541
10.1016/j.ascom.2014.09.002
10.1371/journal.pcbi.1003285
10.1126/science.aah6168
10.1093/nar/gkw1032
10.1007/11890850_16
10.1109/eScience.2012.6404482
10.1016/j.future.2013.09.018
https://zenodo.org/communities/ro
https://zenodo.org/communities/eu
https://zenodo.org/communities/bioexcel
Creative Commons Attribution 4.0 International
Presented at Bioinformatics Open Source Conference (BOSC) 2018
Source Code snapshot: https://github.com/common-workflow-language/cwltool/tree/921fc1d387930a0a5fede332c43f039697f6a4de
License: https://www.apache.org/licenses/LICENSE-2.0
Research Object: https://doi.org/10.5281/zenodo.1215611
Abstract (accepted for poster and talk at BOSC2018)
Automating data analysis as scientific workflows is now a widely adopted practice in many fields of research. Computationally driven, data-intensive experiments using workflows enable Automation, Scaling, Adaption and Provenance support (ASAP). However, a number of challenges remain for the effective sharing, publication, understandability and reproducibility of such workflows, owing to incomplete capture of provenance and dependence on particular technical (software) platforms.
We present CWLProv, an approach for retrospective provenance capture that uses open-source, community-driven standards and applies and customizes workflow-centric Research Objects (ROs). The ROs are produced as an output of a workflow enactment defined in the Common Workflow Language (CWL) using the reference implementation, cwltool.
The approach aggregates and annotates all the resources involved in the scientific investigation, including inputs, outputs, the workflow specification, command line tool specifications and input parameter settings. The resources are linked within the RO to enable re-enactment of an analysis without depending on external resources. The workflow provenance profile is represented in the W3C PROV-N and PROV-JSON formats and captures retrospective provenance of the workflow enactment.
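As a rough illustration of what a retrospective provenance record in PROV-JSON can look like, the sketch below builds a minimal document for a single workflow step using only the Python standard library. It is not the CWLProv implementation: the function name `build_prov_document`, the `run:` and `data:` prefixes and their `urn:example:` namespaces, and the sample identifiers are all hypothetical, chosen only to show the general shape of `activity`, `entity`, `used` and `wasGeneratedBy` records.

```python
import json


def build_prov_document(run_id, step_name, input_id, output_id):
    """Return a minimal PROV-JSON style dict describing one workflow step.

    All identifiers and namespaces here are illustrative assumptions,
    not the identifiers that cwltool or CWLProv actually emit.
    """
    activity = f"run:{run_id}"
    used_entity = f"data:{input_id}"
    generated_entity = f"data:{output_id}"
    return {
        # Hypothetical namespaces for the run and its data items.
        "prefix": {
            "run": "urn:example:run:",
            "data": "urn:example:data:",
        },
        # The step execution is modelled as a PROV activity.
        "activity": {activity: {"prov:label": step_name}},
        # Input and output data are modelled as PROV entities.
        "entity": {used_entity: {}, generated_entity: {}},
        # The activity consumed the input ...
        "used": {
            "_:u1": {"prov:activity": activity, "prov:entity": used_entity}
        },
        # ... and produced the output.
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": generated_entity, "prov:activity": activity}
        },
    }


if __name__ == "__main__":
    doc = build_prov_document("0001", "example_step", "input-a", "output-b")
    print(json.dumps(doc, indent=2))
```

Keeping the record as a plain dictionary makes it straightforward to serialize alongside the other resources aggregated in a Research Object; a real capture would carry many more statements (timestamps, agents, plans) than this sketch shows.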
The workflow-centric RO produced as an output of a CWL workflow enactment is expected to be interoperable, reusable, shareable and portable across different platforms. Our work describes the need and motivation for CWLProv and the lessons learned in applying it to ROs using CWL in the bioinformatics domain. The complete capture of provenance, along with the aggregated resources used in a workflow enactment, will mitigate workflow decay and allow applications of provenance to make experiments transparent, reproducible and authentic.
We believe that the underlying principles of the standards used to implement CWLProv will result in semantically rich, executable workflow objects, such that any platform supporting CWL and CWLProv will be able to reproduce them. We ultimately aim for a solution that complies with all four dimensions of the FAIR principles. Currently, CWLProv is implemented in the reference implementation, cwltool. This study can be extended to support provenance capture on other platforms supporting CWL, to demonstrate interoperability of analysis methods.
FZK is funded by MIRS and MIFRS scholarships. SSR is funded by BioExcel CoE (www.bioexcel.eu), a project funded by the European Union under contract H2020-EINFRA-2015-1-675728. SSR and MRC are members of the leadership team for Common Workflow Language at the Software Freedom Conservancy.
European Commission
10.13039/501100000780
675728
Centre of Excellence for Biomolecular Research