Making workflow provenance FAIR across workflow systems with Workflow Run RO-Crate
Creators
- 1. Center for Advanced Studies, Research, and Development in Sardinia (CRS4), Italy
- 2. Barcelona Supercomputing Center (Spain)
- 3. VIB-UGent Center for Plant Systems Biology, Gent, Belgium
- 4. "VU Amsterdam, NL; DTL Projects, NL; FZJ, DE"
- 5. Ontology Engineering Group, Universidad Politécnica de Madrid
- 6. Università degli Studi di Torino, Torino, Italy
- 7. The University of Manchester, United Kingdom; University of Amsterdam, The Netherlands
Description
Workflow Run RO-Crate (https://w3id.org/ro/wfrun/) is a set of profiles of RO-Crate (https://doi.org/10.3233/DS-210053) that capture workflow provenance in a lightweight FAIR data package, in order to support traceability, reproducibility and interoperable description of diverse computational analyses.
We implemented the profiles in multiple workflow systems, including Galaxy, COMPSs, StreamFlow, WfExS, Sapporo and Autosubmit. The command line tool runcrate (https://pypi.org/project/runcrate/) can convert from the precursor CWLProv (https://doi.org/10.1093/gigascience/giz095), display or validate crates according to the profiles, and (as a prototype) repeat a previous execution.
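As an illustration of what checking a crate against the profiles starts from, the sketch below lists the Workflow Run profiles that a crate's root dataset declares through `conformsTo`. It uses only the Python standard library and is not how runcrate is implemented; the function name `declared_wfrun_profiles`, the sample metadata and the profile version in the URL are assumptions made for this example.

```python
# Minimal sketch (not runcrate's implementation): list the Workflow Run
# RO-Crate profiles declared by the root dataset of a crate.
WFRUN_PREFIX = "https://w3id.org/ro/wfrun/"


def declared_wfrun_profiles(metadata):
    """Return the wfrun profile URIs found in the root dataset's conformsTo."""
    by_id = {entity.get("@id"): entity for entity in metadata.get("@graph", [])}
    # The metadata descriptor points at the root dataset through "about".
    descriptor = by_id.get("ro-crate-metadata.json", {})
    root_id = descriptor.get("about", {}).get("@id", "./")
    root = by_id.get(root_id, {})
    conforms = root.get("conformsTo", [])
    if isinstance(conforms, dict):  # a single value may not be wrapped in a list
        conforms = [conforms]
    return [
        c["@id"]
        for c in conforms
        if isinstance(c, dict) and c.get("@id", "").startswith(WFRUN_PREFIX)
    ]


# Hypothetical metadata, shaped like the content of ro-crate-metadata.json.
sample = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            # The version number here is an assumption for illustration.
            "conformsTo": [{"@id": "https://w3id.org/ro/wfrun/process/0.5"}],
        },
    ],
}

profiles = declared_wfrun_profiles(sample)
```

A real validator would go further and check the entities each profile requires; this only reads the declaration.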
The profiles are organised by increasing level of detail, allowing gradual adoption: from arbitrary sets of computational processes (implied user-driven workflows), through a WorkflowHub-compatible crate with a workflow definition, to a full provenance trace for each step with its input and output values.
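To make the simplest of these levels concrete, here is a minimal sketch of the metadata for a single process run: one action that used a tool, consumed an input file and produced an output file. It is built with the Python standard library only and has not been validated against the profiles. The file names, the `#sort-tool` and `#run-1` identifiers and the profile version in the URL are assumptions chosen for illustration.

```python
import json

# Assumption: this profile version is used purely for illustration.
PROCESS_PROFILE = "https://w3id.org/ro/wfrun/process/0.5"


def build_process_run_crate():
    """Return a minimal JSON-LD document describing one tool execution."""
    graph = [
        {   # Metadata descriptor: identifies the root dataset of the crate.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # Root dataset: declares the profile and points at the run.
            "@id": "./",
            "@type": "Dataset",
            "conformsTo": {"@id": PROCESS_PROFILE},
            "hasPart": [{"@id": "input.txt"}, {"@id": "output.txt"}],
            "mentions": {"@id": "#run-1"},
        },
        {   # The tool that was executed (hypothetical).
            "@id": "#sort-tool",
            "@type": "SoftwareApplication",
            "name": "sort (hypothetical tool)",
        },
        {   # The execution itself: tool, inputs and outputs.
            "@id": "#run-1",
            "@type": "CreateAction",
            "instrument": {"@id": "#sort-tool"},
            "object": [{"@id": "input.txt"}],
            "result": [{"@id": "output.txt"}],
        },
        {"@id": "input.txt", "@type": "File"},
        {"@id": "output.txt", "@type": "File"},
    ]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": graph,
    }


crate = build_process_run_crate()
serialized = json.dumps(crate, indent=2)  # what ro-crate-metadata.json would hold
```

The more detailed levels would add a workflow definition as the instrument of the run and, at the most detailed level, one such action per step.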
This use of RO-Crate allows the contextualization of a computational workflow and its execution, e.g. relating to people, organisations, projects, funding, data sources and wider research questions and studies. For instance, in the TRE-FX project (https://trefx.uk/) such crates are used as a lingua franca across federated Trusted Research Environments, as they can also address the security and review aspects.
The Workflow Run working group collaborates across ELIXIR nodes and EU-wide projects (BY-COVID, EOSC-Life, EJP-RD, EuroHPC, eFlows4HPC, EuroScienceGateway, BioExcel-2) as well as national projects. After this first stable release of the profiles, we are now expanding to more workflow systems and tracking computational resources such as containers and memory usage.
Files
2023-06-05-elixir-ahm2023-poster-wfrun.pdf
Total size: 4.4 MB
Checksum | Size |
---|---|
md5:5fc56e52a9735bc16744c2c6fdd66673 | 712.8 kB |
md5:2f70811b1785211c55f2c797ce202e50 | 3.7 MB |
Additional details
Funding
- European Commission
- eFlows4HPC – Enabling dynamic and Intelligent workflows in the future EuroHPC ecosystem 955558
- European Commission
- FAIR-IMPACT – Expanding FAIR Solutions across EOSC 101057344
- European Commission
- EOSC-Life – Providing an open collaborative space for digital biology in Europe 824087
- European Commission
- BY-COVID – Beyond COVID 101046203
- European Commission
- EuroScienceGateway – leveraging the European compute infrastructures for data-intensive research guided by FAIR principles 101057388
- European Commission
- EJP RD – European Joint Programme on Rare Diseases 825575