Published June 1, 2023 | Version v2
Poster | Open Access

Making workflow provenance FAIR across workflow systems with Workflow Run RO-Crate

  • 1. Center for Advanced Studies, Research, and Development in Sardinia (CRS4), Italy
  • 2. Barcelona Supercomputing Center (Spain)
  • 3. VIB-UGent Center for Plant Systems Biology, Gent, Belgium
  • 4. VU Amsterdam, NL; DTL Projects, NL; FZJ, DE
  • 5. Ontology Engineering Group, Universidad Politécnica de Madrid
  • 6. Università degli Studi di Torino, Torino, Italy
  • 7. The University of Manchester, United Kingdom; University of Amsterdam, The Netherlands

Description

Workflow Run RO-Crate (https://w3id.org/ro/wfrun/) is a set of profiles of RO-Crate (https://doi.org/10.3233/DS-210053) that capture workflow provenance in a lightweight FAIR data package, to support traceability, reproducibility and interoperable description of diverse computational analyses.
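At its core, such a crate is a directory with an `ro-crate-metadata.json` file holding a JSON-LD `@graph`. The minimal sketch below builds one with only the Python standard library, recording a single process execution as a schema.org `CreateAction` that the root dataset references through `mentions`. It assumes the Process Run Crate profile identifier `https://w3id.org/ro/wfrun/process/0.1` (check the profile pages for the current version), and the file names and tool name in the usage lines are hypothetical. Root-dataset properties that RO-Crate 1.1 requires for publication, such as `name`, `description`, `datePublished` and `license`, are left out for brevity.

```python
import json

# Assumed profile identifier; check https://w3id.org/ro/wfrun/ for the current version.
PROCESS_PROFILE = "https://w3id.org/ro/wfrun/process/0.1"


def minimal_process_crate(tool_name: str, inputs: list[str], outputs: list[str]) -> dict:
    """Return a JSON-LD document describing one process execution (a sketch)."""
    tool_id = f"#{tool_name}"  # hypothetical local identifier for the tool
    graph = [
        # Metadata descriptor: identifies this file and the RO-Crate version.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root dataset: declares the profile, lists files, points at the action.
        {
            "@id": "./",
            "@type": "Dataset",
            "conformsTo": {"@id": PROCESS_PROFILE},
            "hasPart": [{"@id": f} for f in inputs + outputs],
            "mentions": [{"@id": "#run-1"}],
        },
        # The software that was executed.
        {"@id": tool_id, "@type": "SoftwareApplication", "name": tool_name},
        # One execution: which tool ran, what it read and what it produced.
        {
            "@id": "#run-1",
            "@type": "CreateAction",
            "instrument": {"@id": tool_id},
            "object": [{"@id": f} for f in inputs],
            "result": [{"@id": f} for f in outputs],
        },
    ]
    # Data entities for every input and output file.
    graph += [{"@id": f, "@type": "File"} for f in inputs + outputs]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


if __name__ == "__main__":
    crate = minimal_process_crate("sort", ["input.txt"], ["output.txt"])
    print(json.dumps(crate, indent=2))
```

Writing the printed JSON to `ro-crate-metadata.json` next to the listed files gives a directory shaped like a crate; a real implementation would also record timestamps, parameters and the full set of required root properties.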

We implemented the profiles in multiple workflow systems, including Galaxy, COMPSs, StreamFlow, WfExS, Sapporo and Autosubmit. The command-line tool runcrate (https://pypi.org/project/runcrate/) can convert from the precursor format CWLProv (https://doi.org/10.1093/gigascience/giz095), display or validate crates according to the profiles, and, as a prototype, repeat a previous execution.
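To give a feel for what displaying a crate involves, here is a hypothetical, standard-library-only reader. It is not runcrate's implementation and does not use runcrate's API: it indexes the `@graph` by `@id`, follows the metadata descriptor's `about` to the root dataset, and prints one line per `CreateAction` listed in the root's `mentions`, assuming the crate uses that linking convention.

```python
import json
from pathlib import Path


def _as_list(value) -> list:
    """JSON-LD allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _ids(value) -> list[str]:
    """Extract @id strings from one reference or a list of references."""
    return [ref["@id"] for ref in _as_list(value)]


def summarise_actions(metadata: dict) -> list[str]:
    """Return one line per CreateAction that the root dataset mentions."""
    entities = {e["@id"]: e for e in metadata["@graph"]}
    # The metadata descriptor's "about" points at the root dataset.
    root_id = entities["ro-crate-metadata.json"]["about"]["@id"]
    lines = []
    for action_id in _ids(entities[root_id].get("mentions")):
        action = entities.get(action_id, {})
        if "CreateAction" not in _as_list(action.get("@type")):
            continue
        tool = ", ".join(_ids(action.get("instrument"))) or "?"
        ins = ", ".join(_ids(action.get("object"))) or "-"
        outs = ", ".join(_ids(action.get("result"))) or "-"
        lines.append(f"{action_id}: {tool} ({ins}) -> ({outs})")
    return lines


def summarise_crate_dir(crate_dir: str) -> list[str]:
    """Read <crate_dir>/ro-crate-metadata.json and summarise its actions."""
    path = Path(crate_dir) / "ro-crate-metadata.json"
    return summarise_actions(json.loads(path.read_text(encoding="utf-8")))
```

For example, `for line in summarise_crate_dir("my-crate"): print(line)`, where `my-crate` is a hypothetical crate directory, would print lines such as `#run-1: #sort (input.txt) -> (output.txt)`. Validation against the profiles involves far more checks than this.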

The profiles are organised by increasing level of detail, allowing gradual adoption. They range from arbitrary sets of computational processes (implied user-driven workflows), through a WorkflowHub-compatible crate with the workflow definition, to a full provenance trace of each step and its input and output values.
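These three levels correspond to the Process Run Crate, Workflow Run Crate and Provenance Run Crate profiles. As a rough illustration of how each level adds entities on top of the previous one, the hypothetical heuristic below guesses a crate's level from the entities it contains rather than from its declared `conformsTo`. It is not how runcrate or the profiles define conformance; the marker types it checks (any `CreateAction`, a `ComputationalWorkflow` as the root's `mainEntity`, and per-step `ControlAction` entities) are simplifications.

```python
def _types(entity: dict) -> set[str]:
    """Return an entity's @type as a set, whether given as a string or a list."""
    t = entity.get("@type", [])
    return {t} if isinstance(t, str) else set(t)


def guess_level(metadata: dict) -> str:
    """Guess which Workflow Run RO-Crate level a crate resembles (heuristic only)."""
    graph = metadata["@graph"]
    entities = {e["@id"]: e for e in graph}
    root_id = entities["ro-crate-metadata.json"]["about"]["@id"]
    main_id = entities[root_id].get("mainEntity", {}).get("@id")
    main = entities.get(main_id, {})
    types_present = set().union(*(_types(e) for e in graph))

    if "CreateAction" not in types_present:
        return "no recorded execution"
    if "ComputationalWorkflow" not in _types(main):
        return "process run"     # executions recorded without a workflow definition
    if "ControlAction" in types_present:
        return "provenance run"  # individual step executions are also recorded
    return "workflow run"        # workflow definition plus one overall run
```

Checking the declared `conformsTo` profile identifiers is the more reliable approach in practice; the heuristic only shows which extra entities each level brings.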

This use of RO-Crate allows a computational workflow and its execution to be contextualised, e.g. by relating them to people, organisations, projects, funding, data sources, and wider research questions and studies. For instance, in the TRE-FX project (https://trefx.uk/), such crates are used as a lingua franca across federated Trusted Research Environments, as they can also address security and review aspects.
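Contextual entities are ordinary nodes in the same `@graph`, linked with schema.org properties. The sketch below is a hypothetical helper that mutates a crate's metadata in place: it links a run to the person who performed it via `agent`, records that person's `affiliation`, and attaches a funding project to the root dataset via `funder`, following the RO-Crate 1.1 suggestion of modelling a project as an `Organization` that in turn references its funding agency. The local `#...` identifiers are placeholders; real crates should prefer persistent identifiers such as ORCID or ROR IDs.

```python
def add_context(metadata: dict, run_id: str, person: str, org: str,
                project: str, agency: str) -> dict:
    """Attach a person, organisation, project and funder to a crate (in place; a sketch)."""
    entities = {e["@id"]: e for e in metadata["@graph"]}
    root_id = entities["ro-crate-metadata.json"]["about"]["@id"]
    metadata["@graph"] += [
        # Who ran the workflow, and where they work.
        {"@id": "#person", "@type": "Person", "name": person,
         "affiliation": {"@id": "#org"}},
        {"@id": "#org", "@type": "Organization", "name": org},
        # The project behind the work, funded by an external agency.
        {"@id": "#project", "@type": "Organization", "name": project,
         "funder": {"@id": "#agency"}},
        {"@id": "#agency", "@type": "Organization", "name": agency},
    ]
    # The dicts in `entities` are the same objects as in the graph, so these edits stick.
    entities[run_id]["agent"] = {"@id": "#person"}
    entities[root_id]["funder"] = {"@id": "#project"}
    return metadata
```

Because everything lives in one graph, a reader of the crate can move from an output file to the run that produced it, then on to the person, organisation and funding behind that run.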

The Workflow Run working group collaborates across ELIXIR nodes and EU-wide projects (BY-COVID, EOSC-Life, EJP-RD, EuroHPC, eFlows4HPC, EuroScienceGateway, BioExcel-2) as well as national projects. After this first stable release of the profiles, we are now expanding to more workflow systems and to tracking computational resources such as containers and memory usage.

Files (4.4 MB)

2023-06-05-elixir-ahm2023-poster-wfrun.pdf

  • 712.8 kB, md5:5fc56e52a9735bc16744c2c6fdd66673
  • 3.7 MB, md5:2f70811b1785211c55f2c797ce202e50

Additional details

Funding

European Commission:
  • eFlows4HPC – Enabling dynamic and Intelligent workflows in the future EuroHPC ecosystem (955558)
  • FAIR-IMPACT – Expanding FAIR Solutions across EOSC (101057344)
  • EOSC-Life – Providing an open collaborative space for digital biology in Europe (824087)
  • BY-COVID – Beyond COVID (101046203)
  • EuroScienceGateway – leveraging the European compute infrastructures for data-intensive research guided by FAIR principles (101057388)
  • EJP RD – European Joint Programme on Rare Diseases (825575)