Published June 7, 2023 | Version v1
Other Open

Encouraging and supporting researchers in producing FAIR computational workflows - Use Case by University of Manchester

  • 1. UNIMAN
  • 2. CSC

Description

This use case centres on the University of Manchester's work with persistent identifiers (PIDs) in data production workflows through its involvement in WorkflowHub, a registry of FAIR computational workflows. WorkflowHub is sponsored by the European RI Cluster EOSC-Life, the European Research Infrastructure ELIXIR and multiple EOSC projects (BY-COVID, BioDT and EuroScienceGateway). Its initial users came from the life sciences and worked with COVID-19 workflows, but it is now used by over 140 research groups and projects across disciplines.

The overall goal of this use case is to encourage and support FAIR Computational Workflows. This has two sides: workflow systems help researchers produce FAIR data and record the provenance of their analyses, and workflows themselves become FAIR scholarly objects in their own right that appear in the scholarly knowledge graph, get cited in academic papers, and so on.

Workflows of any type (e.g. Galaxy, CWL, Nextflow, Jupyter Notebook) are registered in WorkflowHub from existing repositories such as GitHub, or can be deposited as a direct upload. Metadata is extracted from the workflow and augmented by the user. The combined metadata is archived as an RO-Crate, which also contains a snapshot of the executable workflow definition. The metadata is expressed in JSON-LD using the schema.org vocabulary for Dataset, together with a Bioschemas profile for computational workflows. WorkflowHub also uses the Common Workflow Language (CWL) standard to describe the workflow structure and detailed annotations such as the tools and containers required.
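To make the packaging described above more concrete, the following Python sketch builds a minimal RO-Crate metadata document for a single workflow. It is an illustration under stated assumptions, not WorkflowHub's actual export: the function name `build_workflow_crate`, the file name `workflow.cwl`, the local identifier `#cwl` and the example titles are all hypothetical, and a real crate would carry considerably more metadata (authors, licence, versions, PIDs).

```python
import json


def build_workflow_crate(workflow_path, name, language_id, language_name):
    """Return a minimal RO-Crate metadata document describing one workflow.

    All argument values are supplied by the caller; nothing here is taken
    from a real WorkflowHub entry.
    """
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            # Metadata descriptor: states that this file describes the crate root.
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            # Root dataset: a schema.org Dataset whose main entity is the workflow.
            {
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "hasPart": [{"@id": workflow_path}],
                "mainEntity": {"@id": workflow_path},
            },
            # The workflow file itself, typed as a computational workflow.
            {
                "@id": workflow_path,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": language_id},
            },
            # The workflow language, referenced from the workflow entity.
            {
                "@id": language_id,
                "@type": "ComputerLanguage",
                "name": language_name,
            },
        ],
    }


# Hypothetical example values, for illustration only.
crate = build_workflow_crate(
    "workflow.cwl", "Example workflow", "#cwl", "Common Workflow Language"
)
print(json.dumps(crate, indent=2))
```

Written to a file named `ro-crate-metadata.json` next to the workflow definition, a document of this shape is what lets generic JSON-LD tooling discover the workflow as the crate's main entity.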

Workflows can be composed of various types of research objects, which need to be formally and persistently identified to enable their reuse by other researchers. Because the components of a workflow use diverse types of identifiers, several challenges need to be addressed. In FAIR-IMPACT we are therefore following several strands to improve persistent identifiers for computational workflows:

  1. Improve and document explicit identification and linking (FAIR Signposting) from WorkflowHub to PIDs, metadata and RO-Crate downloads
  2. Enable automatic requesting and recording of Software Heritage identifiers (SWHIDs) when archiving a Git-based workflow
  3. Capture and expose PIDs for tools used by workflows (e.g. bio.tools, Bioconda) from Galaxy
  4. Generate location-independent identifiers (RFC 6920) for data generated by workflow runs, which may be large or sensitive, to be included in workflow provenance
  5. Leverage RO-Crate to capture and propagate workflow provenance outputs and related PIDs
  6. Create RO-Crate profiles for capturing the provenance of an execution of a computational workflow with increasing granularity
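
Two of the strands above rely on identifiers derived from content rather than from location. The Python sketch below shows how such identifiers can be computed locally: an RFC 6920 `ni` URI using SHA-256, and a Software Heritage content identifier, which reuses Git's blob hashing. The function names `ni_uri` and `swhid_for_content` are hypothetical, and the sketch only covers single files held in memory. Archiving a Git-based workflow would normally involve directory, revision or snapshot SWHIDs obtained from Software Heritage, which are not shown here.

```python
import base64
import hashlib


def ni_uri(data: bytes) -> str:
    """Build an RFC 6920 'ni' URI (no authority) from the SHA-256 of data."""
    digest = hashlib.sha256(data).digest()
    # RFC 6920 uses base64url encoding with the trailing '=' padding removed.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni:///sha-256;{value}"


def swhid_for_content(data: bytes) -> str:
    """Build a Software Heritage content identifier (swh:1:cnt:...).

    The hash is SHA-1 over a Git blob header followed by the raw bytes,
    so it equals the Git object id of the same file content.
    """
    header = b"blob %d\0" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


# Hypothetical workflow output, for illustration only.
output_bytes = b"Hello World!"
print(ni_uri(output_bytes))
print(swhid_for_content(output_bytes))
```

Because both identifiers depend only on the bytes, they can be recorded in workflow provenance even when the data itself is too large or too sensitive to publish. For large outputs, the hash would be fed in chunks from disk instead of loading the whole file into memory.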

Files

Encouraging and supporting researchers in producing FAIR computational workflows. Use case by University of Manchester.pdf

Additional details

Funding

FAIR-IMPACT – Expanding FAIR Solutions across EOSC 101057344
European Commission