Published April 28, 2023 | Version v1
Project deliverable | Open access

BY-COVID D1.1 Extended workflows

  • 1. EMBL-EBI
  • 2. Erasmus MC

Description

This deliverable, titled ‘Extended Workflows’, details the first wave of extensions and developments to the viral data processing and computational workflows that operate in the SARS-CoV-2 Data Hubs. As described in greater detail in the deliverable, the SARS-CoV-2 Data Hubs are a workspace, or toolbox, in which users share their sequence data, process it with analysis workflows, and then ingest and visualise the results with visualisation tools for downstream interpretation. The private version of the Data Hubs supports private, pre-publication data sharing among a group of collaborators.

To enable the analysis of sequence data through integrated workflows, EMBL-EBI, and specifically the European Nucleotide Archive (ENA), has developed and maintains a set of tools, software and infrastructure. Collectively, these are known as the ENA Pathogen Analysis System, which the deliverable describes in further detail. The system is a hybrid-cloud processing platform comprising a Google BigQuery database, Looker DataStudio, Nextflow, Docker, an LSF cluster, a Slurm cluster and the Google Cloud Life Sciences API. It was set up with the support of BY-COVID and is already in place to analyse data for use cases from other projects, such as VEO. These use cases include the analysis of public SARS-CoV-2 raw read datasets within the COVID-19 Data Platform and extend to analysis within private Data Hubs. The system is also ready to integrate future workflows once they have passed analysis and feasibility testing. The first adaptation of the system was made in response to the outbreak of monkeypox virus (MPXV), the cause of mpox. This provides a solid foundation and infrastructure for future developments as part of the Pathogens Platform.

Overall, this deliverable describes, from an infrastructure perspective, the efforts undertaken to analyse an unprecedented volume of data, and details some of the challenges we overcame to achieve this. It also describes the ENA Pathogen Analysis System.

Files

BY-COVID WP1 Deliverable D.1.1.pdf (509.6 kB)
md5:36eba5cf271989bd70bb27a4fe4a5830

Additional details

Funding

BY-COVID – Beyond COVID 101046203
European Commission