Published July 22, 2024 | Version v1
Project deliverable Open

BY-COVID D1.3. Tracking and open analytics tools

  • 1. European Bioinformatics Institute

Contributors

Work package leader:

  • 1. SIB Swiss Institute of Bioinformatics
  • 2. European Bioinformatics Institute
  • 3. Biobanking and Biomolecular Resources Research Infrastructure Consortium

Description

Deliverable 1.3 details technical aspects of the Data Hubs, addressing key areas of open data sharing. Its scope is primarily the tracking, analytical and other tools developed for Data Hubs that support localised instances of data hubs. This translates to work on the automation and maturation of the core Data Hubs at EMBL-EBI, also known as the Pathogen Data Hubs, which is a prerequisite for the second task within the WP: Local Data Hubs.

Local Data Hubs focus on generating a collection of standardised tools and workflows that follow standards at the European Nucleotide Archive (ENA) but can be used by collaborators on their own compute infrastructure with their own analysis subworkflows. The initial model detailed in this deliverable is available on GitHub (https://github.com/enasequence/ena-local-datahub) and has resulted in a standardised workflow template written in Nextflow DSL2. The foundations set by the initial model allow an advanced model to be developed in the future, which will include a workflow repository where analysis pipelines can be deposited and shared for collaboration.

In addition to the Local Data Hubs, the deliverable report describes the Contextual Data Clearinghouse (CDCH), a data object store for community-generated metadata curations associated with public International Nucleotide Sequence Database Collaboration (INSDC) records. It supports the Data Hubs by improving the FAIRness and quality of submitted records, as curations are presented alongside the dataset in the ENA Browser. The Clearinghouse has undergone significant improvements since the submission of SARS-CoV-2 curations by the Arctic University of Norway in 2021.

Furthermore, the report covers the core Data Hubs at EMBL-EBI, which have been significantly extended during the lifetime of the BY-COVID project to support the localised instances of Data Hubs. There are three main Data Hub user roles, each with its own responsibilities and actions: Coordinator, Data Provider, Data Consumer. To handle requests to set up Data Hubs, a registration procedure was created. Following registration, a browser-based Setup form is sent to the coordinator to complete. The Data Hubs also have a Life Cycle Policy (LCP) with three statuses that define the life cycle of an ENA Data Hub: Active, Dormant, Recycled.
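The roles and life cycle statuses named above can be pictured as a small data model. The following Python sketch is illustrative only: the class and method names, the default status, and the rule of one role per user are assumptions made for the example, not details taken from the deliverable or from the ENA implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """The three Data Hub user roles named in the deliverable."""
    COORDINATOR = "Coordinator"
    DATA_PROVIDER = "Data Provider"
    DATA_CONSUMER = "Data Consumer"


class LifeCycleStatus(Enum):
    """The three Life Cycle Policy statuses named in the deliverable."""
    ACTIVE = "Active"
    DORMANT = "Dormant"
    RECYCLED = "Recycled"


@dataclass
class DataHub:
    """Hypothetical in-memory record of a Data Hub (not the ENA schema)."""
    name: str
    # Assumption: a newly registered hub starts out Active.
    status: LifeCycleStatus = LifeCycleStatus.ACTIVE
    # Assumption: each user holds exactly one role within a hub.
    members: dict[str, Role] = field(default_factory=dict)

    def add_user(self, user: str, role: Role) -> None:
        """Add a user, or change the role of an existing user."""
        self.members[user] = role

    def remove_user(self, user: str) -> None:
        """Remove a user; unknown users are ignored."""
        self.members.pop(user, None)

    def users_with_role(self, role: Role) -> list[str]:
        """Return the users holding the given role, sorted by name."""
        return sorted(u for u, r in self.members.items() if r is role)


# Example usage with made-up hub and user names.
hub = DataHub(name="example_hub")
hub.add_user("alice", Role.COORDINATOR)
hub.add_user("bob", Role.DATA_PROVIDER)
```

The sketch deliberately leaves out any rules for moving between Active, Dormant and Recycled, since the conditions for those transitions are not stated in this description.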

Next steps in the final months of BY-COVID will be the release of frontend elements that support the extension and greater automation of the Data Hubs:

- the release of the Setup form, which will be sent to coordinators whose Data Hub requests have been approved, and

- the release of a Data Hub management interface, enabling coordinators to add users to or remove them from the Data Hub and to edit details of the Data Hub.

With the first version of the workflow template released, testing is the next main step for the tool; it is envisaged to be carried out by WP1 partners UiT and DTU. This will help identify issues and required fixes, and will support planning of future work on the Local Data Hubs.

The next steps for the Clearinghouse involve advertising its functionality more widely through a supplementary document containing more general guidance for Clearinghouse users, and linking to it from an appropriate section of the existing ENA documentation. WP1 partners also plan to submit a new dataset of monkeypox curations soon, and work is ongoing to index curations in the ENA Browser and Portal API.

Files

BY-COVID_D1.3_Tracking and open analytics tools.docx.pdf

Files (760.7 kB)

Additional details

Funding

European Commission
BY-COVID – Beyond COVID 101046203