Published October 31, 2023 | Version v1
Project deliverable Open

BY-COVID D1.2: Preparedness Data Hub

  • 1. European Bioinformatics Institute
  • 2. Technical University of Denmark
  • 3. Erasmus MC
  • 4. Tampere University
  • 5. SIB Swiss Institute of Bioinformatics

Description

The scope of this deliverable falls under Work Package Task 1.3, "Rapid deployment of the 'preparedness' Data Hub", and its related subtasks, which develop the tools (technical implementation) for rapidly deploying and configuring a preparedness Data Hub for a disease X scenario. The system is intended to allow rapid configuration of functions from a checklist of technical elements, including viral biology (genome browser, related viruses), surveillance (upload, data standards, integration tools), cohort data capabilities, computational processing and analytical workflows, Notebook visualisation, variation discovery and impact prediction, and phylogeography.

Over the past 24 months, work on this deliverable has included developing pathogen data classification based on taxonomy and tagging within the Pathogens Portal. To ensure consistent pathogen classification, we adopted the UK Health and Safety Executive's (HSE) list of approved biological agents, which provides a definitive list of what constitutes a pathogen. We also plan to expand beyond pathogens affecting humans, for example to those affecting plants. The accompanying tagging system is a simple 'tag=pathogen' query, which removes the need to specify a very large number of taxonomic IDs. As part of pandemic preparedness, an Outbreaks page was developed within the Pathogens Portal to identify pathogens that can cause outbreaks or pandemics.
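To illustrate why a single tag can replace a long list of taxonomic IDs, the sketch below models the idea in memory. It is not the Pathogens Portal API: the `Record` class, the accessions `REC1` to `REC3`, and both selection functions are hypothetical names introduced only for this example. The taxon IDs are the NCBI Taxonomy identifiers for SARS-CoV-2, Zika virus and Homo sapiens.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical data record; not a real Pathogens Portal structure."""
    accession: str
    taxon_id: int
    tags: set = field(default_factory=set)


def select_by_taxon_ids(records, taxon_ids):
    """Select records by listing every taxonomic ID of interest explicitly."""
    wanted = set(taxon_ids)
    return [r for r in records if r.taxon_id in wanted]


def select_by_tag(records, tag):
    """Select records that carry a given tag, regardless of taxonomic ID."""
    return [r for r in records if tag in r.tags]


# Illustrative records only; the accessions are made up.
records = [
    Record("REC1", 2697049, {"pathogen"}),  # SARS-CoV-2
    Record("REC2", 64320, {"pathogen"}),    # Zika virus
    Record("REC3", 9606),                   # Homo sapiens, not tagged
]

# The caller must know and maintain the full ID list ...
by_ids = select_by_taxon_ids(records, [2697049, 64320])
# ... or can rely on the tag that was assigned during classification.
by_tag = select_by_tag(records, "pathogen")
```

In this toy model both calls select the same records, but only the tag-based call stays unchanged when a new pathogen is classified and tagged.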

To better support users submitting pathogen data to a Data Hub, we developed a dedicated Pathogens Submission Guide, which includes a list of six pathogen sample checklists spanning prokaryotes, parasites and viruses. In addition, we maintain a helpdesk queue for submission-related queries and continue to run the Contextual Data Clearing House, which allows the scientific community to extend or better annotate pathogen metadata, for example via a Data Hub. WP1 partner The Arctic University of Norway (UiT) submitted a valuable dataset of over 27 million SARS-CoV-2 curations, allowing us to identify areas of further development for the Clearing House.

As part of the analysis pipeline exploration task, we tested a pipeline for antimicrobial resistance (AMR). WP1 partner DTU developed an AMR pipeline called ARGprofiler (ARG: antimicrobial resistance genes), which was explored for its potential integration within the Pathogens Platform as part of a preparedness Data Hub. In addition, two community-developed pipelines (Bactopia and nf-core/funcscan) are being benchmarked by EMBL-EBI and assessed for integration into the Data Hubs system. Furthermore, publicly developed and maintained viral metagenomic pipelines are being assessed for integration into the Data Hub system.

Finally, as part of the development of visualisation tools, Nextstrain, an open-source project supporting real-time tracking of pathogen evolution, was integrated in mid-2023. The integration includes Mpox (Monkeypox), Zika and West Nile Virus reports, and allows the visualisation results to be configured through various facets, including a phylogenetic tree, a map of the geographical distribution of the sequences and their clade classification, as well as a genome browser presenting viral diversity. Note that these reports currently run on public data only and are not available for pre-publication/private data.

Files

BY-COVID_D1.2_Preparedness_Data_Hub.pdf

Files (910.9 kB)

md5:7a5e6875ebf75c6b30fbc7c502c581b6
910.9 kB
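
A downloaded copy of the file can be checked against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_record` are introduced here for illustration, and the local path passed to them is an assumption about where the PDF was saved.

```python
import hashlib

# Checksum as listed on this record for BY-COVID_D1.2_Preparedness_Data_Hub.pdf
EXPECTED_MD5 = "7a5e6875ebf75c6b30fbc7c502c581b6"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """True if the file at `path` has the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_record("BY-COVID_D1.2_Preparedness_Data_Hub.pdf")` should return `True` for an intact download saved under that name in the current directory. MD5 is suitable here for detecting transfer corruption, not for security purposes.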

Additional details

Funding

BY-COVID – Beyond COVID 101046203
European Commission

Dates

Submitted
2023-10-31