Published May 7, 2024 | Version 1
Project deliverable | Open access

Blue-Cloud 2026 - D2.4 BDI sub-setting APIs and Data Lakes - Concept and Specifications Report

  • 1. MARIS
  • 1. National Oceanography Centre - British Oceanographic Data Centre
  • 2. National Research Council of Italy - Institute of Information Science and Technologies (CNR-ISTI)
  • 3. IFREMER
  • 4. ETT
  • 5. MARIS

Description

The Blue-Cloud Data Discovery and Access Service (DD&AS) is one of the core components of the Blue-Cloud technical framework. It facilitates the discovery and retrieval of marine data sets and data products from Blue Data Infrastructures (BDIs), including key EU infrastructures such as EMODnet, SeaDataNet, EurOBIS, and others, for external users in stand-alone mode. It also interacts with the Blue-Cloud Virtual Research Environment (VRE), the component that federates computing platforms and analytical services, to populate the VRE data pool. The first operational release of the Blue-Cloud DD&AS was deployed as part of the predecessor pilot Blue-Cloud project; the current version federates a total of 8 BDIs with 9 data services. Blue-Cloud 2026 plans to expand and optimise the DD&AS and improve its FAIRness, among other things by developing and deploying data sub-setting and extraction services in addition to discovery and access, and by building Blue-Cloud Data Lakes for use by the Blue-Cloud WorkBenches and beyond. This objective is addressed by Task 2.3, which has been underway since the first STC meeting in March 2023; the results of its analysis and planning are documented in this Report, D2.4 - BDI sub-setting APIs and Data Lakes - Concept and Specifications Report. Two data lake configurations are planned. The first, accessible to all Blue-Cloud VRE users, comprises a series of data lakes for data sets managed by selected Blue Data Infrastructures engaged in the Blue-Cloud project and by an international data repository, the World Ocean Database (WOD), managed by NOAA, USA. The second consists of two data lakes, one for WorkBench 1 (T&S) and one for WorkBench 2 (Eutrophication), set up by merging and harmonising a number of data set collections for selected parameters (EOVs) from multiple BDIs.
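To make "merging and harmonising data set collections for selected parameters" concrete, here is a minimal Python sketch. It is not the report's method. It assumes each BDI delivers values under its own variable names and units, and that harmonisation means mapping them onto one common parameter name (here the CF standard name `sea_water_temperature`) and one common unit. The source keys `bdi_a`/`bdi_b`, the variable names `TEMP`/`temp_K`, and the functions `harmonise`/`merge` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical mappings: source variable name -> (common parameter name, converter to the common unit).
# Names are illustrative only; they are not taken from the report or from any real BDI.
SOURCE_MAPPINGS: dict[str, dict[str, tuple[str, Callable[[float], float]]]] = {
    "bdi_a": {"TEMP": ("sea_water_temperature", lambda v: v)},             # already in degrees Celsius
    "bdi_b": {"temp_K": ("sea_water_temperature", lambda v: v - 273.15)},  # Kelvin -> degrees Celsius
}


@dataclass(frozen=True)
class Observation:
    source: str     # which BDI the value came from (kept for provenance)
    parameter: str  # common parameter name after harmonisation
    value: float    # value in the common unit (degrees Celsius here)


def harmonise(source: str, raw: dict[str, float]) -> list[Observation]:
    """Map one source's variables onto the common schema, dropping unmapped ones."""
    mapping = SOURCE_MAPPINGS[source]
    observations = []
    for name, value in raw.items():
        if name in mapping:
            common_name, to_common_unit = mapping[name]
            observations.append(Observation(source, common_name, to_common_unit(value)))
    return observations


def merge(collections: Iterable[tuple[str, dict[str, float]]]) -> list[Observation]:
    """Harmonise every (source, raw values) pair and concatenate into one collection."""
    merged: list[Observation] = []
    for source, raw in collections:
        merged.extend(harmonise(source, raw))
    return merged


if __name__ == "__main__":
    for obs in merge([("bdi_a", {"TEMP": 12.5, "PSAL": 35.0}), ("bdi_b", {"temp_K": 285.15})]):
        print(obs)
```

In this usage, `PSAL` from the hypothetical `bdi_a` has no mapping and is dropped, mirroring the restriction to selected parameters; both remaining values end up under the same name and unit.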
It was also decided to meet this challenge by adopting the new Beacon data lake technology, which MARIS, Blue-Cloud 2026 partner and technical coordinator, has been developing successfully over the last three years. Beacon provides capabilities for building data lakes with fast sub-setting: it can slice through millions of NetCDF files and almost instantly return a single homogeneous output file whose values can be used for further analytics or to drive a viewer that lets users slide through, for example, ocean temperature at selected dates, locations, and depths. Finally, the Deliverable documents the implementation plan, with actions that are partly underway and partly planned for the near future as part of WP2, for establishing the data lake configurations, deploying them at the Blue-Cloud VRE, and either providing them to Blue-Cloud VRE users or integrating them with the analytical pipelines planned for the WorkBenches.
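The sub-setting idea described above (many source files in, one homogeneous output out) can be illustrated in its simplest form by the Python sketch below. It is not Beacon's API or algorithm. It assumes the source files have already been harmonised into a shared record layout and simply scans them linearly; reaching the speed described for Beacon presumably requires indexing or other optimisations that this sketch does not attempt. `Record`, `Query`, and `subset` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class Record:
    """One harmonised measurement; every source file is assumed to share this layout."""
    day: date
    lat: float
    lon: float
    depth: float        # metres below the surface
    temperature: float  # degrees Celsius


@dataclass(frozen=True)
class Query:
    """Hypothetical sub-setting request: a time window, a bounding box and a maximum depth."""
    start: date
    end: date
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    depth_max: float

    def matches(self, r: Record) -> bool:
        return (self.start <= r.day <= self.end
                and self.lat_min <= r.lat <= self.lat_max
                and self.lon_min <= r.lon <= self.lon_max
                and r.depth <= self.depth_max)


def subset(files: Iterable[list[Record]], query: Query) -> list[Record]:
    """Scan every file, keep matching records, and return one sorted, homogeneous result."""
    hits = [r for records in files for r in records if query.matches(r)]
    return sorted(hits, key=lambda r: (r.day, r.lat, r.lon, r.depth))
```

For example, a query for January 2020, 40 to 50 degrees N, 5 to 15 degrees E and the upper 100 m returns only the matching records from all input files as one sorted list, which could then feed an analysis step or a date/depth viewer.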

Files

Blue-Cloud_2026_D2.4_-_BDI_sub-setting_APIs_and_Data_Lakes_Concept_and_Specifications_Report.pdf

Additional details

Funding

European Commission
Blue-Cloud 2026 – A federated European FAIR and Open Research Ecosystem for oceans, seas, coastal and inland waters 101094227