Published May 7, 2024
Version 1
Project deliverable
Blue-Cloud 2026 - D2.4 BDI sub-setting APIs and Data Lakes - Concept and Specifications Report
- 1. National Oceanography Centre - British Oceanographic Data Centre
- 2. National Research Council of Italy - Institute of Information Science and Technologies (CNR-ISTI)
- 4. ETT
- 5. MARIS
The Blue-Cloud Data Discovery and Access Service (DD&AS) is one of the core
components of the Blue-Cloud technical framework. It facilitates the
discovery and retrieval of marine data sets and data products from Blue
Data Infrastructures (BDIs), including key EU infrastructures such as
EMODnet, SeaDataNet, and EurOBIS, for external users in stand-alone mode.
It also interacts with the Blue-Cloud Virtual Research Environment (VRE),
the component federating computing platforms and analytical services, to
populate the VRE data pool. The first operational release of the
Blue-Cloud DD&AS was deployed as part of the predecessor pilot Blue-Cloud
project. The current version of the service federates 9 data services from
a total of 8 BDIs. Within the Blue-Cloud 2026 project, the DD&AS and its
FAIRness are to be expanded and optimised, among other things by developing
and deploying data sub-setting and extraction services in addition to
discovery and access, and by building Blue-Cloud Data Lakes for use by the
Blue-Cloud WorkBenches and beyond. This objective is addressed by Task 2.3,
which has been underway since the first STC meeting in March 2023; the
results of its analysis and planning are documented in this report, D2.4
BDI sub-setting APIs and Data Lakes - Concept and Specifications Report.

Two configurations of data lakes are planned. The first, accessible to all
Blue-Cloud VRE users, comprises a series of data lakes for data sets
managed by selected BDIs engaged in the Blue-Cloud project and by an
international data repository, the World Ocean Database (WOD), managed by
NOAA, USA. The second consists of two data lakes, one for WorkBench 1
(temperature and salinity, T&S) and one for WorkBench 2 (Eutrophication),
each set up by merging and harmonising a number of data set collections for
selected parameters (Essential Ocean Variables, EOVs) from multiple BDIs.
To meet this challenge, the project has decided to adopt the new Beacon
data lake technology, which MARIS, Blue-Cloud 2026 partner and technical
coordinator, has been developing successfully over the last three years.
Beacon supports building data lakes with fast sub-setting: it can slice
through millions of NetCDF files and almost instantly return a single
homogeneous output file. The values in that file can feed further
analytics or drive a viewer that shows, for example, ocean temperature at
selected dates, locations, and depths in a sliding display.
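To illustrate what sub-setting across many files means, the Python sketch below filters observations from many source files by parameter, time window, bounding box, and depth range, and returns one homogeneous table. It is not Beacon's API or implementation: the `Observation` and `SubsetQuery` types and the `subset` function are hypothetical, files are modelled as in-memory lists of records rather than NetCDF files, and the linear scan stands in for the indexing a real data lake would typically use to answer such queries quickly.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Observation:
    """One measurement as it might be read from a single source file (hypothetical schema)."""
    source: str        # identifier of the originating file or data set
    time: datetime
    lat: float
    lon: float
    depth_m: float
    parameter: str     # e.g. "TEMP"
    value: float


@dataclass(frozen=True)
class SubsetQuery:
    """Hypothetical sub-setting request: one parameter within a time/space/depth window."""
    parameter: str
    start: datetime
    end: datetime
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    depth_min: float
    depth_max: float

    def matches(self, obs: Observation) -> bool:
        # Simple inclusive bounds; boxes crossing the antimeridian are not handled.
        return (
            obs.parameter == self.parameter
            and self.start <= obs.time <= self.end
            and self.lat_min <= obs.lat <= self.lat_max
            and self.lon_min <= obs.lon <= self.lon_max
            and self.depth_min <= obs.depth_m <= self.depth_max
        )


def subset(files: Iterable[Iterable[Observation]], query: SubsetQuery) -> list[tuple]:
    """Scan every file and return one table of (time, lat, lon, depth_m, value) rows."""
    rows = [
        (obs.time, obs.lat, obs.lon, obs.depth_m, obs.value)
        for observations in files
        for obs in observations
        if query.matches(obs)
    ]
    rows.sort()  # order by time, then latitude, longitude, and depth
    return rows
```

For example, a query for `TEMP` between 0 and 100 m depth over one year returns only the matching rows from every file, in a single table with the same columns regardless of which file each row came from.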

Finally, the Deliverable documents the implementation plan. Its actions,
some already underway and others planned for the near future as part of
WP2, cover establishing the data lake configurations, deploying them at the
Blue-Cloud VRE, and then either providing them to Blue-Cloud VRE users
(first configuration) or integrating them with the analytical pipelines
planned for the WorkBenches (second configuration).
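The WorkBench data lakes described above are built by merging and harmonising collections for selected EOVs from several BDIs. A minimal Python sketch of that step follows, assuming each BDI delivers records with its own parameter names and units. The BDI identifiers (`BDI_A`, `BDI_B`), the `MAPPING` table, and the rule that treats records with the same parameter, time, and position as duplicates are all hypothetical simplifications; real harmonisation would rely on controlled vocabularies and more careful duplicate detection.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RawRecord:
    """A measurement as delivered by one BDI, with its own naming and units (hypothetical)."""
    bdi: str
    parameter: str
    time: str        # ISO 8601 string, assumed already normalised to UTC
    lat: float
    lon: float
    depth_m: float
    value: float


@dataclass(frozen=True)
class HarmonisedRecord:
    parameter: str   # common parameter name
    time: str
    lat: float
    lon: float
    depth_m: float
    value: float     # value converted to the common unit
    bdi: str         # originating BDI, kept for provenance


ParameterMapping = dict[tuple[str, str], tuple[str, Callable[[float], float]]]

# Hypothetical mapping: (BDI, local parameter name) -> (common name, unit converter).
MAPPING: ParameterMapping = {
    ("BDI_A", "TEMP"): ("TEMP_degC", lambda v: v),             # already in degC
    ("BDI_B", "temp_K"): ("TEMP_degC", lambda v: v - 273.15),  # kelvin -> degC
    ("BDI_A", "PSAL"): ("PSAL", lambda v: v),
    ("BDI_B", "salinity"): ("PSAL", lambda v: v),
}


def harmonise(records: Iterable[RawRecord],
              mapping: ParameterMapping = MAPPING) -> list[HarmonisedRecord]:
    """Translate records to common names and units, dropping exact duplicates across BDIs."""
    seen: set[tuple] = set()
    out: list[HarmonisedRecord] = []
    for r in records:
        target = mapping.get((r.bdi, r.parameter))
        if target is None:
            continue  # parameter not selected for this data lake
        name, convert = target
        key = (name, r.time, r.lat, r.lon, r.depth_m)
        if key in seen:
            continue  # same measurement already taken from an earlier record
        seen.add(key)
        out.append(HarmonisedRecord(name, r.time, r.lat, r.lon, r.depth_m,
                                    convert(r.value), r.bdi))
    return out
```

Keeping the originating BDI on each harmonised record preserves provenance, so users of the merged data lake can trace every value back to its source.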