Published May 3, 2024 | Version v1
Presentation Open

GES DISC Collaborations with Open-Source Communities to Migrate Data Collections to Cloud-Friendly Zarr Stores

  • 1. Goddard Space Flight Center
  • 2. Adnet Systems (United States)
  • 3. Telophase

Description

At the Goddard Earth Sciences (GES) Data and Information Services Center (DISC), we provide and promote the easy use of Earth science data, information, and services. A large part of our effort lies in keeping pace with evolving technologies and contributing to open data and software development. We will highlight three ongoing collaborations with open-source communities to migrate data collections to the scalable and flexible commercial cloud and to store and manage them in Zarr, a chunked, cloud-friendly format: 1) We lead the community development process for GeoZarr, an OGC specification that extends Zarr's capabilities for storing geospatial observations. 2) We explore solutions to the problem of managing dynamic data, which are typical in Earth science, and keeping them up to date, and we will discuss a potential solution, the lakeFS platform, which allows for data version control. 3) We propose and develop an open-source method that pre-computes and stores cumulative sums of large, multi-dimensional Zarr data at the chunk level to provide fast and cost-efficient data analysis in the cloud.
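
The idea behind item 3 can be illustrated with a minimal sketch. It is not the method presented in the slides: it assumes an in-memory NumPy array standing in for a Zarr store, chunking along the first (time-like) axis only, and hypothetical helper names (`build_chunk_cumsums`, `range_sum`). The point it shows is that once cumulative sums of per-chunk totals are stored, a sum over an arbitrary range needs the raw data of at most two partially covered chunks.

```python
import numpy as np

def build_chunk_cumsums(data, chunk_len):
    """Precompute cumulative sums of per-chunk totals along axis 0.

    Hypothetical helper: `data` stands in for a chunked array.
    """
    n_chunks = -(-data.shape[0] // chunk_len)  # ceiling division
    totals = np.stack([
        data[i * chunk_len:(i + 1) * chunk_len].sum(axis=0)
        for i in range(n_chunks)
    ])
    # Prepend a zero row so cums[k] is the sum of all chunks before chunk k.
    zero = np.zeros((1,) + totals.shape[1:], dtype=totals.dtype)
    return np.concatenate([zero, np.cumsum(totals, axis=0)])

def range_sum(data, cums, chunk_len, start, stop):
    """Sum data[start:stop] along axis 0 (assumes 0 <= start <= stop <= len)."""
    first_full = -(-start // chunk_len)  # first chunk fully inside the range
    last_full = stop // chunk_len        # one past the last fully covered chunk
    if first_full >= last_full:
        # No whole chunk is covered, so read the raw slice directly.
        return data[start:stop].sum(axis=0)
    head = data[start:first_full * chunk_len].sum(axis=0)  # partial first chunk
    body = cums[last_full] - cums[first_full]              # whole chunks, no raw reads
    tail = data[last_full * chunk_len:stop].sum(axis=0)    # partial last chunk
    return head + body + tail

# Illustrative data: 22 time steps on a 3 x 4 grid, chunked every 5 steps.
rng = np.random.default_rng(0)
data = rng.integers(0, 100, size=(22, 3, 4))
cums = build_chunk_cumsums(data, chunk_len=5)
fast = range_sum(data, cums, 5, 3, 17)
```

In a cloud setting, the `body` term is where the saving would come from: it replaces reads of every fully covered chunk with two small lookups in the precomputed array.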

Files

Final full slides - SMD Workshop NASA HQ - Contributions to Open Source Software Development.pdf