Published October 5, 2023 | Version v2
Presentation Open

A Scientific Cloud Computing Platform for Ingestion and Processing of SDO Data

  • 1. University of Cambridge
  • 2. UCLAN
  • 3. Auburn University
  • 4. DataTalk AI
  • 5. Southwest Research Institute
  • 6. Dublin Institute for Advanced Studies

Description

The Solar Dynamics Observatory (SDO) mission has been collecting solar data for the past 13 years, producing a dataset on the order of tens of petabytes. This dataset now spans more than an entire solar cycle, making it extremely valuable to the heliophysics community for critical tasks such as space weather analysis and forecasting. However, getting access to all of this data can be challenging. To overcome the challenges of managing this volume of data, we created a scientific computing platform, which we are now open-sourcing.

There are several reasons why access to SDO data is difficult, but they may stem from the lack of a data infrastructure that gives researchers easy access to the dataset. As part of the 2023 FDL-X Helio challenge, we have developed a data pipeline to ingest and transform this data into a readily available and complete data product. Using Google Cloud Platform (GCP), we host the entire 13 years of AIA and HMI data at 6-minute and 12-minute cadences, respectively, at 512x512 resolution. We calibrate the AIA data up to level 1.5 and present the HMI images as (x, y, z) components of the magnetic field. Furthermore, the data has been curated so that the solar disk appears the same size in each image, making the data machine-learning ready.
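The cadence sampling and disk-size normalization described above can be illustrated with a minimal sketch. This is not the pipeline's actual code. The function names, the target disk radius, and the example header values are hypothetical and chosen only for illustration. RSUN_OBS (apparent solar radius in arcseconds) and CDELT1 (plate scale in arcseconds per pixel) are the FITS header keywords this sketch assumes as inputs.

```python
from datetime import datetime, timedelta


def cadence_timestamps(start, end, minutes):
    """Yield evenly spaced timestamps in [start, end) at a fixed cadence."""
    if minutes <= 0:
        raise ValueError("cadence must be a positive number of minutes")
    step = timedelta(minutes=minutes)
    t = start
    while t < end:
        yield t
        t += step


def disk_scale_factor(rsun_obs_arcsec, cdelt_arcsec_per_px, target_radius_px):
    """Zoom factor that maps the observed disk radius onto a fixed pixel radius.

    rsun_obs_arcsec:     apparent solar radius (assumed from RSUN_OBS)
    cdelt_arcsec_per_px: plate scale (assumed from CDELT1)
    target_radius_px:    hypothetical common disk radius in the output image
    """
    observed_radius_px = rsun_obs_arcsec / cdelt_arcsec_per_px
    return target_radius_px / observed_radius_px


# Frames expected per day at the two cadences quoted in the abstract.
day_start = datetime(2023, 1, 1)
day_end = day_start + timedelta(days=1)
aia_frames = sum(1 for _ in cadence_timestamps(day_start, day_end, 6))   # 240
hmi_frames = sum(1 for _ in cadence_timestamps(day_start, day_end, 12))  # 120

# Illustrative values only: a 960 arcsec disk at 0.6 arcsec/px, rescaled so
# the disk radius is 200 px in the output frame (an assumed target).
scale = disk_scale_factor(960.0, 0.6, 200.0)
```

Because the apparent solar radius changes over the year as the Earth-Sun distance varies, the scale factor has to be computed per image. Applying it before cropping to the output frame keeps the disk at a constant pixel radius across the whole time series.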

We will present both our data infrastructure and the data product “SDOMLv2”, and discuss how we efficiently egressed the 13 years of data from its storage at JSOC. We will also discuss how our approach generalizes and may be adopted in other scientific domains.

Notes

This work has been enabled by FDL-X (fdlxhelio.org), a derivative of Frontier Development Lab (FDL.ai), as a public/private partnership between NASA, Trillium Technologies, and commercial AI partners Google Cloud and Nvidia.

Files

wfawcett_DASH.pdf (5.5 MB)
md5:0cd526abbdf05082cee99feac9b803a4