There is a newer version of the record available.

Published October 23, 2023 | Version v2
Dataset Open

Data for simultaneous inference of sea ice state and surface emissivity model using machine learning and data assimilation

Creators

  • 1. European Centre for Medium-range Weather Forecasts

Description

Overview

This dataset supports the draft manuscript "Simultaneous inference of sea ice state and surface emissivity model using machine learning and data assimilation", which describes a way to infer daily maps of the sea ice concentration and of empirical properties of the sea ice (relating to its snow cover and its physical properties, such as air inclusions), along with the creation of a new empirical model for the sea ice surface emissivity. This is done using knowledge of the atmospheric state, skin temperature and ocean water emissivity from the European Centre for Medium-Range Weather Forecasts (ECMWF) weather forecasting model and the observed radiances at microwave frequencies from the Advanced Microwave Scanning Radiometer 2 (AMSR2). The inverse modelling and state estimation are achieved by combining empirical machine learning elements with a number of physical components in a Bayesian-inspired network. The work also introduces the idea of an "empirical state", in this case describing the aspects of the sea ice physical state which affect the observations, and which is defined by the inputs to the new empirical model component (in machine learning terms, it is defined by the latent input state of a neural network). This dataset includes the data used in training the model and inferring the sea ice parameters, as well as the outputs from that training process. The training software is written in Python and uses Keras and TensorFlow. See the draft manuscript for full details of this data.

The code used in the draft manuscript is archived at https://doi.org/10.5281/zenodo.10013542

The data used in the draft manuscript is archived at https://doi.org/10.5281/zenodo.10033377

Training data 

Observation space training and ancillary data

Training is done at the location of AMSR2 superobservations (superobs) over ocean with less than 1% land contamination and polewards of 45 degrees latitude, between 1st July 2020 and 30th June 2021. There are 64,184,021 superobs used. A superob is the average of all raw JAXA level 1B observations from one orbit falling into a grid box on an approximately constant area (reduced Gaussian) grid at approximately 40 km by 40 km resolution (noting that polar regions can thus have up to around 7 superobs per day). The superobs have been computed using the field of view central locations for each channel as derived from the JAXA level 1B data. A subset of 10 of the AMSR2 channels is used, from 10 GHz, V polarised, to 89 GHz, H polarised.
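As an illustration of the superobbing idea described above, the following minimal sketch averages raw brightness temperatures into grid boxes. It assumes that each raw observation has already been assigned an integer grid box index; the `box_index` array and the function name are hypothetical. The processing actually used for this dataset, including the per-orbit grouping and the reduced Gaussian grid lookup, is not reproduced here.

```python
import numpy as np

def superob_mean(tb, box_index, n_boxes):
    """Average raw brightness temperatures falling into each grid box.

    tb        : (n_obs, n_channels) raw brightness temperatures in K
    box_index : (n_obs,) integer grid box index per observation (assumed given)
    n_boxes   : total number of grid boxes
    Returns the (n_boxes, n_channels) means, NaN where a box is empty,
    and the (n_boxes,) observation counts.
    """
    tb = np.asarray(tb, dtype=float)
    box_index = np.asarray(box_index)
    counts = np.bincount(box_index, minlength=n_boxes)
    sums = np.zeros((n_boxes, tb.shape[1]))
    # Unbuffered accumulation so that repeated indices are summed correctly
    np.add.at(sums, box_index, tb)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts[:, None]
    means[counts == 0] = np.nan
    return means, counts
```

In this sketch one call would correspond to the observations of a single orbit, so that a grid box visited by several orbits in a day yields several superobs.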

At each superob location, the relevant fields from the ECMWF 12 hour 'background' forecast are interpolated to the observation time and location. The atmosphere is represented indirectly by the relevant radiative transfer terms from a scattering radiative transfer model. The sea ice concentration from the ECMWF OCEAN5 analysis is included as a validation reference but is not used in the training itself, except to provide a monthly mean first guess to speed up the training. Each field is provided in a separate netCDF file:

  • field_v2_JULIAN_DAY.nc - superob time in days since 12 UTC on Nov 24th 4714 BC on the proleptic Gregorian calendar
  • field_v2_LAT.nc - superob central latitude in degrees
  • field_v2_LON.nc - superob central longitude in degrees
  • field_v2_IGRID.nc - corresponding grid number on the map grid used in this work (see below)
  • field_v2_OBSVALUE.nc - observed superob brightness temperature at each of 10 AMSR2 channels.
  • field_v2_TSFC.nc - skin temperature computed by the ECMWF forecast model
  • field_v2_WINDSPEED10M.nc - 10m wind speed computed by the ECMWF forecast model
  • field_v2_EMIS_WATER.nc - Ocean water surface emissivity at 10 AMSR2 channels, simulated from the ECMWF forecast fields using the FASTEM-6 model
  • field_v2_CLOUD_FRACTION.nc - Effective cloud fraction used in the atmospheric radiative transfer model at each of 10 AMSR2 channels
  • field_v2_TAUSFC_CLD.nc - Surface to space transmittance in the cloudy column at each of 10 AMSR2 channels
  • field_v2_TUP_CLD.nc - Upwelling brightness temperature from the atmosphere in the cloudy column at each of 10 AMSR2 channels
  • field_v2_TDOWN_CLD.nc - Downwelling brightness temperature from the atmosphere in the cloudy column at each of 10 AMSR2 channels
  • field_v2_TAUSFC.nc - Equivalently for the clear column
  • field_v2_TUP.nc - Equivalently for the clear column
  • field_v2_TDOWN.nc - Equivalently for the clear column
  • field_v2_SEAICE.nc - Sea ice concentration from the ECMWF OCEAN5 analysis, for validation only (not used in training)
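The fields listed above are of the kind needed to simulate an all-sky brightness temperature over a mixed ocean and sea ice surface. The sketch below shows one way they might be combined, under assumptions that are not confirmed by this record: a surface that emits with emissivity e and reflects downwelling radiation with reflectivity 1 - e, a clear and a cloudy column weighted by the effective cloud fraction, and a linear mix of water and ice emissivities weighted by the sea ice concentration. The function names and the `emis_ice` argument are hypothetical; the draft manuscript and the archived code define the formulation actually used.

```python
def column_tb(emis, tsfc, tausfc, tup, tdown):
    """Top-of-atmosphere brightness temperature for one column (assumed form)."""
    # Surface-leaving radiation: emission plus reflected downwelling radiation
    surface = emis * tsfc + (1.0 - emis) * tdown
    # Attenuate by the surface-to-space transmittance and add upwelling emission
    return tup + tausfc * surface

def allsky_tb(seaice, emis_ice, emis_water, tsfc, cloud_fraction,
              tausfc, tup, tdown, tausfc_cld, tup_cld, tdown_cld):
    """Cloud-fraction-weighted mix of clear and cloudy columns (assumed form)."""
    # Assumption: mixed-surface emissivity is linear in sea ice concentration
    emis = seaice * emis_ice + (1.0 - seaice) * emis_water
    tb_clear = column_tb(emis, tsfc, tausfc, tup, tdown)
    tb_cloudy = column_tb(emis, tsfc, tausfc_cld, tup_cld, tdown_cld)
    return (1.0 - cloud_fraction) * tb_clear + cloud_fraction * tb_cloudy
```

The functions use only arithmetic, so they accept either scalars or NumPy arrays of matching shape (for example, one value per superob and channel).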

Grid space data: initial data for training; validation sea ice data

A number of properties are provided to the hybrid physical-empirical model being trained. They are given on a special map grid defined in this project, which comprises all 62,499 points of the reduced Gaussian 40 km grid that have at least one superob at some point during the year of training data. These are:

  • ifs_seaice_initials_year.nc - sea ice concentration from OCEAN5, monthly averaged on the grid, and then provided on all days of the relevant month as initial conditions (technically, first guess) for the training. This includes an additional day before the beginning of the training, used for time-lagging (see draft paper).
  • ifs_tsfc_year_dailyx.nc - skin temperature from ECMWF forecast fields at observation locations, averaged onto the daily grid, to help provide constraints on the likelihood of sea ice as part of a sea ice loss function.

For diagnostic and validation purposes, the ECMWF OCEAN5 analysis is also provided on the grid:

  • ifs_seaice_year.nc - sea ice concentration from OCEAN5 at observation locations, averaged onto the daily grid

All these fields are provided on the following dimensions:

  • LON - the longitude of the grid point in degrees
  • DAY - the day through the training year (0-364, 1st July 2020 to 30th June 2021) or, for the sea ice, through the training year extended backward by one day to start on 30th June 2020 (0-365). In practice the days are offset by 3 hours from the UTC day to match the ECMWF data assimilation windows, which start at 21 UTC the day before.

The latitude is also provided:

  • LAT - the latitude of the grid point in degrees

Note that the observation location IGRID refers to the custom grid of the ML model, which is defined implicitly in these gridded files. The LON and LAT vectors in these files hold the longitude and latitude of the grid points and are of length 62499. The IGRID number of an observation is the index into these arrays, from 0 to 62498.
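To relate the superob times to the DAY dimension, a Julian day can be converted to a UTC time and then to a day index. The sketch below is an interpretation of the description above, not code from the archive: it uses the fact that Julian day 2440587.5 corresponds to 00 UTC on 1st January 1970, and it assumes that day 0 begins at 21 UTC on 30th June 2020. The function names are hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH_JD = 2440587.5  # Julian day of 1970-01-01 00:00 UTC

# Assumption: day 0 (1st July 2020) starts 3 hours before the UTC day
GRID_DAY0_START = datetime(2020, 6, 30, 21, tzinfo=timezone.utc)

def julian_day_to_datetime(jd):
    """Convert a Julian day number to a timezone-aware UTC datetime."""
    return UNIX_EPOCH + timedelta(days=jd - UNIX_EPOCH_JD)

def grid_day_index(jd):
    """Day index (0-364) of the 21 UTC to 21 UTC window containing jd."""
    delta = julian_day_to_datetime(jd) - GRID_DAY0_START
    return math.floor(delta / timedelta(days=1))
```

For the sea ice files that carry the additional leading day (indices 0-365), the corresponding index would presumably be one greater. The grid point coordinates of a superob are then simply the LON and LAT entries at its IGRID index.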

Outputs from training

The following files are the output and diagnostics from the year-long training. The Python code and the draft paper are the primary documentation for these:

  • models_year.nc - settings of the model are recorded here, along with the trained values of the smaller empirical components/layers within the hybrid model. For example, the layer weights of the wind speed bias correction, the observation space bias correction, and the empirical surface emissivity model are recorded here. The values of the loss function at each epoch are also recorded here.
  • properties_year.nc - trained values of each of 3 empirical properties of sea ice on the map grid (3 properties by 62499 locations by 365 days from 1st July 2020)
  • seaice_year.nc - inferred values of sea ice fraction on the map grid (62499 locations by 365 days from 1st July 2020, discarding the additional day at the start)
  • tbsim_year.nc - simulated AMSR2 brightness temperatures from the trained network
  • tbsim_initial_year.nc - simulated AMSR2 brightness temperatures using the untrained network

The longitude and latitude of the map grid are found in any of the initial data files described in the previous section. The days are 0-364, corresponding to 1st July 2020 to 30th June 2021.

Sensitivity tests

Extensive sensitivity tests were carried out, as described in the appendices of the draft paper and as documented in the Python code, using the month of August 2020 as an example. These required equivalent month-long training and initial data similar to those described above, but all observation space fields are contained within the same file in this case. Output files follow similar principles to those described above. The full package is provided as a tar file:

  • sensitivity.tar

This contains the training and initial files:

  • amsr2_v2_202008.nc
  • ifs_tsfc_dailyx_202008.nc
  • ifs_seaice_202008.nc

as well as directories containing the trained model outputs and diagnostics at each of the sensitivity tests, using the same formats as described for the yearly training, with these names:

  • nprop - number of empirical properties
  • epoch - number of epochs
  • deep - configuration of the empirical sea ice emissivity model, including multiple layers of nonlinear dense neural network
  • bseaice - background error for the sea ice physical bounds background error (loss) term
  • bemis - background error for the sea ice emissivity background error (loss) term
  • bbias - background error for the bias correction background error (loss) term
  • batchsize - batch size used in training
  • bbatchsize - extended epochs testing of batch size used in training

Licensing

This data product is published under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/

You are free to:

  • Share — copy and redistribute the material in any medium or format
  • Adapt — remix, transform, and build upon the material for any purpose, even commercially.

Under the following terms:

  • You must give appropriate credit (attribution) to ECMWF as outlined below, provide a link to the licence, and indicate if changes were made.
  • No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the licence permits.

The following wording shall be attached to the use of this ECMWF data product: 

  1. Copyright statement: Copyright "© 2023 European Centre for Medium-Range Weather Forecasts (ECMWF)".
  2. Source www.ecmwf.int and https://doi.org/10.5281/zenodo.10009498
  3. Licence Statement: This data is published under a Creative Commons Attribution 4.0 International (CC BY 4.0). https://creativecommons.org/licenses/by/4.0/
  4. Disclaimer: ECMWF does not accept any liability whatsoever for any error or omission in the data, their availability, or for any loss or damage arising from their use.
  5. Where applicable, an indication if the material has been modified and an indication of previous modifications.
  6. DOI: 10.5281/zenodo.10009498

Original data for this value-added product was provided by the Japan Aerospace Exploration Agency (JAXA). Specifically, this dataset builds on the Advanced Microwave Scanning Radiometer 2 (AMSR2) level 1B data available from the JAXA G-Portal, https://gportal.jaxa.jp/gpr/, which has the following attribution and licensing:

  1. Give credit for the original data to JAXA, i.e. "Original data for this value added data product was provided by Japan Aerospace Exploration Agency"
  2. DOI for original JAXA data is L1B-Brightness temperature (TB) GCOM-W/AMSR2 L1B Brightness Temperature: https://doi.org/10.57746/EO.01gs73ans548qghaknzdjyxd2h
  3. Original terms of data service from JAXA, with highlighted extracts:
    • https://gportal.jaxa.jp/gpr/index/eula
      1. The user is entitled to use G-Portal data free of charge without any restrictions (including commercial use) except for the condition about acknowledgement of data credit as stipulated in Article 7.(2). (see above)
      2. JAXA is collecting results (papers, theses, reports, etc.) using G-Portal data. If you have any results using G-Portal data, please mail/e-mail a copy of the result to G-Portal Support Desk (Contact Information written at the end of the Terms of Use). We appreciate your cooperation very much.

Files (35.2 GB)

  • md5:f94371e3bc6a60bd530fbe34c9c986ef - 256.7 MB
  • md5:cf75e409aa4cb84d827edea7b470f9ca - 2.6 GB
  • md5:420a741af8efb197e5bf9f26d2a2de51 - 256.7 MB
  • md5:4ded19ed48b603f23f57cad62c0606d2 - 513.5 MB
  • md5:266fbf77a18d81b4f365b710b66b9b0a - 256.7 MB
  • md5:638f481dcfc17e90bbe8c10eb948fde6 - 256.7 MB
  • md5:c8a8b34bbd56e1c77532185e40829bc5 - 2.6 GB
  • md5:56ed4227a04efa9a7b8418a23c7e1623 - 256.7 MB
  • md5:6bc5e1d10773f7a798058e90ee669b42 - 2.6 GB
  • md5:92f36f1e840bcb57b6f66a6d26fcc313 - 2.6 GB
  • md5:4d83911f094b65de2fbcc2fae036874b - 2.6 GB
  • md5:70069a7cf099f1169155993aa1805486 - 2.6 GB
  • md5:716d9cf2e99b81d5218e0212ba96ba0a - 256.7 MB
  • md5:5758c86071530c10e17f2ac4e6d8a27b - 2.6 GB
  • md5:29f9794a926a8f6518bfce7f14d9157c - 2.6 GB
  • md5:2e4b63fabb49fa6d6de8eb82f1b1aa1d - 256.7 MB
  • md5:7c87534bc87418c184b8797f8c6341bb - 92.0 MB
  • md5:73acc82614c135e3631421321a5d74f3 - 183.0 MB
  • md5:79a818e13fed16f55f8422a4708d2d2f - 91.8 MB
  • md5:ff61d2e5cb83e0985bc1e7925ee19380 - 22.0 kB
  • md5:1ff193e4a4a6425b1b61412ceab285ba - 274.3 MB
  • md5:e8cf5cdbc9f42b100c8ad77b67804533 - 91.8 MB
  • md5:93e9f037aedd628faf259779e1b74820 - 5.5 GB
  • md5:0e99f0906c3d5a9a731d2f426997abae - 3.1 GB
  • md5:b33cb7f4142e2117d1532d8332200cd6 - 3.1 GB
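Since several of the files are multiple gigabytes in size, it may be worth checking a download against the MD5 checksums listed above. The following minimal sketch uses only the Python standard library and reads the file in chunks to limit memory use; the function names are hypothetical and the path passed in is whatever local file was downloaded.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, expected):
    """Compare a file against a checksum given with or without the 'md5:' prefix."""
    return md5_of_file(path) == expected.lower().split(":")[-1]
```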