Data for simultaneous inference of sea ice state and surface emissivity model using machine learning and data assimilation
Description
Overview
This dataset supports the draft manuscript "Simultaneous inference of sea ice state and surface emissivity model using machine learning and data assimilation", which describes a way to infer daily maps of the sea ice concentration and of empirical properties of the sea ice (relating to its snow cover and its physical properties, such as air inclusions), along with the creation of a new empirical model for the sea ice surface emissivity. This is done using knowledge of the atmospheric state, skin temperature and ocean water emissivity from the European Centre for Medium-Range Weather Forecasts (ECMWF) weather forecasting model and the observed radiances at microwave frequencies from the Advanced Microwave Scanning Radiometer 2 (AMSR2).
The inverse modelling and state estimation are achieved by combining empirical machine learning elements in a Bayesian-inspired network along with a number of physical components. The work also introduces the idea of an "empirical state", in this case describing the aspects of the sea ice physical state which affect the observations, and which is defined by the inputs to the new empirical model component (in machine learning terms, it is defined by the latent input state of a neural network).
This dataset includes the data used in training the model and inferring the sea ice parameters, as well as the outputs from that training process. The software used to perform the training is written in Python and uses Keras and TensorFlow. See the draft manuscript for full details of this data.
The code used in the draft manuscript is archived at https://doi.org/10.5281/zenodo.10013542
The data used in the draft manuscript is archived at https://doi.org/10.5281/zenodo.10009498
Training data
Observation space training and ancillary data
Training is done at the location of AMSR2 superobservations (superobs) over ocean with less than 1% land contamination and polewards of 45 degrees latitude, between 1st July 2020 and 30th June 2021. There are 64,184,021 superobs used. A superob is the average of all raw JAXA level 1B observations from one orbit falling into a grid box on an approximately constant area (reduced Gaussian) grid at approximately 40 km by 40 km resolution (noting that polar regions can thus have up to around 7 superobs per day). The superobs have been computed using the field of view central locations for each channel as derived from the JAXA level 1B data. A subset of 10 of the AMSR2 channels is used, from 10 GHz, V polarised, to 89 GHz, H polarised.
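The superob averaging described above can be illustrated with a minimal sketch. It is not the code used to build this dataset: the function name `superob_average` and its arguments are hypothetical, and it assumes each raw observation has already been assigned to a grid box for a single orbit. The real processing also uses channel-specific field of view locations, which this sketch ignores.

```python
import numpy as np

def superob_average(grid_index, tb, n_grid):
    """Average raw brightness temperatures falling in each grid box.

    grid_index: (n_obs,) integer grid-box index of each raw observation
    tb:         (n_obs, n_chan) raw brightness temperatures in K
    Returns (means, counts): means is (n_grid, n_chan), NaN where a box
    holds no observations; counts is (n_grid,).
    """
    grid_index = np.asarray(grid_index)
    tb = np.asarray(tb, dtype=float)
    counts = np.bincount(grid_index, minlength=n_grid)
    sums = np.zeros((n_grid, tb.shape[1]))
    # Unbuffered accumulation so repeated grid indices are all summed
    np.add.at(sums, grid_index, tb)
    means = np.full_like(sums, np.nan)
    filled = counts > 0
    means[filled] = sums[filled] / counts[filled, None]
    return means, counts
```

Applying this once per orbit is what would allow a polar grid box to hold several superobs on the same day.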
At each superob location, the relevant fields from the ECMWF 12 hour 'background' forecast are interpolated to the observation time and location. The atmosphere is represented indirectly by the relevant radiative transfer terms from a scattering radiative transfer model. The sea ice concentration from the ECMWF OCEAN5 analysis is included as a validation reference but is not used in the training itself, except to provide a monthly mean first guess to speed up the training. Each field is provided in a separate netCDF file:
- field_v2_JULIAN_DAY.nc - superob time in days since 12 UTC on Nov 24th 4714 BC on the proleptic Gregorian calendar
- field_v2_LAT.nc - superob central latitude in degrees
- field_v2_LON.nc - superob central longitude in degrees
- field_v2_IGRID.nc - corresponding grid number on the map grid used in this work (see below)
- field_v2_OBSVALUE.nc - observed superob brightness temperature at each of 10 AMSR2 channels.
- field_v2_TSFC.nc - skin temperature computed by the ECMWF forecast model
- field_v2_WINDSPEED10M.nc - 10m wind speed computed by the ECMWF forecast model
- field_v2_EMIS_WATER.nc - Ocean water surface emissivity at 10 AMSR2 channels, simulated from the ECMWF forecast fields using the FASTEM-6 model
- field_v2_CLOUD_FRACTION.nc - Effective cloud fraction used in the atmospheric radiative transfer model at each of 10 AMSR2 channels
- field_v2_TAUSFC_CLD.nc - Surface to space transmittance in the cloudy column at each of 10 AMSR2 channels
- field_v2_TUP_CLD.nc - Upwelling brightness temperature from the atmosphere in the cloudy column at each of 10 AMSR2 channels
- field_v2_TDOWN_CLD.nc - Downwelling brightness temperature from the atmosphere in the cloudy column at each of 10 AMSR2 channels
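- field_v2_TAUSFC.nc - Surface to space transmittance in the clear column at each of 10 AMSR2 channels
- field_v2_TUP.nc - Upwelling brightness temperature from the atmosphere in the clear column at each of 10 AMSR2 channels
- field_v2_TDOWN.nc - Downwelling brightness temperature from the atmosphere in the clear column at each of 10 AMSR2 channels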
- field_v2_SEAICE.nc - Sea ice concentration from the ECMWF OCEAN5 analysis, for validation only (not used in training)
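As an illustration of how the radiative transfer terms listed above might be combined into a simulated brightness temperature, the sketch below uses a standard surface formulation with a two-column (clear and cloudy) weighting. This is an assumption made for illustration, not a statement of the formulation in the draft manuscript or the archived code. The function names are hypothetical, and `emis_ice` stands in for the output of the empirical sea ice emissivity model.

```python
import numpy as np

def simulate_tb(tsfc, emis, tau, tup, tdown):
    """Single-column brightness temperature (assumed standard form).

    Atmospheric upwelling emission, plus surface emission and reflected
    downwelling radiation, both attenuated by the surface-to-space
    transmittance tau.
    """
    return tup + tau * (emis * tsfc + (1.0 - emis) * tdown)

def simulate_allsky_tb(tsfc, emis_water, emis_ice, seaice, cloud_fraction,
                       tau_clr, tup_clr, tdown_clr,
                       tau_cld, tup_cld, tdown_cld):
    """Mix surface types by sea ice fraction and columns by cloud fraction.

    The linear mixing rules here are assumptions made for this sketch.
    """
    emis = seaice * emis_ice + (1.0 - seaice) * emis_water
    tb_clr = simulate_tb(tsfc, emis, tau_clr, tup_clr, tdown_clr)
    tb_cld = simulate_tb(tsfc, emis, tau_cld, tup_cld, tdown_cld)
    return (1.0 - cloud_fraction) * tb_clr + cloud_fraction * tb_cld
```

All arguments broadcast as NumPy arrays, so the functions can be applied to arrays of shape (number of superobs, 10 channels) once the per-field files have been read.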
Grid space data: initial data for training; validation sea ice data
A number of properties are provided to the hybrid physical-empirical model that is being trained. They are given on a special map grid defined in this project, which comprises the 62,499 reduced Gaussian 40 km grid points that have at least one superob at some point during the year of training data. These are:
- ifs_seaice_initials_year.nc - sea ice concentration from OCEAN5, monthly averaged on the grid, and then provided on all days of the relevant month as initial conditions (technically, first guess) for the training. This includes an additional day before the beginning of the training, used for time-lagging (see draft paper).
- ifs_tsfc_year_dailyx.nc - skin temperature from ECMWF forecast fields at observation locations, averaged onto the daily grid, to help provide constraints on the likelihood of sea ice as part of a sea ice loss function.
For diagnostic and validation purposes, the ECMWF OCEAN5 analysis is also provided on the grid:
- ifs_seaice_year.nc - sea ice concentration from OCEAN5 at observation locations, averaged onto the daily grid
All these fields are provided on the following dimensions:
- LON - the longitude of the grid point in degrees
- DAY - the day through the training year (0-364, 1st July 2020 to 30th June 2021) or, for the sea ice, through the training year extended back by one day to start on 30th June 2020 (0-365). In practice the days are offset by 3 hours from the UTC day to match the ECMWF data assimilation windows, which start at 21 UTC the day before.
The latitude is also provided:
- LAT - the latitude of the grid point in degrees
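The relationship between the superob Julian day and the DAY dimension can be sketched as follows. The conversion from Julian day to UTC is standard (Julian day 2440587.5 is 00 UTC on 1st January 1970). The `day_index` function is one reading of the 3 hour offset described above, in which day 0 runs from 21 UTC on 30th June 2020 to 21 UTC on 1st July 2020; both function names are hypothetical, and the convention should be checked against the archived code.

```python
import math
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH_JD = 2440587.5  # Julian day of 1970-01-01 00:00 UTC

# Assumed start of day 0: 21 UTC on the day before 1st July 2020
DAY0_START = datetime(2020, 6, 30, 21, tzinfo=timezone.utc)

def julian_day_to_datetime(jd):
    """Convert a Julian day (days since noon, 24 Nov 4714 BC) to UTC."""
    return UNIX_EPOCH + timedelta(days=jd - UNIX_EPOCH_JD)

def day_index(jd):
    """Day through the training year under the assumed 21 UTC convention."""
    elapsed = julian_day_to_datetime(jd) - DAY0_START
    return math.floor(elapsed / timedelta(days=1))
```

Under this reading, a superob at 00 UTC on 1st July 2020 (Julian day 2459031.5) falls in day 0. For the sea ice first-guess file, which carries an additional leading day, the index would presumably be shifted by one.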
Outputs from training
The following files are the output and diagnostics from the year-long training. The Python code and the draft paper are the primary documentation for these:
- models_year.nc - settings of the model are recorded here, along with the trained values of the smaller empirical components/layers within the hybrid model. For example, the layer weights of the wind speed bias correction, the observation space bias correction, and the empirical surface emissivity model are recorded here. The values of the loss function at each epoch are also recorded here.
- properties_year.nc - trained values of each of 3 empirical properties of sea ice on the map grid (3 properties by 62499 locations by 365 days from 1st July 2020)
- seaice_year.nc - inferred values of sea ice fraction on the map grid (62499 locations by 365 days from 1st July 2020, discarding the additional day at the start)
- tbsim_year.nc - simulated AMSR2 brightness temperatures from the trained network
- tbsim_initial_year.nc - simulated AMSR2 brightness temperatures using the untrained network
The longitude and latitude of the map grid are found in any of the initial data files described in the previous section. The days are 0-364 corresponding to 1st July 2020 to 30th June 2021.
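To compare gridded outputs such as the inferred sea ice fraction with the observation space data, the map grid values can be looked up at each superob using its grid number (IGRID) and day. The sketch below assumes the gridded field has been loaded as an array of shape (locations, days); the function name, the array layout and the index base are all assumptions, so check the dimensions and metadata in the netCDF files before relying on them.

```python
import numpy as np

def grid_to_obs(field, igrid, day, index_base=0):
    """Sample a gridded field at superob locations.

    field:      (n_grid, n_day) array, e.g. inferred sea ice fraction
    igrid:      (n_obs,) grid number of each superob
    day:        (n_obs,) day index of each superob (0-based)
    index_base: 0 if grid numbers start at 0, 1 if they start at 1
                (which applies to IGRID is an assumption to verify)
    """
    rows = np.asarray(igrid) - index_base
    cols = np.asarray(day)
    # Paired fancy indexing: one value per (grid point, day) pair
    return np.asarray(field)[rows, cols]
```

The same lookup would apply to each of the 3 empirical properties by indexing the leading property dimension first.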
Sensitivity tests
Extensive sensitivity tests were carried out, as described in the appendices of the draft paper and as documented in the Python code, using the month of August 2020 as an example. These required equivalent month-long training and initial data similar to those described above, but all observation space fields are contained within the same file in this case. Output files follow similar principles to those described above. The full package is provided as a tar file:
- sensitivity.tar.gz
This contains the training and initial files:
- ifs_tsfc_dailyx_202008.nc
- ifs_seaice_202008.nc
- amsr2_v2_202008.nc
as well as directories containing the trained model outputs and diagnostics for each of the sensitivity tests, using the same formats as described for the yearly training, with these names:
- nprop - number of empirical properties
- epoch - number of epochs
- deep - configuration of the empirical sea ice emissivity model, including multiple layers of nonlinear dense neural network
- bseaice - background error for the sea ice physical bounds background error (loss) term
- bemis - background error for the sea ice emissivity background error (loss) term
- bbias - background error for the bias correction background error (loss) term
- batchsize - batch size used in training
- bbatchsize - extended epochs testing of batch size used in training
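Before unpacking the archive, its top-level layout can be inspected with the Python standard library. This is a minimal sketch: the function name `top_level_entries` is hypothetical, and the internal directory structure of sensitivity.tar.gz beyond the names listed above is not assumed.

```python
import tarfile

def top_level_entries(tar_path):
    """Return the sorted top-level file and directory names in a tar archive."""
    tops = set()
    # "r:*" lets tarfile detect the compression (gzip here) automatically
    with tarfile.open(tar_path, mode="r:*") as tar:
        for name in tar.getnames():
            parts = [p for p in name.split("/") if p not in ("", ".")]
            if parts:
                tops.add(parts[0])
    return sorted(tops)
```

For example, `top_level_entries("sensitivity.tar.gz")` would list the entries to compare against the training files and sensitivity test directories named above.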
Licensing
This data product is published under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/
You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.
Under the following terms:
- You must give appropriate credit (attribution) to ECMWF as outlined below, provide a link to the licence, and indicate if changes were made.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the licence permits.
The following wording shall be attached to the use of this ECMWF data product:
- Copyright statement: Copyright "© 2023 European Centre for Medium-Range Weather Forecasts (ECMWF)".
- Source www.ecmwf.int and https://doi.org/10.5281/zenodo.10009498
- Licence Statement: This data is published under a Creative Commons Attribution 4.0 International (CC BY 4.0). https://creativecommons.org/licenses/by/4.0/
- Disclaimer: ECMWF does not accept any liability whatsoever for any error or omission in the data, their availability, or for any loss or damage arising from their use.
- Where applicable, an indication if the material has been modified and an indication of previous modifications.
- DOI: 10.5281/zenodo.10009498
Original data for this value-added product was provided by the Japan Aerospace Exploration Agency (JAXA). Specifically, this dataset builds on the Advanced Microwave Scanning Radiometer 2 (AMSR2) level 1B data available from the JAXA G-Portal, https://gportal.jaxa.jp/gpr/, which has the following attribution and licensing:
- Give credit for the original data to JAXA, i.e. "Original data for this value added data product was provided by Japan Aerospace Exploration Agency"
- DOI for original JAXA data is L1B-Brightness temperature (TB) GCOM-W/AMSR2 L1B Brightness Temperature: https://doi.org/10.57746/EO.01gs73ans548qghaknzdjyxd2h
- Original terms of data service from JAXA, with highlighted extracts:
- https://gportal.jaxa.jp/gpr/index/eula
- The user is entitled to use G-Portal data free of charge without any restrictions (including commercial use) except for the condition about acknowledgement of data credit as stipulated in Article 7.(2). (see above)
- JAXA is collecting results (papers, theses, reports, etc.) using G-Portal data. If you have any results using G-Portal data, please mail/e-mail a copy of the result to G-Portal Support Desk (Contact Information written at the end of the Terms of Use). We appreciate your cooperation very much.
Files
Total size: 31.4 GB
Checksum | Size
---|---
md5:f94371e3bc6a60bd530fbe34c9c986ef | 256.7 MB
md5:cf75e409aa4cb84d827edea7b470f9ca | 2.6 GB
md5:420a741af8efb197e5bf9f26d2a2de51 | 256.7 MB
md5:4ded19ed48b603f23f57cad62c0606d2 | 513.5 MB
md5:266fbf77a18d81b4f365b710b66b9b0a | 256.7 MB
md5:638f481dcfc17e90bbe8c10eb948fde6 | 256.7 MB
md5:c8a8b34bbd56e1c77532185e40829bc5 | 2.6 GB
md5:56ed4227a04efa9a7b8418a23c7e1623 | 256.7 MB
md5:6bc5e1d10773f7a798058e90ee669b42 | 2.6 GB
md5:92f36f1e840bcb57b6f66a6d26fcc313 | 2.6 GB
md5:4d83911f094b65de2fbcc2fae036874b | 2.6 GB
md5:70069a7cf099f1169155993aa1805486 | 2.6 GB
md5:716d9cf2e99b81d5218e0212ba96ba0a | 256.7 MB
md5:5758c86071530c10e17f2ac4e6d8a27b | 2.6 GB
md5:29f9794a926a8f6518bfce7f14d9157c | 2.6 GB
md5:2e4b63fabb49fa6d6de8eb82f1b1aa1d | 256.7 MB
md5:7c87534bc87418c184b8797f8c6341bb | 92.0 MB
md5:73acc82614c135e3631421321a5d74f3 | 183.0 MB
md5:79a818e13fed16f55f8422a4708d2d2f | 91.8 MB
md5:ff61d2e5cb83e0985bc1e7925ee19380 | 22.0 kB
md5:1ff193e4a4a6425b1b61412ceab285ba | 274.3 MB
md5:e8cf5cdbc9f42b100c8ad77b67804533 | 91.8 MB
md5:e8bc4128620b540e91c74f14141881cf | 1.7 GB
md5:0e99f0906c3d5a9a731d2f426997abae | 3.1 GB
md5:b33cb7f4142e2117d1532d8332200cd6 | 3.1 GB
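The md5 values in the file listing can be used to check that a download completed without corruption. The following is a minimal sketch using the Python standard library; the function names are hypothetical, and the file is read in chunks because several of the files are multiple gigabytes in size.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:f943...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing(local_path, "md5:...")` takes the local path of a downloaded file and the checksum shown for that file on the record page.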