Published June 6, 2024 | Version 1.0.0
Dataset Open

Met Office UKCP Local CPM precipitation ML emulator dataset


This is a collection of two datasets: one sourced from CPM data (bham_gcmx-4x_12em_psl-sphum4th-temp4th-vort4th_eqvt_random-season.tar.gz) and one sourced from GCM data (bham_60km-4x_12em_psl-sphum4th-temp4th-vort4th_eqvt_random-season.tar.gz). Each dataset is made up of climate model variables extracted from the Met Office's storage system, combining many variables over many years. Each consists of 3 NetCDF files (one per subset: train, val and test), a YML configuration file (ds-config.yml) and a README (similar to this one but tailored to the source of the data). The dataset was created with the v0.1.0 tag of the dataset-creation code.


The YML file contains the configuration for the creation of the dataset, including the variables, scenario, ensemble members, spatial domain and resolution, and the scheme for splitting the data across the three subsets.


Each NetCDF file contains the same variables; the three files hold different subsets (train, val and test), split along the time dimension.

Otherwise the NetCDF files have the same dimensions and coordinates for ensemble_member, grid_longitude and grid_latitude.

  • Spatial resolution: This has two parts: the resolution of the source data and the resolution of the grid stored in the file. Predictand variables are 2.2km variables coarsened by a factor of 4 to 8.8km (this is the target grid). Predictor variables are either 2.2km variables conservatively regridded to the 60km GCM grid, or variables taken from the GCM (so already on the 60km grid); in both cases they are then regridded (nearest neighbour) to the target grid of the predictands. In the naming convention for resolution used in config files, 60km is synonymous with the GCM grid and 2.2km is synonymous with the CPM grid.
  • Spatial domain: A 64x64 section of the 8.8km target grid covering England and Wales
  • Time resolution: daily
  • Time domain: 1st Dec 1980 to 30th Nov 2000; 1st Dec 2020 to 30th Nov 2040; 1st Dec 2060 to 30th Nov 2080. Uses a 360-day calendar.
  • Scenario: RCP8.5
  • Ensemble Members: 01, 04-13 & 15 (these correspond to the 12 ensemble member runs from the CPM but don't carry intrinsic meaning).
  • Split scheme: 70% training, 15% validation, 15% testing, split by choosing complete seasons at random, with an equal number of each season from each of the 3 time periods.
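
The split scheme above can be illustrated with a short sketch. This is not the code used to build the dataset; it is a minimal illustration that assumes each 20-year period contains 20 complete blocks of each season (DJF, MAM, JJA, SON), that each season is 90 days long under the 360-day calendar, and that the 15% fractions are rounded to whole seasons. The function name, seed and block indexing are hypothetical.

```python
import random
from collections import Counter

SEASONS = ("DJF", "MAM", "JJA", "SON")
# Start year of each period; each runs from 1 Dec to 30 Nov, 20 years later.
PERIOD_STARTS = (1980, 2020, 2060)
YEARS_PER_PERIOD = 20
DAYS_PER_SEASON = 90  # 3 months x 30 days in a 360-day calendar


def split_seasons(seed=42, val_frac=0.15, test_frac=0.15):
    """Assign each (period, season, index) block to train, val or test.

    Illustrative only: the shuffle and rounding rules are assumptions,
    not the scheme recorded in ds-config.yml.
    """
    rng = random.Random(seed)
    n_val = round(YEARS_PER_PERIOD * val_frac)
    n_test = round(YEARS_PER_PERIOD * test_frac)
    assignment = {}
    for start in PERIOD_STARTS:
        for season in SEASONS:
            # Shuffle the blocks of this season within this period, so every
            # (period, season) pair contributes the same number to each subset.
            blocks = list(range(YEARS_PER_PERIOD))
            rng.shuffle(blocks)
            for position, block in enumerate(blocks):
                if position < n_val:
                    subset = "val"
                elif position < n_val + n_test:
                    subset = "test"
                else:
                    subset = "train"
                assignment[(start, season, block)] = subset
    return assignment


assignment = split_seasons()
season_counts = Counter(assignment.values())
day_counts = {name: n * DAYS_PER_SEASON for name, n in season_counts.items()}
print(season_counts)
print(day_counts)
```

Under these assumptions the 240 season blocks divide into 168 for training, 36 for validation and 36 for testing, which is exactly 70/15/15. The time dimension of the actual files may differ if the real scheme rounds or selects seasons differently.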


Predictor variables

  • psl - mean sea level pressure (hPa)
  • temp850, temp700, temp500, temp250 - air temperature (K) at 850, 700, 500 and 250 hPa
  • vorticity850, vorticity700, vorticity500, vorticity250 - relative vorticity (s^-1) at 850, 700, 500 and 250 hPa
  • spechum850, spechum700, spechum500, spechum250 - specific humidity at 850, 700, 500 and 250 hPa
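
The predictor names listed above follow a regular pattern (one surface variable plus three quantities at four pressure levels), so they can be generated rather than typed out. The sketch below is illustrative; the helper names predictor_names and missing_predictors are hypothetical and are not part of the dataset's code.

```python
LEVELS_HPA = (850, 700, 500, 250)
PREFIXES = ("temp", "vorticity", "spechum")


def predictor_names():
    """Build the predictor variable names: psl plus prefix+level combinations."""
    names = ["psl"]
    for prefix in PREFIXES:
        names.extend(f"{prefix}{level}" for level in LEVELS_HPA)
    return names


def missing_predictors(available):
    """Return the expected predictor names absent from an iterable of names."""
    available = set(available)
    return [name for name in predictor_names() if name not in available]


names = predictor_names()
print(len(names), names)
```

If a file is opened with a library such as xarray (an assumption; no particular reader is prescribed here), passing the names of its data variables to missing_predictors gives a quick check that all predictors are present.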


Predictand variable

  • target_pr - precipitation rate (mm/day)