There is a newer version of the record available.

Published January 14, 2021 | Version 1.1
Dataset Open

Britain Breathing 2016-2019 Air Quality and Meteorological Regional Estimates Dataset

  • 1. University of Manchester
  • 2. Chinese University of Hong Kong

Description

This data set is a collection of estimated daily mean and maximum values for a range of air quality and meteorological measurements and model forecasts for UK postcode districts (e.g. 'AB') for the years 2016-2019, inclusive.

The data uses a 'concentric regions' method to estimate the measurement for every region, as follows. If measurements exist within the region, the mean of those measurements is used. If not, a ring of neighbouring postcode regions is selected and the mean of their measurement values is used. If no measurement sites or data are found in the first ring, the process continues with the next ring of postcode district regions, working outwards until one or more sensors are found in a ring. As well as the estimates, the number of rings required to find site data and make each estimate is also published. Estimates with higher ring counts ('rings') are therefore likely to be calculated from more distant sensors; the distance depends upon the size of the postcode regions surrounding the location being estimated. Please use the ring count ('rings') to limit or filter estimates according to your required level of confidence.
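The ring search described above can be sketched roughly as follows. This is an illustrative sketch only, not the code used to build the data set (that code is in the separate Zenodo archive). The function name, the `neighbours` adjacency mapping and the `site_values` mapping are hypothetical, and counting the region itself as ring 0 is an assumption; the published 'rings' values may be numbered differently.

```python
from statistics import mean


def estimate_region(region, neighbours, site_values):
    """Return (estimate, rings) for `region`, or (None, None) if no site is reachable.

    neighbours:  dict of region id -> iterable of adjacent region ids (hypothetical input)
    site_values: dict of region id -> list of site measurements inside that region
    """
    visited = {region}
    ring = [region]  # assumption: the region itself is treated as ring 0
    rings = 0
    while ring:
        # Pool every site measurement found in the current ring.
        values = [v for r in ring for v in site_values.get(r, [])]
        if values:
            return mean(values), rings
        # No data yet: the next ring is every unvisited neighbour of this ring.
        next_ring = []
        for r in ring:
            for n in neighbours.get(r, ()):
                if n not in visited:
                    visited.add(n)
                    next_ring.append(n)
        ring = next_ring
        rings += 1
    return None, None
```

With a toy layout in which region "A" borders "B" and "C", and only "D" (bordering "B" and "C") contains sites, the estimate for "A" is the mean of the "D" sites with a ring count of 2 under the numbering assumed here.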

The meteorological, pollen and air quality measurement data used to make the regional estimations can be found at this Zenodo archive. The data there contains Temperature, Relative Humidity, and Pressure data, downloaded from the Met Office MIDAS archives via the MEDMI server (https://www.data-mashup.org.uk/). Also downloaded from the MEDMI server are daily pollen measurements for the UK. The archive also contains PM10, PM2.5, NO2, NOx (as NO2), O3, and SO2 measurements from the DEFRA AURN network, together with model forecasts of the same pollutants made using the EMEP model.

The code used to make the estimations is available at this Zenodo archive.

The postcode data in postcode_district_data.csv are collated from several sources: 

The data set is presented in CSV format, as six files:

  1. postcode_district_data.csv: location metadata (region_id, geometry, description, population, country)
  2. regional_site_counts.csv: a table showing the number of sites for each measurement (columns), for each region_id (rows). The region_id values match those in the postcode_district_data.csv file.
  3. turing_regional_estimates_aq_daily_met_pollen_pollution_imputed_data.csv: uses imputed site data (timestamp, region_id, ...[measurement name, rings]) ('rings' is the number of rings required to make the estimation)
  4. turing_regional_estimates_aq_daily_met_pollen_pollution_original_data.csv: uses original site data (timestamp, region_id, ...[measurement name, rings]) ('rings' is the number of rings required to make the estimation)
  5. turing_regional_estimates_aq_loc_type_daily_imputed_data.csv: uses imputed site data. Air quality regional estimates are calculated using specific AQ site location types* separately. (To prevent, for example, 'Traffic Urban' type sites being used to estimate 'non-traffic' or rural regions.)
  6. turing_regional_estimates_aq_loc_type_daily_original_data.csv: uses original data. Air quality regional estimates are calculated using specific AQ site location types* separately. (To prevent, for example, 'Traffic Urban' type sites being used to estimate 'non-traffic' or rural regions.)

* Air quality site types: 

  • 'Industrial': comprises 'urban industrial' (9 sites) and 'suburban industrial' (2 sites)
  • 'Rural background' (14 sites)
  • 'Urban background' (48 sites)
  • 'Urban traffic' (47 sites)
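As a rough illustration of filtering the regional estimates by ring count, the sketch below keeps only rows at or below a chosen 'rings' threshold. The column name `temperature_rings` and the sample rows are invented for illustration; check the header of the CSV file you are using for the actual measurement and ring column names.

```python
import csv
import io


def filter_by_rings(rows, rings_column, max_rings):
    """Keep rows whose ring count is present and no greater than max_rings."""
    kept = []
    for row in rows:
        raw = row.get(rings_column)
        if raw is None or raw.strip() == "":
            continue  # no ring count recorded, so confidence cannot be judged
        if int(float(raw)) <= max_rings:
            kept.append(row)
    return kept


# Invented sample data; the real column names and values may differ.
SAMPLE = """timestamp,region_id,temperature,temperature_rings
2016-01-01,AB,5.2,0
2016-01-01,ZE,4.1,3
2016-01-01,M,6.0,1
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
close_estimates = filter_by_rings(rows, "temperature_rings", max_rings=1)
```

To apply the same filter to one of the published files, replace the `io.StringIO(SAMPLE)` argument with a file opened via `open(path, newline="")` and pass the relevant ring column name.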

Files

postcode_district_data.csv