EEA Air Quality In-Situ Measurement Station Data
Description
Introduction
This dataset is a value-added product based on 'Up-to-date air quality station measurements', administered by the European Environment Agency (EEA) and collected by its member states. The original hourly measurement data (NO2, SO2, O3, PM10, PM2.5 in µg/m³) was reshaped, gapfilled and aggregated to different temporal resolutions, making it ready to use in time series analysis or spatial interpolation tasks.
Reproducible code for accessing and processing this data, along with demonstration notebooks, can be found on GitHub.
Accessing and pre-processing hourly data
Hourly data was retrieved through the API of the EEA Air Quality Download Service. Measurements (single files per station and pollutant) were joined to create a single time series per station with observations for multiple pollutants. As PM2.5 data is sparse but correlates well with PM10, gapfilling was performed according to methods described in Horálek et al., 2023¹. Validity and verification flags from the original data were passed on for quality filtering. Reproducible computational notebooks using the R programming language are available for the data access and the gapfilling procedure.
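The join step described above can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the function name, the tuple layout of the input records, and the station codes are hypothetical, and only the column names `Air.Quality.Station.EoI.Code` and `Start` are taken from this dataset.

```python
from collections import defaultdict


def join_pollutants(records):
    """Combine long-format records into one row per station and start time.

    `records` is an iterable of (station, start, pollutant, value) tuples,
    a hypothetical stand-in for the per-station, per-pollutant files.
    """
    rows = defaultdict(dict)
    for station, start, pollutant, value in records:
        rows[(station, start)][pollutant] = value
    # Sort by station and start time so each station forms one time series
    return [
        {"Air.Quality.Station.EoI.Code": station, "Start": start, **values}
        for (station, start), values in sorted(rows.items())
    ]


# Hypothetical example records (station codes and values are made up)
records = [
    ("STA001", "2020-01-01T01:00", "NO2", 19.5),
    ("STA001", "2020-01-01T00:00", "NO2", 21.0),
    ("STA001", "2020-01-01T00:00", "PM10", 30.0),
]
wide = join_pollutants(records)
```

Pollutants that were not measured at a given station and time are simply absent from the row, which corresponds to a missing value in the joined table.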
Temporal aggregates
Data was aggregated to three coarser temporal resolutions: day, month, and year. Coverage (the ratio of non-missing values) was calculated for each pollutant and temporal increment. A coverage threshold of 75% was applied so that only reliable aggregates are generated. All pollutants were aggregated by their arithmetic mean. Additionally, two pollutants were aggregated using a percentile method, which has been shown to be more appropriate for mapping applications. PM10 was summarized using the 90.41 percentile. Daily O3 was further summarized as the maximum of the 8-hour running mean. Based thereon, monthly and annual O3 was aggregated using the 93.15 percentile of the daily maxima. For more details refer to the reproducible computational notebook on temporal aggregation.
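The aggregation rules above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' R implementation. The function names, the use of `None` for missing values, the "at least 75%" reading of the threshold, and the trailing 8-hour window with a minimum of 6 valid hours are all assumptions. The last function shows one common reading of the percentile values (not stated in this description): the 36th and 26th highest daily value in a 365-day year, matching the exceedance days allowed by EU air quality standards (35 for PM10, 25 for O3).

```python
from statistics import mean

COVERAGE_THRESHOLD = 0.75  # threshold stated in the text


def coverage(values):
    """Share of non-missing (not None) entries in a list of values."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def mean_if_covered(values, threshold=COVERAGE_THRESHOLD):
    """Arithmetic mean of the non-missing values.

    Returns None if coverage is below the threshold
    (assumption: coverage equal to the threshold is accepted).
    """
    if coverage(values) < threshold:
        return None
    return mean(v for v in values if v is not None)


def max_8h_running_mean(hourly, window=8, min_valid=6):
    """Maximum of the trailing 8-hour means over a list of hourly values.

    Assumption: a window counts only if it holds at least `min_valid`
    non-missing values (6 of 8, mirroring the 75% coverage threshold).
    """
    means = []
    for end in range(window, len(hourly) + 1):
        valid = [v for v in hourly[end - window:end] if v is not None]
        if len(valid) >= min_valid:
            means.append(mean(valid))
    return max(means) if means else None


def percentile_for_exceedances(allowed_days, days_per_year=365):
    """Percentile matching the (allowed_days + 1)-th highest daily value."""
    return round(100 * (1 - allowed_days / days_per_year), 2)


# Hypothetical day with 18 of 24 valid hourly values (coverage = 0.75)
day = [10.0] * 18 + [None] * 6
daily_mean = mean_if_covered(day)
```

With 17 or fewer valid hours the same call returns `None`, so the daily value would be treated as missing in any coarser aggregate.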
Data columns
column | hourly | daily | monthly | annual | description |
Air.Quality.Station.EoI.Code | x | x | x | x | Unique station ID |
Countrycode | x | x | x | x | Two-letter ISO country code |
Start | x | | | | Start time of (hourly) measurement period |
<Pollutant> | x | x | x | x | One of NO2; SO2; O3; O3_max8h_93.15; PM10; PM10_90.41; PM2.5 in µg/m³ |
Validity_<Pollutant> | x | | | | Validity flag of the respective pollutant |
Verification_<Pollutant> | x | | | | Verification flag of the respective pollutant |
filled_PM2.5 | x | | | | Flag indicating if PM2.5 value is measured or supplemented through gapfilling (boolean) |
year | | x | x | x | Year (2015-2023) |
cov.year_<Pollutant> | x | x | Data coverage throughout the year (0-1) | ||
month | | x | x | | Month (1-12) |
cov.month_<Pollutant> | x | x | Data coverage throughout the month (0-1) | ||
doy | | x | | | Day of year (0-366) |
cov.day_<Pollutant> | | x | | | Data coverage throughout the day (0-1) |
Station metadata
The table below lists relevant metadata at the station level, including the type and area of each measurement station and its coordinates. These metadata are static and do not vary over time. They are included directly in the daily, monthly, and annual aggregates. To keep the hourly files small, the metadata is stored separately (in a file named "EEA_stations_meta_table.parquet") and can be joined via the station ID.
column | description |
Air.Quality.Station.EoI.Code | Unique station ID (required for join) |
Countrycode | Two-letter ISO country code |
Station.Type | One of "background", "industrial", or "traffic" |
Station.Area | One of "urban", "suburban", "rural", "rural-nearcity", "rural-regional", "rural-remote" |
Longitude & Latitude | Geographic coordinates of the station |
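Joining the hourly data with the station metadata amounts to a left join on the station ID. The sketch below shows the logic with plain Python dictionaries. The function name, station codes, and values are hypothetical, and only the column names come from the tables above. With data frames loaded from the Parquet files, the same operation would typically be a single merge call, for example `DataFrame.merge(..., on="Air.Quality.Station.EoI.Code", how="left")` in pandas.

```python
def join_station_meta(hourly_rows, meta_rows, key="Air.Quality.Station.EoI.Code"):
    """Left join: attach station metadata to each hourly row via the station ID."""
    meta_by_station = {m[key]: m for m in meta_rows}
    joined = []
    for row in hourly_rows:
        # Rows without a matching station keep only their own columns
        meta = meta_by_station.get(row[key], {})
        # Hourly values take precedence where column names overlap
        joined.append({**meta, **row})
    return joined


# Hypothetical example rows (station codes and values are made up)
hourly_rows = [
    {"Air.Quality.Station.EoI.Code": "STA001", "Start": "2020-01-01T00:00", "NO2": 21.0},
    {"Air.Quality.Station.EoI.Code": "STA999", "Start": "2020-01-01T00:00", "NO2": 8.0},
]
meta_rows = [
    {
        "Air.Quality.Station.EoI.Code": "STA001",
        "Station.Type": "traffic",
        "Station.Area": "urban",
    },
]
joined = join_station_meta(hourly_rows, meta_rows)
```

A left join keeps every hourly observation, including those of stations that are missing from the metadata table.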
Parquet file format
This dataset is shipped in Parquet files. Parquet is a relatively new and very space-efficient columnar format. It differs from traditional tabular file formats (e.g. CSV) in that it is binary and cannot be opened and displayed by common spreadsheet software (e.g. MS Excel, LibreOffice).
Daily, monthly and annual data files are small (< 200 MB) and stored in a single file each. They are written in GeoParquet format, making them ready to use in e.g. a GIS (via download) or a cloud environment (via URL).
Hourly data is much larger (3.7 GB) and is therefore partitioned by `Countrycode` (one file per country) to enable reading smaller subsets. Reading it requires an Apache Arrow implementation, which is available for Python, R, C++, and other languages. Reading the data there is straightforward (see the code samples below).
R code:
# required libraries
library(arrow)
library(sf)
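# read air quality data (station metadata is included in the annual file)
aq = read_parquet("airquality.no2.o3.so2.pm10.pm2p5_4.annual_pnt_20150101_20231231_eu_epsg.3035_v20240718.parquet")
# convert to an sf object and set the coordinate reference system
aq = st_as_sf(aq) |> st_set_crs(4326)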
Python code:
# required library
import geopandas as gpd
# read air quality
aq = gpd.read_parquet("airquality.no2.o3.so2.pm10.pm2p5_4.annual_pnt_20150101_20231231_eu_epsg.3035_v20240718.parquet")
Files
airquality.no2.o3.so2.pm10.pm2p5_1.hourly_pnt_20150101_20231231_eu_epsg.3035_v20240718.zip
Additional details
Software
- Repository URL: https://github.com/Open-Earth-Monitor/UseCase_AIRCON/tree/WP4_insitu
- Programming language: R, Python
- Development Status: Active
References
- Horálek, J., Vlasáková, L., Schreiberová, M., Marková, J., Schneider, P., Kurfürst, P., Tognet, F., Schovánková, J., Vlček, O., Damašková, D., 2022. European air quality maps for 2020. PM10, PM2.5, Ozone, NO2, NOx and Benzo(a)pyrene spatial estimates and their uncertainties. (No. ETC HE Report 2022/12).