Published June 25, 2025 | Version v1
Presentation Open

Boosting Air Quality Downscaling with Extreme-Value-Sensitive Strategies

Description

Presented at the Living Planet Symposium 2025 conference.

Mining industries must monitor air pollutant emissions to comply with EU directives, protect public health and the environment, and support broader sustainability goals. Monitoring typically relies on direct on-site measurements, drones, and mobile units. With their broad coverage and pollutant-specific detection capabilities, spaceborne sensors provide a complementary tool for tracking emissions over large areas, ensuring regulatory compliance, and improving environmental management. However, spaceborne estimates represent only the total atmospheric column and may not provide comprehensive information about surface concentrations. For this reason, machine learning (ML) methods that fuse data from multiple sources, including ground stations, satellites, and chemical transport models (CTMs), have advanced significantly in estimating ground-level air pollutant concentrations. ML methods such as Random Forests (RF), support vector regression, feed-forward neural networks, and eXtreme Gradient Boosting (XGBoost) are widely used to predict high-resolution ground-level air pollutant concentrations from ground-based measurements, assimilated records (e.g., meteorological variables), and satellite observations (e.g., Sentinel-5P TROPOMI). To enable end-to-end predictions, these downscaling models learn relationships between inputs (e.g., NO₂ and CO column densities) and outputs (e.g., ground-based measurements of air pollutant concentrations). ML-based downscaling methods usually outperform classical spatial interpolation and statistical regression methods; however, they struggle to estimate extreme events, because the values that characterise such events lie in the tails of the training distribution or outside it altogether. It is therefore important to identify ML models and training strategies that can handle the highly imbalanced data distribution and extend the range of predicted magnitudes to capture hotspots of extreme air pollutant emissions.
The EU-funded TERRAVISION project addresses this gap by introducing a novel framework that uses ensemble techniques to combine the strengths of multiple ML models, together with training strategies for handling imbalanced datasets effectively. Techniques such as oversampling the high-concentration samples or using loss functions that penalise prediction errors on extreme values could help alleviate the challenges posed by the imbalanced distribution. In addition to these strategies, evaluation metrics such as the Geometric Mean (GM) and the Squared Error-Relevance Area (SERA) should be employed to measure how well the ML models capture extreme values and to give a complete picture of their downscaling capabilities. The framework incorporates an ablation study that assesses the individual contribution of each component: variations of ML models, oversampling techniques, cost-sensitive learning in which data points are weighted according to the rarity of their target values, loss functions that perform asymmetric optimisation and emphasise extreme values, and evaluation metrics. The models were trained on benchmark datasets to estimate the near-surface concentrations of air pollutants at 1 km spatial resolution. These datasets include ground-based measurements from the European Environment Agency (EEA) air quality monitoring station network, Sentinel-5P tropospheric vertical column density values, and modelled meteorological data from the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis v5-Land (ERA5-Land) model. The trained models were benchmarked against state-of-the-art ML models for downscaling air pollutant concentrations. Evaluations were performed on the entire test dataset as well as on three disjoint subsets based on the number of samples that correspond to specific ranges of near-surface concentration values.
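To make two of the ingredients above concrete, the following is a minimal sketch of rarity-based sample weighting and of a SERA-style metric. It is not the project's implementation. The linear relevance ramp, the thresholds `low` and `high`, the `base` and `boost` weighting parameters, and the toy concentration values are all assumptions chosen for illustration; the SERA literature usually derives the relevance function from the target distribution instead.

```python
def relevance(y, low, high):
    """Hypothetical relevance in [0, 1]: 0 at or below `low`, 1 at or above `high`."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)


def rarity_weights(y_true, low, high, base=1.0, boost=4.0):
    """Cost-sensitive sample weights that grow with target relevance (assumed form)."""
    return [base + boost * relevance(y, low, high) for y in y_true]


def mse(y_true, y_pred):
    """Plain mean squared error, for comparison with SERA."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def sera(y_true, y_pred, low, high, steps=100):
    """Approximate SERA: the integral over t in [0, 1] of SER_t, where SER_t is
    the sum of squared errors over samples whose relevance is at least t."""
    phi = [relevance(y, low, high) for y in y_true]
    sq_err = [(p - y) ** 2 for y, p in zip(y_true, y_pred)]

    def ser(t):
        return sum(e for e, r in zip(sq_err, phi) if r >= t)

    vals = [ser(k / steps) for k in range(steps + 1)]
    width = 1.0 / steps
    # Trapezoidal rule over the evenly spaced thresholds.
    return sum((vals[k] + vals[k + 1]) / 2 * width for k in range(steps))


# Toy near-surface concentrations (made-up numbers): three common, two extreme.
y_true = [10, 20, 30, 80, 120]
pred_a = [10, 20, 30, 50, 90]    # underestimates the two extreme values
pred_b = [40, 50, 30, 80, 120]   # same squared error, but on common values

weights = rarity_weights(y_true, low=40, high=100)
sera_a = sera(y_true, pred_a, low=40, high=100)
sera_b = sera(y_true, pred_b, low=40, high=100)

print("MSE  A vs B:", mse(y_true, pred_a), mse(y_true, pred_b))
print("SERA A vs B:", sera_a, sera_b)
print("weights:", weights)
```

In this toy case both sets of predictions have the same MSE, yet SERA is far larger for the predictions that underestimate the high concentrations, which is the behaviour that makes it useful for judging extreme values. Weights of this kind could be passed to any learner that accepts per-sample weights during fitting.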
Extensive experiments verify the superior performance of the proposed strategies in addressing the underestimation of extreme ground-level concentrations in air quality downscaling tasks. Incorporating these improvements into our modelling framework helps overcome existing limitations and improves the accuracy and reliability of air quality predictions, which will ultimately benefit environmental monitoring and decision-making processes.

Files

Files (1.3 MB)

md5:dc310ca6f3a0bbdab0f850dfd266f57b (16.1 kB)
md5:63ef836fb2ed8845e936900f793852c7 (1.3 MB)

Additional details

Funding

European Commission
TERRAVISION: Integrated Earth Observation based platform for novel services to enhance raw materials mining life cycle (grant 101138643)