Published March 18, 2026 | Version v1
Dataset | Open access

Dataset and Surrogate models based on 2D RANS thermal street canyon pollutant dispersion.

  • 1. Barcelona Supercomputing Center
  • 2. Universidade Estadual de Campinas (UNICAMP)

Description

Dataset Description

This dataset was generated using Reynolds-Averaged Navier–Stokes (RANS) simulations to model pollutant dispersion in a 2D idealized street canyon. It contains 49,590 samples, divided into training+validation and test subsets. Each sample represents a 64×64 pollutant concentration field, produced as part of a data augmentation study for surrogate modeling.

Data Structure

  • Concentration fields are stored in NumPy .npy format and can be loaded using numpy.load(file).
    The arrays have shape:

    • (n_samples, 64, 64) or

    • (n_samples, 4096), which are interconvertible via .reshape().

  • Input parameters associated with each sample are stored in a separate .npy file and correspond positionally (i.e., the i-th row matches the i-th sample in the concentration dataset).

  • The mask file (in DataCoordinates/Mask.npy) identifies grid points inside building areas, which should be excluded from statistical analyses or visualizations of the street canyon.
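The loading and masking steps above can be sketched with NumPy alone. This is a minimal illustration, not code from the dataset's code base: the helper names load_fields and masked_fields are hypothetical, the concentration file names are not given here (pass your own path), and it assumes the mask holds 4096 values (or 64×64) that are True/nonzero at building points. Check Mask.npy and invert the mask if it uses the opposite convention.

```python
import numpy as np

GRID = 64  # fields are 64x64 per the dataset description


def load_fields(path: str) -> np.ndarray:
    """Load a concentration file and return it with shape (n_samples, 64, 64)."""
    fields = np.load(path)
    # Files may be stored flattened as (n_samples, 4096); restore the 2D grid.
    return fields.reshape(-1, GRID, GRID)


def masked_fields(fields: np.ndarray, building_mask: np.ndarray) -> np.ma.MaskedArray:
    """Hide building grid points so statistics and plots ignore them.

    ASSUMPTION: building_mask is True/nonzero at points inside buildings.
    """
    mask2d = np.asarray(building_mask).reshape(GRID, GRID).astype(bool)
    # Repeat the single 64x64 mask for every sample (copy gives a writable mask).
    full_mask = np.broadcast_to(mask2d, fields.shape).copy()
    return np.ma.masked_array(fields, mask=full_mask)
```

For example, masked_fields(load_fields(path), np.load("DataCoordinates/Mask.npy")).mean() averages concentration over street-canyon points only; a masked array also renders building cells as blank in matplotlib's imshow.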

Input Parameters

Each sample is defined by 18 input parameters representing physical, turbulence, and emission characteristics. The values are normalized and can be restored to physical units using the inverse_norm_parameters function provided in the codebase. The parameters and their physical meaning are:

  1. Sct – Turbulent Schmidt number

  2. Ce1 – k−ε model parameter

  3. Ce2 – k−ε model parameter

  4. – k−ε model parameter

  5. σk – k−ε model parameter

  6. B – k−ε model parameter

  7. κ – von Kármán constant

  8. – Friction velocity

  9. y0 – Aerodynamic roughness length

  10. pk – TKE source scaling factor

  11. – Dissipation rate source scaling factor

  12. Bk – Background concentration offset

  13. pB – Background concentration slope

  14. Q – Emission rate

  15. Qh – Emission source height

  16. Qp – Heat release rate

  17. θ – Solar incidence angle

  18. ΔT – Background temperature increase
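Because the parameter file matches samples by row, the list above also fixes the column order. The sketch below gives each column a readable name so that parameters can be selected by name rather than by index. The names are hypothetical labels chosen for illustration (items 4, 8 and 11 have no symbol in this description, so descriptive names stand in), and the values it returns are still normalized.

```python
import numpy as np

# HYPOTHETICAL column labels, in the order of the list above
# (item i of the list -> column i - 1 of the parameter array).
PARAM_NAMES = [
    "Sct", "Ce1", "Ce2", "keps_param_4", "sigma_k", "B", "kappa",
    "friction_velocity", "y0", "pk", "dissipation_source_scale",
    "Bk", "pB", "Q", "Qh", "Qp", "theta", "delta_T",
]
PARAM_INDEX = {name: i for i, name in enumerate(PARAM_NAMES)}


def param_column(params: np.ndarray, name: str) -> np.ndarray:
    """Return one (still normalized) parameter for every sample."""
    if params.ndim != 2 or params.shape[1] != len(PARAM_NAMES):
        raise ValueError(
            f"expected shape (n_samples, {len(PARAM_NAMES)}), got {params.shape}"
        )
    return params[:, PARAM_INDEX[name]]
```

For example, param_column(params, "Q") returns the emission rate of every sample, in the same row order as the concentration fields.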

Surrogate Models

Two surrogate modeling approaches based on deep learning are trained and evaluated using this dataset:

  • PCA + MLP: A two-step model combining Principal Component Analysis (PCA) for dimensionality reduction and a Multi-Layer Perceptron (MLP) to predict the modal coefficients.

  • DCED: A convolutional encoder–decoder architecture designed for structured image prediction.
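The PCA half of the PCA + MLP approach can be sketched with a plain SVD. In this sketch, fit_pca extracts spatial modes from flattened 4096-point fields, encode gives the modal coefficients an MLP would be trained to predict from the 18 input parameters, and decode turns (predicted) coefficients back into fields. The function names and the number of retained modes are assumptions for illustration. The MLP itself and the authors' actual PCA settings are not shown here.

```python
import numpy as np


def fit_pca(X: np.ndarray, n_modes: int):
    """Fit PCA on flattened fields X of shape (n_samples, 4096)."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions (spatial modes),
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_modes]


def encode(X: np.ndarray, mean: np.ndarray, modes: np.ndarray) -> np.ndarray:
    """Project fields onto the retained modes, giving modal coefficients."""
    return (X - mean) @ modes.T


def decode(coeffs: np.ndarray, mean: np.ndarray, modes: np.ndarray) -> np.ndarray:
    """Reconstruct flattened fields from modal coefficients."""
    return coeffs @ modes + mean
```

Reshaping the output of decode to (-1, 64, 64) gives fields that can be plotted or compared against the reference concentrations.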

Both models are optimized with the NSGA-II multi-objective genetic algorithm, and the trained models are included in the TrainedModels/ directory.
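At the core of NSGA-II is non-dominated sorting, which ranks candidate models into Pareto fronts over several objectives. The sketch below implements the fast non-dominated sort described by Deb et al. for minimization problems. The objectives themselves (for example, validation error and model size) are hypothetical, and the rest of NSGA-II (crowding distance, selection, crossover, mutation) and the authors' actual optimization setup are not shown.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(points: Sequence[Sequence[float]]) -> List[List[int]]:
    """Fast non-dominated sorting; returns fronts as lists of point indices."""
    n = len(points)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # points that p dominates
    domination_count = [0] * n  # number of points dominating each point
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated_by[p].append(q)
            elif dominates(points[q], points[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)  # nothing dominates p: first Pareto front
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front
```

For candidates scored as (error, size), the first returned front holds the trade-off models that no other candidate beats on both objectives at once.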

Usage

To demonstrate how to load and evaluate the models using the dataset, we provide two example scripts:

  • MLPExample.py – loads the MLP surrogate, reconstructs the PCA-based fields, and plots a sample.

  • DCEDExample.py – loads the U-Net (DCED) surrogate and plots a sample.

Each model requires loading associated parameter and coordinate data, as well as the building mask.
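When comparing surrogate predictions with the RANS reference fields, building points should be left out of the error statistics. The sketch below computes a masked RMSE and R² as one possible evaluation. The metric choice and the function name are illustrative assumptions, not necessarily the metrics used in the associated preprint, and the mask is assumed to be True/nonzero inside buildings.

```python
import numpy as np


def masked_metrics(pred: np.ndarray, ref: np.ndarray, building_mask: np.ndarray) -> dict:
    """RMSE and R^2 over street-canyon points only (building points excluded).

    pred, ref: arrays of shape (n_samples, 64, 64) or (n_samples, 4096).
    building_mask: 4096 values, ASSUMED True/nonzero inside buildings.
    """
    n = ref.shape[0]
    keep = ~np.asarray(building_mask, dtype=bool).reshape(-1)  # fluid points only
    p = pred.reshape(n, -1)[:, keep]
    r = ref.reshape(n, -1)[:, keep]
    err = p - r
    rmse = float(np.sqrt(np.mean(err ** 2)))
    # R^2 pooled over all samples and all retained grid points.
    r2 = 1.0 - float(np.sum(err ** 2) / np.sum((r - r.mean()) ** 2))
    return {"rmse": rmse, "r2": r2}
```

Because the building points are dropped before any arithmetic, whatever value a model outputs inside buildings has no effect on the scores.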

For more details on the simulation setup, optimization, and evaluation strategy, please refer to the associated preprint "Assessment of deep-learning strategies for surrogate modeling of pollution dispersion in a thermal street canyon" (Beltrami, G., et al., 2025), submitted to Computers & Fluids.

The authors acknowledge financial support from the AIR-URBAN project (TED2021-130210A-I00/AEI/10.13039/501100011033/European Union NextGenerationEU/PRTR).

Files

README.md

Files (1.0 GB)

  • 1.0 GB, md5:a344e6a45d6535e59d11c542999d4573
  • 3.6 kB, md5:a4e751f2cccf44e2ea7b2c7b9a7c8a47
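After downloading, the files can be checked against the MD5 checksums listed above. The sketch below uses only Python's standard hashlib module and reads the file in chunks, so the 1.0 GB archive is never held in memory at once. The archive's file name is not shown in this listing, so the path is a placeholder you supply.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading 1 MiB at a time."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    return md5sum(path) == expected.lower().removeprefix("md5:")
```

For example, verify(archive_path, "md5:a344e6a45d6535e59d11c542999d4573") should return True for an intact download of the 1.0 GB file.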

Additional details

Identifiers

Other
Assessment of deep-learning strategies for surrogate modeling of pollution dispersion in a thermal street canyon

Related works

Is published in
Publication: Assessment of deep-learning strategies for surrogate modeling of pollution dispersion in a thermal street canyon (Other)

Dates

Submitted
2025-09-01

Software

Programming language
Python

References

  • Assessment of deep-learning strategies for surrogate modeling of pollution dispersion in a thermal street canyon