Dataset and Surrogate models based on 2D RANS thermal street canyon pollutant dispersion.
Description
Dataset Description
This dataset was generated using Reynolds-Averaged Navier–Stokes (RANS) simulations to model pollutant dispersion in a 2D idealized street canyon. It contains 49,590 samples, divided into training+validation and test subsets. Each sample represents a 64×64 pollutant concentration field, produced as part of a data augmentation study for surrogate modeling.
Data Structure
- Concentration fields are stored in NumPy `.npy` format and can be loaded using `numpy.load(file)`. The arrays have shape:
  - `(n_samples, 64, 64)` or
  - `(n_samples, 4096)`, which are interconvertible via `.reshape()`.
- Input parameters associated with each sample are stored in a separate `.npy` file and correspond positionally (i.e., the i-th row matches the i-th sample in the concentration dataset).
- The mask file (in `DataCoordinates/Mask.npy`) identifies grid points inside building areas, which should be excluded from statistical analyses or visualizations of the street canyon.
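The layout described above can be sketched as follows. This is a minimal illustration, not the repository's own loader: the file names `concentration.npy` and `parameters.npy` are hypothetical, the arrays are small synthetic stand-ins written to a temporary directory so the example runs without the download, and the mask convention (`True` marks a building cell, mask shape `(64, 64)`) is an assumption to check against the real `DataCoordinates/Mask.npy`.

```python
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(0)
n_samples = 5  # tiny stand-in; the real dataset holds 49,590 samples

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Hypothetical file names and synthetic content, created only for this demo.
    np.save(tmp / "concentration.npy", rng.random((n_samples, 4096)))
    np.save(tmp / "parameters.npy", rng.random((n_samples, 18)))
    building = np.zeros((64, 64), dtype=bool)
    building[40:, :16] = True  # assumed convention: True = inside a building
    building[40:, 48:] = True
    np.save(tmp / "Mask.npy", building)

    fields = np.load(tmp / "concentration.npy")
    params = np.load(tmp / "parameters.npy")
    mask = np.load(tmp / "Mask.npy").astype(bool)

# The two documented layouts are interconvertible via reshape.
fields_2d = fields.reshape(-1, 64, 64)
fields_flat = fields_2d.reshape(len(fields_2d), -1)

# Row i of the parameter array belongs to sample i of the concentration array.
assert len(params) == len(fields_2d)

# Statistics over the street canyon only: drop the building cells.
fluid_values = fields_2d[:, ~mask]           # shape (n_samples, n_fluid_cells)
mean_per_sample = fluid_values.mean(axis=1)

# For plotting, a masked array hides the building cells instead of dropping them.
masked_fields = np.ma.masked_array(
    fields_2d, mask=np.broadcast_to(mask, fields_2d.shape)
)
```

If the real mask uses the opposite convention (nonzero for fluid cells), the `~mask` inversion should be removed.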
Input Parameters
Each sample is defined by 18 input parameters representing physical, turbulence, and emission characteristics. The values are normalized and can be restored to their original values using the `inverse_norm_parameters` function provided in the code base. The parameters and their physical meaning are:
- Sct – Turbulent Schmidt number
- Ce1 – k−ε model parameter
- Ce2 – k−ε model parameter
- Cμ – k−ε model parameter
- σk – k−ε model parameter
- B – k−ε model parameter
- κ – von Kármán constant
- Uτ – Friction velocity
- y0 – Aerodynamic roughness length
- pk – TKE source scaling factor
- pϵ – Dissipation rate source scaling factor
- Bk – Background concentration offset
- pB – Background concentration slope
- Q – Emission rate
- Qh – Emission source height
- Qp – Heat release rate
- θ – Solar incidence angle
- ΔT – Background temperature increase
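When inspecting individual samples, it can help to attach names to the 18 columns. The sketch below assumes that the column order of the parameter array follows the order of the list above, which should be verified against the code base; the ASCII names and the helper `label_parameters` are hypothetical and not part of the published code. The values stay normalized: converting them back requires the repository's `inverse_norm_parameters`, whose signature is not described here.

```python
import numpy as np

# ASCII labels for the 18 parameters, in the order listed above
# (assumed to match the column order of the parameter array).
PARAM_NAMES = [
    "Sct", "Ce1", "Ce2", "Cmu", "sigma_k", "B", "kappa", "U_tau", "y0",
    "p_k", "p_eps", "B_k", "p_B", "Q", "Q_h", "Q_p", "theta", "delta_T",
]


def label_parameters(row):
    """Return a {name: normalized value} dict for one row of the parameter array."""
    row = np.asarray(row, dtype=float)
    if row.shape != (len(PARAM_NAMES),):
        raise ValueError(f"expected {len(PARAM_NAMES)} values, got shape {row.shape}")
    return dict(zip(PARAM_NAMES, row.tolist()))


# Synthetic row standing in for one line of the real parameter file.
sample = label_parameters(np.arange(18) / 17.0)
print(sample["Sct"], sample["delta_T"])
```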
Surrogate Models
Two surrogate modeling approaches based on deep learning are trained and evaluated using this dataset:
- PCA + MLP: A two-step model combining Principal Component Analysis (PCA) for dimensionality reduction and a Multi-Layer Perceptron (MLP) to predict the modal coefficients.
- DCED: A convolutional encoder–decoder architecture designed for structured image prediction.
Both models are optimized using the NSGA-II algorithm and included in the TrainedModels/ directory.
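To make the PCA step of the PCA + MLP approach concrete, the following sketch compresses flattened fields into modal coefficients and reconstructs them, using an SVD in plain NumPy. It is an illustration under stated assumptions, not the trained model: the fields are synthetic low-rank data, the number of modes (10) is arbitrary, and the MLP itself is omitted. In the actual surrogate, the MLP would predict `coeffs` from the 18 input parameters rather than obtaining them by projection.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_pixels, n_modes = 200, 64 * 64, 10  # n_modes chosen arbitrarily

# Synthetic stand-in for flattened concentration fields: low-rank plus small noise.
latent = rng.normal(size=(n_samples, n_modes))
basis = rng.normal(size=(n_modes, n_pixels))
fields = latent @ basis + 0.01 * rng.normal(size=(n_samples, n_pixels))

# PCA via SVD of the mean-centered data matrix.
mean_field = fields.mean(axis=0)
centered = fields - mean_field
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
modes = vt[:n_modes]                      # principal modes, shape (n_modes, 4096)

# Modal coefficients: the low-dimensional targets an MLP would learn to predict.
coeffs = centered @ modes.T               # shape (n_samples, n_modes)

# Reconstruction of the full fields from the coefficients.
reconstructed = coeffs @ modes + mean_field

rel_error = np.linalg.norm(reconstructed - fields) / np.linalg.norm(fields)
explained = (singular_values[:n_modes] ** 2).sum() / (singular_values ** 2).sum()
```

Each reconstructed row can be reshaped to `(64, 64)` for plotting, with the building mask applied as for the original fields.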
Usage
To demonstrate how to load and evaluate the models using the dataset, we provide two example scripts:
- `MLPExample.py` – loads the MLP surrogate, reconstructs PCA-based fields, and plots a sample.
- `DCEDExample.py` – loads the Unet surrogate and plots a sample.
Each model requires loading associated parameter and coordinate data, as well as the building mask.
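Because the building mask is needed whenever predictions are compared with reference fields, a per-sample error restricted to the street canyon might be computed as sketched below. The helper `masked_rmse` is hypothetical and is not taken from the example scripts; the data are synthetic, and the mask convention (`True` marks a building cell) is an assumption.

```python
import numpy as np


def masked_rmse(pred, truth, building_mask):
    """Per-sample RMSE over non-building cells; accepts flat or 64x64 layouts."""
    pred = np.asarray(pred, dtype=float).reshape(-1, 64, 64)
    truth = np.asarray(truth, dtype=float).reshape(-1, 64, 64)
    fluid = ~np.asarray(building_mask, dtype=bool)  # assumed: True = building
    diff = pred[:, fluid] - truth[:, fluid]
    return np.sqrt((diff ** 2).mean(axis=1))


rng = np.random.default_rng(0)
building_mask = np.zeros((64, 64), dtype=bool)
building_mask[40:, :16] = True            # synthetic building footprint

truth = rng.random((3, 64, 64))
pred = truth + 0.1                        # uniform offset in the fluid region
pred[:, building_mask] = 999.0            # values inside buildings must not matter

rmse = masked_rmse(pred.reshape(3, 4096), truth, building_mask)
```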
For more details on the simulation setup, optimization, and evaluation strategy, please refer to the associated preprint: Beltrami, G., et al. (2025), "Assessment of deep-learning strategies for surrogate modeling of pollution dispersion in a thermal street canyon", submitted to Computers & Fluids.
The authors acknowledge financial support from the AIR-URBAN project (TED2021-130210A-I00/ AEI/10.13039/501100011033/ European Union NextGenerationEU/PRTR).
Files
(1.0 GB total)
| MD5 checksum | Size |
|---|---|
| a344e6a45d6535e59d11c542999d4573 | 1.0 GB |
| a4e751f2cccf44e2ea7b2c7b9a7c8a47 | 3.6 kB |
Additional details
Identifiers
- Other: Assessment of deep-learning strategies for surrogate modeling of pollution dispersion in a thermal street canyon
Related works
- Is published in: Publication: Assessment of deep-learning strategies for surrogate modeling of pollution dispersion in a thermal street canyon (Other)
Dates
- Submitted: 2025-09-01
Software
- Programming language: Python
References
- Assessment of deep-learning strategies for surrogate modeling of pollution dispersion in a thermal street canyon