Dataset and Surrogate models based on 2D RANS thermal street canyon pollutant dispersion.

Moreira Beltrami, Gabriel; Calafell, Joan; Gonçalves dos Santos, Rogério; Mateu Armengol, Jan

doi:10.5281/zenodo.19091321

Published March 18, 2026 | Version v1

Dataset Open

1. Barcelona Supercomputing Center
2. Universidade Estadual de Campinas (UNICAMP)

Dataset Description

This dataset was generated using Reynolds-Averaged Navier–Stokes (RANS) simulations to model pollutant dispersion in a 2D idealized street canyon. It contains 49,590 samples, divided into training+validation and test subsets. Each sample represents a 64×64 pollutant concentration field, produced as part of a data augmentation study for surrogate modeling.

Data Structure

Concentration fields are stored in NumPy .npy format and can be loaded using numpy.load(file).
The arrays have shape:
- (n_samples, 64, 64) or
- (n_samples, 4096), which are interconvertible via .reshape().
Input parameters associated with each sample are stored in a separate .npy file and correspond positionally (i.e., the i-th row matches the i-th sample in the concentration dataset).
The mask file (in DataCoordinates/Mask.npy) identifies grid points inside building areas, which should be excluded from statistical analyses or visualizations of the street canyon.
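The loading steps above can be sketched as follows. This is a minimal sketch, not the authors' loader: the helper names load_fields and load_pairs are hypothetical, and the actual file names inside the archive are not listed here, so pass whichever paths apply.

```python
import numpy as np

GRID = 64  # each field is GRID x GRID, per the dataset description


def load_fields(path):
    """Load concentration fields and return them as (n_samples, 64, 64)."""
    fields = np.load(path)
    # Accept either stored layout: (n, 64, 64) or flattened (n, 4096).
    return fields.reshape(-1, GRID, GRID)


def load_pairs(fields_path, params_path):
    """Load fields and input parameters and check positional alignment."""
    fields = load_fields(fields_path)
    params = np.load(params_path)
    # The i-th parameter row must describe the i-th field.
    if len(fields) != len(params):
        raise ValueError("fields and parameters have different sample counts")
    return fields, params
```

Flattening back to (n_samples, 4096) is then a single call: fields.reshape(len(fields), -1).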

Input Parameters

Each sample is defined by 18 input parameters representing physical, turbulence, and emission characteristics. The stored values are normalized; the physical values can be recovered with the inverse_norm_parameters function provided in the code base. The parameters and their physical meaning are:

Sct – Turbulent Schmidt number
Ce1 – k−ε model parameter
Ce2 – k−ε model parameter
Cμ – k−ε model parameter
σk – k−ε model parameter
B – k−ε model parameter
κ – von Kármán constant
Uτ – Friction velocity
y0 – Aerodynamic roughness length
pk – TKE source scaling factor
pϵ – Dissipation rate source scaling factor
Bk – Background concentration offset
pB – Background concentration slope
Q – Emission rate
Qh – Emission source height
Qp – Heat release rate
θ – Solar incidence angle
ΔT – Background temperature increase
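For readability, a parameter row can be turned into a name-to-value mapping. The sketch below assumes that the columns of the parameter array follow the order of the list above, which should be checked against the code base. The ASCII names and the params_to_dict helper are hypothetical, and the values stay normalized unless inverse_norm_parameters has been applied first.

```python
import numpy as np

# ASCII stand-ins for the 18 parameters, in the order listed above.
# Assumption: the stored columns follow this same order.
PARAM_NAMES = [
    "Sct", "Ce1", "Ce2", "Cmu", "sigma_k", "B", "kappa", "U_tau", "y0",
    "pk", "p_eps", "Bk", "pB", "Q", "Qh", "Qp", "theta", "dT",
]


def params_to_dict(row):
    """Map one parameter row (18 values) to a {name: value} dictionary."""
    row = np.asarray(row).ravel()
    if row.size != len(PARAM_NAMES):
        raise ValueError(f"expected {len(PARAM_NAMES)} values, got {row.size}")
    return dict(zip(PARAM_NAMES, row.tolist()))
```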

Surrogate Models

Two surrogate modeling approaches based on deep learning are trained and evaluated using this dataset:

PCA + MLP: A two-step model combining Principal Component Analysis (PCA) for dimensionality reduction and a Multi-Layer Perceptron (MLP) to predict the modal coefficients.
DCED: A convolutional encoder–decoder architecture designed for structured image prediction.
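To illustrate the PCA half of the first approach, the sketch below computes PCA modes with a plain NumPy SVD, projects fields onto them to get modal coefficients, and reconstructs fields from those coefficients. It is a generic illustration under assumed function names (fit_pca, project, reconstruct), not the implementation shipped in TrainedModels/. In the two-step model, the MLP would predict the coefficients from the 18 input parameters instead of obtaining them by projection.

```python
import numpy as np


def fit_pca(fields, n_modes):
    """Return the mean field and the first n_modes PCA modes (flattened)."""
    X = fields.reshape(len(fields), -1)
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal directions sorted by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_modes]


def project(fields, mean, modes):
    """Modal coefficients of each field, shape (n_samples, n_modes)."""
    X = fields.reshape(len(fields), -1)
    return (X - mean) @ modes.T


def reconstruct(coeffs, mean, modes, shape=(64, 64)):
    """Rebuild fields from modal coefficients."""
    return (coeffs @ modes + mean).reshape(-1, *shape)
```

Keeping fewer modes gives a smaller regression target for the MLP at the cost of a larger reconstruction error.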

Both models are optimized using the NSGA-II algorithm and included in the TrainedModels/ directory.
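NSGA-II is a multi-objective genetic algorithm that ranks candidates by Pareto dominance. The sketch below shows only that dominance test, not the full algorithm (no crowding distance, selection, or variation), and the objectives actually used by the authors are not stated here; treat the two-column example as hypothetical.

```python
import numpy as np


def non_dominated(objectives):
    """Return indices of non-dominated rows, with all objectives minimized."""
    obj = np.asarray(objectives, dtype=float)
    keep = []
    for i, p in enumerate(obj):
        # p is dominated if some row is no worse everywhere and better somewhere.
        no_worse = np.all(obj <= p, axis=1)
        better = np.any(obj < p, axis=1)
        if not np.any(no_worse & better):
            keep.append(i)
    return keep


# Hypothetical candidates scored as (validation error, model size).
candidates = [[1.0, 5.0], [2.0, 2.0], [3.0, 3.0], [5.0, 1.0]]
front = non_dominated(candidates)
```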

Usage

To demonstrate how to load and evaluate the models using the dataset, we provide two example scripts:

MLPExample.py – loads the MLP surrogate, reconstructs the fields from the PCA modes, and plots a sample.
DCEDExample.py – loads the U-Net (DCED) surrogate and plots a sample.

Each model requires loading associated parameter and coordinate data, as well as the building mask.
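When comparing a surrogate prediction with the reference field, building cells should be left out, as noted for the mask file. The sketch below computes a per-sample RMSE over the remaining cells. It assumes that nonzero mask entries mark building cells, so verify the convention used in DataCoordinates/Mask.npy. The masked_rmse helper is hypothetical and is not part of the provided scripts.

```python
import numpy as np


def masked_rmse(pred, true, building_mask):
    """Per-sample RMSE over cells outside buildings.

    Assumption: nonzero entries of building_mask mark building cells.
    """
    fluid = ~np.asarray(building_mask, dtype=bool).reshape(64, 64)
    diff = (np.asarray(pred) - np.asarray(true)).reshape(-1, 64, 64)
    # Boolean indexing keeps only the fluid cells: shape (n_samples, n_fluid).
    return np.sqrt(np.mean(diff[:, fluid] ** 2, axis=1))
```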

For more details on the simulation setup, optimization, and evaluation strategy, please refer to the associated preprint: Beltrami, G., et al. (2025), "Assessment of deep-learning strategies for surrogate modeling of pollution dispersion in a thermal street canyon", submitted to Computers & Fluids.

The authors acknowledge financial support from the AIR-URBAN project (TED2021-130210A-I00/ AEI/10.13039/501100011033/ European Union NextGenerationEU/PRTR).

Files

Files (1.0 GB)

Name	Size	MD5
CFDDatabase-Models.rar	1.0 GB	a344e6a45d6535e59d11c542999d4573
README.md	3.6 kB	a4e751f2cccf44e2ea7b2c7b9a7c8a47

Additional details

Is published in: Publication: Assessment of deep-learning strategies for surrogate modeling of pollution dispersion in a thermal street canyon (Other)

Submitted: 2025-09-01

Programming language: Python
