Published August 1, 2024 | Version v1
Dataset Open

PPMLES – Perturbed-Parameter ensemble of MUST Large-Eddy Simulations

  • 1. Climat, Environnement, Couplages et Incertitudes
  • 2. Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique
  • 3. Centre National de la Recherche Scientifique
  • 4. Laboratoire d'Analyse et d'Architecture des Systèmes

Description

Dataset description

This repository contains the PPMLES (Perturbed-Parameter ensemble of MUST Large-Eddy Simulations) dataset, which comprises the main outputs of 200 large-eddy simulations (LES) of microscale pollutant dispersion replicating the MUST field experiment [Biltoft 2001; Yee and Biltoft 2004] for varying meteorological forcing parameters.

The PPMLES dataset aims to help better understand the complex interactions between the atmospheric boundary layer (ABL), the urban environment, and pollutant dispersion. It was originally used to assess the impact of meteorological uncertainty on microscale pollutant dispersion prediction and to build a surrogate model that can replace the costly LES model [Lumet et al. 2025].

For each sample of meteorological forcing parameters (inlet wind direction and friction velocity), the AVBP solver [Schonfeld and Rudgyard 1999; Gicquel et al. 2011] was used to perform LES at very high spatio-temporal resolution (time step of 10^-3 s, discretization length of 30 cm), providing a fine representation of the pollutant concentration and wind velocity statistics within the urban-like canopy. The total computational cost of the PPMLES dataset is estimated at about 6 million core hours.

File list

The data are mainly stored in HDF5 files, which can be processed efficiently in Python with the h5py module.

  • input_parameters.h5: the 200 input parameter samples (alpha_inlet, ustar), drawn from a Halton sequence, that define the PPMLES ensemble.
  • ave_fields.h5: the main field statistics predicted by each of the 200 LES samples over the 200-s reference window [Yee and Biltoft 2004], including:
    • c: the time-averaged pollutant concentration in ppmv (dim = (n_samples, n_nodes) = (200, 1878585)),
    • (u, v, w): the time-averaged wind velocity components in m/s,
    • crms: the root-mean-square concentration fluctuations in ppmv,
    • tke: the turbulent kinetic energy in m^2/s^2,
    • (uprim_cprim, vprim_cprim, wprim_cprim): the components of the turbulent pollutant transport (time-averaged u'c', v'c', w'c').
  • uncertainty.h5: the estimated aleatory uncertainty induced by the internal variability of the LES (variability_#) [Lumet et al. 2024] for each field in ave_fields.h5. It also includes the stationary bootstrap [Politis and Romano 1994] parameters (n_replicates, block_length) used to estimate the uncertainty for each field and each sample.
  • mesh.h5: the tetrahedral mesh on which the fields are discretized, comprising about 1.8 million nodes.
  • time_series.h5: 200 groups (Sample_NNN), each containing the time series of the pollutant concentration (c) and wind velocity components (u, v, w) predicted by LES sample #NNN at the 93 probe locations, together with the corresponding time vector (time).
  • probe_network.csv: the location of each of the 93 probes, which correspond to the positions of the sensors of the experimental campaign [Biltoft 2001].
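input_parameters.h5 is described as a Halton-sequence design. As a minimal sketch of how such a two-dimensional low-discrepancy design can be built, the code below computes the unscrambled Halton sequence from the radical-inverse function in bases 2 and 3 and maps it onto parameter bounds. The bounds ALPHA_BOUNDS and USTAR_BOUNDS are hypothetical placeholders, not the PPMLES ranges, and whether the actual design used scrambling, skipped points, or these bases is not stated in this record.

```python
import numpy as np

def radical_inverse(index: int, base: int) -> float:
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, factor = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * factor
        factor /= base
    return result

def halton(n_samples: int, bases=(2, 3)) -> np.ndarray:
    """First n_samples points of the unscrambled Halton sequence in (0, 1)^d.

    Indices start at 1 so that the all-zero point is skipped.
    """
    return np.array([[radical_inverse(i, b) for b in bases]
                     for i in range(1, n_samples + 1)])

# Hypothetical bounds, for illustration only (NOT the PPMLES parameter ranges)
ALPHA_BOUNDS = (-90.0, 90.0)   # inlet wind direction, degrees (assumed)
USTAR_BOUNDS = (0.1, 1.0)      # friction velocity, m/s (assumed)

unit_samples = halton(200)
lower = np.array([ALPHA_BOUNDS[0], USTAR_BOUNDS[0]])
upper = np.array([ALPHA_BOUNDS[1], USTAR_BOUNDS[1]])
design = lower + unit_samples * (upper - lower)   # shape (200, 2)
```

Unlike random sampling, consecutive Halton points fill the parameter square evenly, which is why such sequences are popular for building ensembles that later serve as surrogate training sets.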

Warning: propylene concentrations are expressed in ppmv everywhere except in time_series.h5, where they are given as mass fractions. To convert them to ppmv, use c = c * (rho/rho_propylene) * 10**6, where rho/rho_propylene = 0.66 is the density ratio between air and propylene.
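The conversion is a single scaling of the mass fraction. A minimal NumPy sketch using the density ratio of 0.66 quoted above; the function name is illustrative, and the input can be any array read from time_series.h5, such as the c series of one probe.

```python
import numpy as np

# Air-to-propylene density ratio rho/rho_propylene, as quoted in the dataset description
DENSITY_RATIO = 0.66

def mass_fraction_to_ppmv(mass_fraction):
    """Convert propylene mass fractions to ppmv: c_ppmv = Y * (rho / rho_propylene) * 1e6."""
    return np.asarray(mass_fraction, dtype=float) * DENSITY_RATIO * 1e6

# Example on a small array of mass fractions
c_ppmv = mass_fraction_to_ppmv([1e-6, 5e-6])
```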

Code examples

The following examples show how to use the PPMLES dataset in Python. They have the following dependencies:

requires-python = ">=3.9"
dependencies = [
    "h5py==3.8.0",
    "numpy==1.26.4",
    "scipy",
]

A) Dataset reading

### Imports
import h5py
import numpy as np
### Load the input parameters list into a numpy array (shape = (200, 2))
inputf = h5py.File('PPMLES/input_parameters.h5', 'r')
input_parameters = np.array((inputf['alpha_inlet'], inputf['friction_velocity'])).T
### Load the domain mesh node coordinates (shape = (n_nodes, 3))
meshf = h5py.File('PPMLES/mesh.h5', 'r')
mesh_nodes = np.array((meshf['Nodes']['x'], meshf['Nodes']['y'], meshf['Nodes']['z'])).T
### Load the set of time-averaged LES fields and their associated uncertainty
var = 'c'  # Can be: 'c', 'u', 'v', 'w', 'crms', 'tke', 'uprim_cprim', 'vprim_cprim', or 'wprim_cprim'
fieldsf = h5py.File('PPMLES/ave_fields.h5', 'r')
fields_list = fieldsf[var]  # h5py dataset of shape (200, n_nodes); data are read when sliced
uncertaintyf = h5py.File('PPMLES/uncertainty_ave_fields.h5', 'r')
uncertainty_list = uncertaintyf[var]
### Time series reading example
timeseriesf = h5py.File('PPMLES/time_series.h5', 'r')
var = 'c'  # Can be: 'c', 'u', 'v', or 'w'
probe = 32  # Integer between 0 and 92, see probe_network.csv
time_list = []
time_series_list = []
for i in range(200):
    time_list.append(np.array(timeseriesf[f'Sample_{i+1:03}']['time']))
    time_series_list.append(np.array(timeseriesf[f'Sample_{i+1:03}'][var][probe]))
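Since fields_list is an h5py dataset of shape (200, 1878585), it can be sliced rather than loaded whole. The sketch below is a hypothetical helper, not part of the dataset: it computes the per-node ensemble mean and standard deviation across the 200 samples by reading blocks of nodes. It accepts any 2-D array-like that supports slicing, so it works on an h5py dataset or on a NumPy array; the chunk size is an arbitrary choice, and read speed will depend on how the HDF5 file is chunked.

```python
import numpy as np

def ensemble_mean_std(fields, chunk_size=100_000):
    """Per-node mean and sample standard deviation over the sample axis (axis 0).

    `fields` is any 2-D array-like of shape (n_samples, n_nodes) supporting
    slicing, e.g. an h5py Dataset such as fieldsf['c'] or a NumPy array.
    Nodes are read in blocks of `chunk_size` to bound memory use.
    """
    n_samples, n_nodes = fields.shape
    mean = np.empty(n_nodes)
    std = np.empty(n_nodes)
    for start in range(0, n_nodes, chunk_size):
        stop = min(start + chunk_size, n_nodes)
        block = np.asarray(fields[:, start:stop], dtype=float)  # (n_samples, stop - start)
        mean[start:stop] = block.mean(axis=0)
        std[start:stop] = block.std(axis=0, ddof=1)
    return mean, std

# Usage with the objects from example A:
# c_mean, c_std = ensemble_mean_std(fields_list)
```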

B) Interpolation of one field from the unstructured grid onto a new structured grid

### Imports
import h5py
import numpy as np
from scipy.interpolate import griddata
### Load the mean concentration field sample #028
fieldsf = h5py.File('PPMLES/ave_fields.h5', 'r')
c = fieldsf['c'][27]
### Load the unstructured grid
meshf = h5py.File('PPMLES/mesh.h5', 'r')
unstructured_nodes = np.array((meshf['Nodes']['x'], meshf['Nodes']['y'], meshf['Nodes']['z'])).T
### Structured grid definition
x0, y0, z0 = -16.9, -115.7, 0.
lx, ly, lz = 205.5, 232.1, 20.
resolution = 0.75
x_grid, y_grid, z_grid = np.meshgrid(np.linspace(x0, x0 + lx, int(lx/resolution)), 
                                     np.linspace(y0, y0 + ly, int(ly/resolution)),
                                     np.linspace(z0, z0 + lz, int(lz/resolution)),
                                     indexing='ij')
### Interpolation of the field on the new grid
c_interpolated = griddata(unstructured_nodes, c, 
                          (x_grid.flatten(), y_grid.flatten(), z_grid.flatten()), 
                          method='nearest')
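With method='nearest', griddata builds a new nearest-neighbour search structure on every call, which is costly for about 1.8 million nodes when many samples or fields must be mapped onto the same grid. A sketch of one alternative, assuming nearest-neighbour interpolation is sufficient: build a scipy.spatial.cKDTree once, query the target points once, and reuse the resulting indices for every field. The random points below are stand-ins for unstructured_nodes and the flattened structured grid.

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_node_indices(source_nodes, target_points):
    """For each target point, return the index of the closest source node."""
    tree = cKDTree(source_nodes)             # built once
    _, indices = tree.query(target_points)   # k=1 nearest neighbour
    return indices

# Stand-ins for unstructured_nodes and the structured grid points of example B
rng = np.random.default_rng(0)
source_nodes = rng.uniform(0.0, 1.0, size=(5000, 3))
target_points = rng.uniform(0.0, 1.0, size=(1000, 3))
idx = nearest_node_indices(source_nodes, target_points)

# Each node-based field is then mapped by plain indexing;
# with the real data this would be, e.g., fieldsf['c'][i][idx] for sample i
field_a = rng.normal(size=5000)
field_b = rng.normal(size=5000)
field_a_on_grid = field_a[idx]
field_b_on_grid = field_b[idx]
```

With the real mesh, the target points would be np.column_stack((x_grid.ravel(), y_grid.ravel(), z_grid.ravel())), and the tree query is paid once instead of once per sample.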

C) Expressing all time series over the same time window with the same time discretization

### Imports
import h5py
import numpy as np
from scipy.interpolate import griddata
### Define a common time discretization over the 200-s analysis period
common_time = np.arange(0., 200., 0.05)
u_series_list = np.zeros((200, np.shape(common_time)[0]))
### Interpolate the u-component velocity time series at probe DPID10 (index 9) onto this time discretization
timeseriesf = h5py.File('PPMLES/time_series.h5', 'r')
for i in range(200):
    sample_time = np.array(timeseriesf[f'Sample_{i+1:03}']['time']) - \
                  np.array(timeseriesf[f'Sample_{i+1:03}']['Parameters']['t_spinup'])  # Offset the spinup time
    u_series_list[i] = griddata(sample_time, timeseriesf[f'Sample_{i+1:03}']['u'][9], common_time, method='linear')
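uncertainty.h5 reports internal-variability uncertainty estimated with the stationary bootstrap [Politis and Romano 1994]. To illustrate the technique on a resampled series such as one row of u_series_list, here is a minimal NumPy sketch of the stationary bootstrap for the standard error of a time average: blocks start at random indices, have geometrically distributed lengths with mean block_length, and wrap around the end of the series. This is not the authors' implementation; the n_replicates and block_length values are arbitrary, and the synthetic AR(1) series is a stand-in for LES data.

```python
import numpy as np

def stationary_bootstrap_std(series, block_length, n_replicates=1000, seed=None):
    """Stationary-bootstrap standard error of the mean of a 1-D series.

    Each replicate is built from blocks that start at uniformly random
    indices and whose lengths are geometric with mean `block_length`;
    blocks wrap around the end of the series (Politis and Romano, 1994).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(series, dtype=float)
    n = x.size
    p = 1.0 / block_length                        # probability that a new block starts at a step
    steps = np.arange(n)
    means = np.empty(n_replicates)
    for r in range(n_replicates):
        new_block = rng.random(n) < p
        new_block[0] = True                       # the first step always opens a block
        block_starts = np.flatnonzero(new_block)  # steps where blocks begin
        block_id = np.cumsum(new_block) - 1       # block to which each step belongs
        offsets = steps - block_starts[block_id]  # position inside the block
        origins = rng.integers(n, size=block_starts.size)  # random origin per block
        idx = (origins[block_id] + offsets) % n   # circular wrap-around
        means[r] = x[idx].mean()
    return means.std(ddof=1)

# Synthetic AR(1) stand-in for one resampled series (e.g. a row of u_series_list)
rng = np.random.default_rng(1)
n, phi = 4000, 0.9
noise = rng.normal(size=n)
u = np.empty(n)
u[0] = noise[0]
for t in range(1, n):
    u[t] = phi * u[t - 1] + noise[t]

se_block = stationary_bootstrap_std(u, block_length=50, n_replicates=500, seed=2)
se_iid = u.std(ddof=1) / np.sqrt(n)  # naive estimate that ignores autocorrelation
```

For a correlated signal, the block estimate exceeds the naive one; block_length should exceed the correlation time of the signal, and uncertainty.h5 records the block_length actually used for each field and sample.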

D) Surrogate model construction example

The training and validation of a POD-GPR surrogate model [Marrel et al. 2015] learning from the PPMLES dataset is given in the following GitHub repository. This surrogate model was successfully used by Lumet et al. 2025 to emulate the LES mean concentration prediction for varying meteorological forcing parameters.
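For orientation only, here is a minimal NumPy sketch of the POD-GPR idea [Marrel et al. 2015]: the snapshot matrix (n_samples, n_nodes) is reduced by proper orthogonal decomposition (a thin SVD of the centred snapshots), and each retained POD coefficient is regressed on the input parameters with a Gaussian process. It is not the implementation in the repository mentioned above: the class PodGpr is hypothetical, the kernel is a fixed squared-exponential with hand-picked length scales and noise (no hyperparameter optimisation), and the snapshots are synthetic rather than PPMLES fields.

```python
import numpy as np

def rbf_kernel(a, b, length_scales, variance=1.0):
    """Anisotropic squared-exponential kernel between the rows of a and b."""
    d = (a[:, None, :] - b[None, :, :]) / length_scales
    return variance * np.exp(-0.5 * np.sum(d**2, axis=-1))

class PodGpr:
    """POD reduction followed by one GP regression per POD coefficient.

    Hyperparameters (length scales, noise) are fixed by hand in this sketch.
    """

    def __init__(self, n_modes, length_scales, noise=1e-6):
        self.n_modes = n_modes
        self.length_scales = np.asarray(length_scales, dtype=float)
        self.noise = noise

    def fit(self, x, snapshots):
        self.x_train = np.asarray(x, dtype=float)
        snapshots = np.asarray(snapshots, dtype=float)
        self.mean_field = snapshots.mean(axis=0)
        centred = snapshots - self.mean_field
        # POD via thin SVD: rows of vt are orthonormal spatial modes
        _, _, vt = np.linalg.svd(centred, full_matrices=False)
        self.modes = vt[:self.n_modes]                # (n_modes, n_nodes)
        coeffs = centred @ self.modes.T                # (n_samples, n_modes)
        k = rbf_kernel(self.x_train, self.x_train, self.length_scales)
        k[np.diag_indices_from(k)] += self.noise
        # GP posterior-mean weights, solved for all coefficients at once
        self.weights = np.linalg.solve(k, coeffs)      # (n_samples, n_modes)
        return self

    def predict(self, x_new):
        x_new = np.asarray(x_new, dtype=float)
        k_star = rbf_kernel(x_new, self.x_train, self.length_scales)
        coeffs = k_star @ self.weights                 # predicted POD coefficients
        return self.mean_field + coeffs @ self.modes   # reconstructed fields

# Synthetic stand-in for (input parameters, mean fields): 2 inputs, 300 "nodes"
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 300)

def synthetic_field(x):
    return np.sin(2 * np.pi * (grid[None, :] + x[:, :1])) * (1.0 + x[:, 1:2])

x_train = rng.uniform(0.0, 1.0, size=(60, 2))
model = PodGpr(n_modes=4, length_scales=[0.3, 0.3]).fit(x_train, synthetic_field(x_train))
x_test = rng.uniform(0.1, 0.9, size=(5, 2))
prediction = model.predict(x_test)                     # shape (5, 300)
```

Regressing a handful of POD coefficients instead of every node is what keeps this kind of surrogate tractable for fields with about 1.8 million nodes.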

Acknowledgments

This work was granted access to the HPC resources from GENCI-TGCC/CINES (A0062A10822, project 2020-2022). The authors would like to thank Olivier Vermorel for the preliminary development of the LES model, and Simon Lacroix for his proofreading.

Files

probe_network.csv

Files (36.5 GB total)

  • 17.1 GB (md5:653a45b23fc07810035510223a92164e)
  • 6.8 kB (md5:0980fc48dd8fe5b1bddb8ecec9cb771e)
  • 387.1 MB (md5:deead1a3da03e4109c4bca1b65be5a2f)
  • 2.9 kB (md5:3b392ca54a21fd338938de90df403f7e)
  • 3.1 GB (md5:1e4d5bc693f02cb29979aad67aae65f6)
  • 15.9 GB (md5:4c02660ef159d244fbaf71358b1ca0cb)
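The md5 checksums listed for the record can be used to check that a download is complete. A minimal sketch using Python's standard hashlib, reading the file in chunks so that the multi-gigabyte HDF5 files never need to fit in memory; the path in the usage line is a placeholder, and the checksum matching each file should be taken from the record itself.

```python
import hashlib

def md5sum(path, chunk_size=8 * 1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, read in chunks of chunk_size bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_md5):
    """True if the file's MD5 digest matches the expected one ('md5:' prefix allowed)."""
    expected = expected_md5.removeprefix("md5:").lower()
    return md5sum(path) == expected

# Usage (placeholder path and checksum):
# verify("PPMLES/mesh.h5", "md5:<checksum listed for this file>")
```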

Additional details

References

  • Biltoft, C. (2001). Customer report for Mock Urban Setting Test. DPG Document No. WDTC-FR-01-121, West Desert Test Center, U.S. Army Dugway Proving Ground, Utah, USA. URL: https://my.eng.utah.edu/~pardyjak/documents/MUSTCustReport.pdf (Accessed: 2024-06-28)
  • Gicquel, L. Y., Gourdain, N., Boussuge, J.-F., Deniau, H., Staffelbach, G., Wolf, P., and Poinsot, T. (2011). High performance parallel computing of flows in complex geometries. Comptes Rendus Mécanique, 339(2):104–124. DOI: https://doi.org/10.1016/j.crme.2010.11.006
  • Lumet, E., Jaravel, T., Rochoux, M. C., Vermorel, O., and Lacroix, S. (2024). Assessing the Internal Variability of Large-Eddy Simulations for Microscale Pollutant Dispersion Prediction in an Idealized Urban Environment. Boundary-Layer Meteorology, 190(2):9. DOI: https://doi.org/10.1007/s10546-023-00853-7
  • Lumet, E., Rochoux, M. C., Jaravel, T., and Lacroix, S. (2025). Uncertainty-Aware Surrogate Modeling for Urban Air Pollutant Dispersion Prediction. Building and Environment, 112287. DOI: https://doi.org/10.1016/j.buildenv.2024.112287
  • Marrel, A., Perot, N., and Mottet, C. (2015). Development of a surrogate model and sensitivity analysis for spatio-temporal numerical simulators. Stochastic Environmental Research and Risk Assessment, 29(3):959–974. DOI: https://doi.org/10.1007/s00477-014-0927-y
  • Politis, D. N. and Romano, J. P. (1994). The stationary bootstrap. Journal of the American Statistical Association, 89(428):1303–1313. DOI: https://doi.org/10.1080/01621459.1994.10476870
  • Schonfeld, T. and Rudgyard, M. (1999). Steady and unsteady flow simulations using the hybrid flow solver AVBP. AIAA Journal, 37(11):1378–1385. DOI: https://doi.org/10.2514/2.636
  • Yee, E. and Biltoft, C. (2004). Concentration fluctuation measurements in a plume dispersing through a regular array of obstacles. Boundary-Layer Meteorology, 111(3):363–415. DOI: https://doi.org/10.1023/B:BOUN.0000016496.83909.ee