Published December 22, 2025 | Version v1
Dataset Open

CODES Datasets: Challenging Non-Equilibrium Chemistry

Authors/Creators

  • 1. Scuola Normale Superiore

Description

This record contains four synthetic astrochemical datasets generated with KROME, designed for benchmarking surrogate models for stiff, non-equilibrium thermo-chemical ODE systems. The datasets differ in chemical network complexity and in whether external physical parameters are treated as explicit inputs, but share a common numerical setup, sampling strategy, and data format.

 

Included files

  • codes_cloud_data.hdf5

  • codes_cloud_parametric_data.hdf5

  • codes_primordial_data.hdf5

  • codes_primordial_parametric_data.hdf5
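This record does not describe the internal layout of the HDF5 files, so a reasonable first step is to list what each file contains. The sketch below assumes the third-party h5py package is installed; the helper name list_hdf5_contents is hypothetical and makes no assumption about group or dataset names.

```python
import h5py


def list_hdf5_contents(path):
    """Return (name, shape, dtype) for every dataset found in an HDF5 file."""
    entries = []

    def visit(name, obj):
        # Groups are skipped; only datasets carry array data.
        if isinstance(obj, h5py.Dataset):
            entries.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, "r") as handle:
        handle.visititems(visit)
    return entries


if __name__ == "__main__":
    # Assumes the file has been downloaded into the working directory.
    for name, shape, dtype in list_hdf5_contents("codes_primordial_data.hdf5"):
        print(name, shape, dtype)
```

Inspecting the smallest file first keeps the check quick before loading the larger cloud datasets.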

 

Physical model and solver

All datasets are generated by solving coupled ordinary differential equations describing the time evolution of chemical species abundances and the gas temperature. Species evolution includes two-body reactions and photo-reactions, while the temperature is evolved self-consistently via heating and cooling processes.

The equations are solved using KROME’s 5th-order LSODES implicit multistep solver, which exploits the sparsity of the Jacobian matrix. Relative and absolute tolerances are fixed at 10⁻⁶ and 10⁻²⁰, respectively.
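To see what these two tolerances mean in practice: ODEPACK solvers such as LSODES control the local error of each component against a weight of the form rtol·|y| + atol. The sketch below only evaluates that weight for the values quoted above. The function names are hypothetical, and how KROME passes the tolerances to the solver is not shown here.

```python
RTOL = 1.0e-6   # relative tolerance quoted in this record
ATOL = 1.0e-20  # absolute tolerance quoted in this record


def error_weight(y, rtol=RTOL, atol=ATOL):
    """Error scale for one solution component: rtol * |y| + atol."""
    return rtol * abs(y) + atol


def crossover(rtol=RTOL, atol=ATOL):
    """Magnitude of y at which the relative and absolute terms are equal."""
    return atol / rtol


if __name__ == "__main__":
    for y in (1.0, 1.0e-10, 1.0e-14, 1.0e-18):
        print(f"y = {y:.0e}  weight = {error_weight(y):.3e}")
    print(f"crossover near |y| = {crossover():.0e}")
```

With these settings the absolute term only starts to dominate for components smaller than roughly 10⁻¹⁴ (in the solver's own units), so even trace species are tracked to about six significant digits.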

 

Chemical networks

Primordial datasets (codes_primordial_data.hdf5, codes_primordial_parametric_data.hdf5)

  • 9 species: e⁻, H⁻, H, H⁺, He, He⁺, He²⁺, H₂, H₂⁺

  • 46 reactions, including photo-chemistry, cosmic-ray ionization, and H₂ formation on dust grains

  • Reaction rates primarily from Bovino et al. (2016)

Cloud datasets (codes_cloud_data.hdf5, codes_cloud_parametric_data.hdf5)

  • 37 species, including heavy atoms, molecules, and molecular ions (e.g. C, O, CO, OH, H₂O, HCO⁺, H₃⁺)

  • 287 reactions

  • Reaction rates compiled from Glover & Jappsen and the KIDA, OSU, and UMIST databases

 

External parameters

For the non-parametric datasets (codes_primordial_data.hdf5 and codes_cloud_data.hdf5), the external parameters are fixed:

  • Radiation field intensity: G = G₀

  • Metallicity: Z = Z⊙

For the parametric datasets, the external parameters are varied and provided as explicit inputs:

  • Radiation field intensity: G ∈ [0.1 G₀, 10 G₀]

  • Metallicity: Z ∈ [10⁻³ Z⊙, Z⊙]

In the primordial case, metallicity represents all elements heavier than helium. In the cloud case, it accounts only for elements not explicitly included in the chemical network (e.g. silicates).

In all datasets, the cosmic-ray ionization rate is fixed at ζ_cr = 3 × 10⁻¹⁷ s⁻¹, and the dust-to-gas ratio is f_d = 0.3.

 

Sampling of initial conditions

Initial conditions are generated using Sobol sampling, a low-discrepancy quasi-random sequence, to cover the high-dimensional parameter space evenly. The dimensionality of this space ranges from 10 (primordial) to 39 (cloud parametric). Gas density, temperature, species abundances, and (where applicable) external parameters are sampled in log-space within predefined ranges.
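As an illustration of log-space Sobol sampling, the sketch below draws points for only the two external parameters, using the G and Z ranges given under "External parameters". It assumes NumPy and SciPy are installed and uses scipy.stats.qmc. Whether the authors used SciPy, scrambling, or a particular seed is not stated in this record, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import qmc

# Bounds from the "External parameters" section, in units of G0 and Z_sun.
LOG10_LOWER = np.log10([0.1, 1.0e-3])
LOG10_UPPER = np.log10([10.0, 1.0])


def sample_external_parameters(m=12):
    """Draw 2**m Sobol points for (G, Z), distributed uniformly in log10."""
    # scramble=False keeps the sequence deterministic (an assumption made
    # for reproducibility of this sketch, not a documented author choice).
    sampler = qmc.Sobol(d=2, scramble=False)
    unit = sampler.random_base2(m=m)  # shape (2**m, 2), values in [0, 1)
    log_samples = qmc.scale(unit, LOG10_LOWER, LOG10_UPPER)
    return 10.0 ** log_samples


if __name__ == "__main__":
    samples = sample_external_parameters()
    print(samples.shape)
    print(samples.min(axis=0), samples.max(axis=0))
```

Sobol sequences keep their balance properties when the number of points is a power of two, which is consistent with the dataset sizes listed in this record (2048, 4096, 8192).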

After sampling, abundances are adjusted to enforce a 3:7 helium-to-(hydrogen+helium) ratio, and the electron abundance is computed a posteriori to ensure charge neutrality.
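The charge-neutrality step can be sketched for the primordial network as follows: the electron abundance is the charge-weighted sum over all ions. The species labels, the function name, and the handling of a negative net charge are assumptions made for illustration, not the authors' implementation.

```python
# Charge carried by each primordial species other than the electron.
# The string labels are hypothetical; the dataset may name species differently.
CHARGES = {
    "H-": -1,
    "H": 0,
    "H+": 1,
    "He": 0,
    "He+": 1,
    "He++": 2,
    "H2": 0,
    "H2+": 1,
}


def electron_abundance(abundances):
    """Return the electron abundance that makes the gas charge neutral."""
    n_e = sum(CHARGES[name] * value for name, value in abundances.items())
    if n_e < 0:
        # More negative ions than positive ones: no neutral state exists.
        raise ValueError("net ion charge is negative")
    return n_e


if __name__ == "__main__":
    sample = {"H": 1.0, "H+": 1.0e-4, "He+": 1.0e-5, "He++": 1.0e-6, "H-": 1.0e-7}
    print(electron_abundance(sample))
```

Computing the electrons after sampling removes one degree of freedom from the sampled space, since their abundance is fully determined by the ions.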

 

Time integration and dataset size

Each initial condition is evolved for a total time of 10 kyr. The resulting thermo-chemical trajectories are sampled at 100 logarithmically spaced time points, starting just below 0.1 yr to resolve the stiff early-time dynamics.
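A logarithmically spaced output grid of this kind can be built as in the sketch below. The exact start time is not given in this record ("just below 0.1 yr"), so the default of 0.1 yr is an assumption; the function name is hypothetical.

```python
def log_time_grid(t_start=0.1, t_end=1.0e4, n_points=100):
    """Return n_points times (in yr), evenly spaced in log(t)."""
    if t_start <= 0 or t_end <= t_start or n_points < 2:
        raise ValueError("need 0 < t_start < t_end and n_points >= 2")
    ratio = t_end / t_start
    # Consecutive points differ by the constant factor ratio**(1/(n_points-1)).
    return [t_start * ratio ** (i / (n_points - 1)) for i in range(n_points)]


if __name__ == "__main__":
    grid = log_time_grid()
    print(len(grid), grid[0], grid[-1])
    print("step factor:", grid[1] / grid[0])
```

Spanning five decades in 100 points gives about 20 samples per decade, so the early-time evolution receives the same resolution as the last few thousand years.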

The datasets contain:

  • 2048 initial conditions for codes_primordial_data.hdf5

  • 4096 initial conditions for codes_primordial_parametric_data.hdf5

  • 4096 initial conditions for codes_cloud_data.hdf5

  • 8192 initial conditions for codes_cloud_parametric_data.hdf5

 

Intended use

These datasets are released as part of the CODES benchmark and are intended for developing and evaluating surrogate models for high-dimensional, stiff chemical kinetics, both with fixed physical conditions and with explicit parametric dependence on external fields.

Files (539.3 MB)

  • md5:9e332b3a4972b22a2d270110fac1febf (249.0 MB)

  • md5:e2de5abf73ee17c2b5bd94175151e855 (249.2 MB)

  • md5:98314724aa014ee44a826d0a59c9c655 (8.2 MB)

  • md5:49444172dc5edbb9b43cacb9dd903957 (32.8 MB)
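After downloading, each file can be checked against its MD5 checksum from the listing above. The sketch below uses only the Python standard library (Python 3.9 or later for str.removeprefix); the function names are hypothetical, and the pairing of checksums with file names should be taken from the record's file listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum such as 'md5:9e33...'."""
    expected = expected.lower().removeprefix("md5:")
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use small even for the files of roughly 249 MB.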

Additional details

Software

  • Repository URL: https://github.com/robin-janssen/CODES-Benchmark

  • Programming language: Python

  • Development status: Active