Synthetic Dataset for Sequential Learning-Based Optimisation of Bio-Ash Binder Formulations under Seasonal Availability Constraints
Description
Note: This is Version 2 of the dataset. Following peer review, the scoring function has been revised and is now fully documented in the accompanying README_v2.md file. Version 1 should be considered deprecated.
This dataset supports research into sequential learning-based optimisation of bio-ash cement binder formulations under seasonally varying material availability. It provides a synthetic but chemically grounded benchmark space for evaluating data-driven optimisation methods in cement materials science.
The dataset contains 5,006 distinct binder formulations. Each is characterised by the mass fractions of cement and of five bio-ash components (A1-A5), which represent generic agricultural residue derivatives. Cement content ranges from 0 to 100 wt% in discrete steps. Synthetic compressive strength values are derived from a nonlinear scoring function based on chemical descriptors, including oxide ratios, hydraulic and pozzolanic proximity metrics, and interaction terms. Seeded stochastic variability is added to simulate material heterogeneity and measurement scatter.
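As an illustration only, the sketch below shows two generic ideas from the paragraph above: checking that a formulation's six mass fractions form a valid composition, and adding reproducible (seeded) scatter to a deterministic strength value. The helper names, the 5 % relative scatter and the Gaussian noise model are hypothetical assumptions. They are not the dataset's scoring function, which is documented in README_v2.md.

```python
import numpy as np

# Component order assumed for illustration: cement followed by ashes A1-A5.
COMPONENTS = ("cement", "A1", "A2", "A3", "A4", "A5")


def is_valid_formulation(wt_percent, tol=1e-6) -> bool:
    """Hypothetical check: six non-negative mass fractions summing to 100 wt%."""
    w = np.asarray(wt_percent, dtype=float)
    if w.shape != (len(COMPONENTS),):
        return False
    return bool(np.all(w >= 0)) and abs(w.sum() - 100.0) <= tol


def add_seeded_scatter(base_strength_mpa, seed, rel_sd=0.05):
    """Multiply a deterministic base strength by seeded Gaussian scatter.

    rel_sd = 0.05 is a hypothetical value, not the dataset's parameter.
    The same seed always reproduces the same scatter.
    """
    rng = np.random.default_rng(seed)
    base = np.asarray(base_strength_mpa, dtype=float)
    return base * (1.0 + rng.normal(0.0, rel_sd, size=base.shape))
```

Seeding the generator explicitly is what makes the "stochastic" values reproducible: regenerating the benchmark with the same seed yields identical strengths, while a different seed gives a different realisation of the scatter.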
Seasonal availability is represented by 25 independently seeded scenarios, each covering four seasons (S1-S4), with the supply of each ash drawn from a uniform distribution between 3 and 18 kt. The ash usage metric expresses the fraction of the seasonal ash pool that is consumed when production runs at the maximum capacity permitted by the scarcest ingredient.
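A minimal sketch of one possible reading of the scenario generation and the ash usage metric follows. Several details are assumptions, not taken from the dataset: supply is drawn independently for every scenario, season and ash; scenario k uses seed `base_seed + k`; ash demand is expressed as a fraction (0-1) of binder mass; and cement is treated as unconstrained. Under these assumptions, maximum output is set by the ash with the smallest supply-to-demand ratio, and usage is the ash consumed at that output divided by the total seasonal ash supply. `make_scenarios` and `ash_usage` are hypothetical names.

```python
import numpy as np

N_SCENARIOS = 25
SEASONS = ("S1", "S2", "S3", "S4")
ASHES = ("A1", "A2", "A3", "A4", "A5")


def make_scenarios(base_seed: int = 0) -> np.ndarray:
    """Supply in kt, shape (scenario, season, ash); seeding scheme is assumed."""
    supplies = np.empty((N_SCENARIOS, len(SEASONS), len(ASHES)))
    for k in range(N_SCENARIOS):
        rng = np.random.default_rng(base_seed + k)  # one independent seed per scenario
        supplies[k] = rng.uniform(3.0, 18.0, size=(len(SEASONS), len(ASHES)))
    return supplies


def ash_usage(ash_fractions, supply_kt) -> float:
    """Fraction of the seasonal ash pool consumed at maximum production.

    ash_fractions: mass fraction (0-1) of each ash A1-A5 in the binder.
    supply_kt:     seasonal supply of each ash in kt.
    """
    f = np.asarray(ash_fractions, dtype=float)
    s = np.asarray(supply_kt, dtype=float)
    used = f > 0
    if not used.any():
        return 0.0  # pure cement: no ash is consumed
    # Maximum binder output (kt) is limited by the scarcest ingredient.
    max_output_kt = np.min(s[used] / f[used])
    consumed_kt = f.sum() * max_output_kt
    return float(consumed_kt / s.sum())
```

For example, `ash_usage([0.1, 0.1, 0, 0, 0], [4, 8, 5, 5, 5])` returns 8/27 (about 0.30). A1 limits output to 40 kt of binder, which consumes 8 kt of the 27 kt ash pool. With generated scenarios, the supply for scenario 0, season S1 would be `make_scenarios()[0, 0]`.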
Files:
- synth_AshBlend_benchmark_SL.csv: SL-ready dataset (primary)
- synth_AshBlend_benchmark_full.csv: full dataset with all descriptors
- README_v2.md: complete methodology documentation