Auditable Climate Risk Intelligence from Fragmented ESG Data: Deterministic Orchestration and Imbalance-Aware Learning for Scope 1-3 Validation
Authors/Creators
- Karan Sehgal
- Khawar Naveed Bhatti
Description
This record accompanies the preprint "Auditable Climate Risk Intelligence from Fragmented ESG Data: Deterministic Orchestration and Imbalance-Aware Learning for Scope 1-3 Validation" by Karan Sehgal and Khawar Naveed Bhatti (2026).
It provides the synthetic ESG validation benchmark and deterministic generation script described in the paper. The benchmark models heterogeneous Scope 1-3 disclosure records with injected governance failure modes (provenance conflict, stale reporting, climate mismatch, null inflation, transition divergence, audit inconsistency). Marginal distributions, anomaly prevalence (4.7%), and missingness structure (12.3%) are calibrated against publicly reported characteristics of the GHG Protocol, PCAF, and ISSB reporting standards.
The benchmark is released as a deterministic generator rather than as a frozen CSV dump. A single seed-controlled script reproduces the dataset bit-for-bit across runs and machines and writes a manifest containing its SHA-256 hash. Researchers can regenerate the dataset at any size, from a few hundred records to the full ~68,000 used in the paper.
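The idea of a seed-controlled generator with a hashed manifest can be illustrated with a minimal sketch. This is not the code in generate_benchmark.py: the column names, the function names `generate_csv` and `build_manifest`, and the way anomalies and missing values are drawn are all assumptions made for illustration. Only the 4.7% anomaly prevalence, the 12.3% missingness rate, and the use of SHA-256 come from the description above.

```python
import hashlib
import json
import random

ANOMALY_RATE = 0.047   # prevalence quoted in the record description
MISSING_RATE = 0.123   # missingness quoted in the record description


def generate_csv(seed: int, n_records: int) -> bytes:
    """Return a toy CSV as bytes; identical for identical (seed, n_records).

    Hypothetical schema, not the benchmark's real one.
    """
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    lines = ["record_id,scope,emissions_tco2e,is_anomaly"]
    for i in range(n_records):
        # Only rng.random() is used: Python documents its sequence as
        # reproducible for a given seed.
        scope = 1 + int(rng.random() * 3)           # 1, 2 or 3
        emissions = f"{rng.random() * 1000.0:.3f}"  # fixed decimal formatting
        if rng.random() < MISSING_RATE:
            emissions = ""                           # simulated missing value
        is_anomaly = int(rng.random() < ANOMALY_RATE)
        lines.append(f"{i},{scope},{emissions},{is_anomaly}")
    # Explicit "\n" and UTF-8 so the bytes do not depend on the platform.
    return ("\n".join(lines) + "\n").encode("utf-8")


def build_manifest(seed: int, n_records: int) -> dict:
    """Describe one generated dataset by its parameters and SHA-256 digest."""
    data = generate_csv(seed, n_records)
    return {
        "seed": seed,
        "n_records": n_records,
        "sha256": hashlib.sha256(data).hexdigest(),
    }


if __name__ == "__main__":
    print(json.dumps(build_manifest(seed=42, n_records=500), indent=2))
```

Two runs with the same seed and record count should print the same digest, so comparing manifests is enough to confirm that two researchers are working from the same bytes. Drawing from a local `random.Random` instance and fixing the line endings, encoding, and number formatting are the design choices that keep the output independent of the host environment.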
Contents: generate_benchmark.py (generator), README.md (schema, failure modes, reproducibility instructions), LICENSE (CC BY 4.0), and the companion paper PDF.
Materials are released to support reproducible research into provenance-aware ESG validation, imbalance-aware anomaly detection, and governance-oriented auditability under fragmented Scope 1-3 reporting conditions.
Files
| Name | Size | Checksum |
|---|---|---|
| zenodo_esg_deposit.zip | 7.7 MB | md5:a905c820b03947d7abaf61f5021722fe |
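A downloaded copy of the archive can be checked against the MD5 checksum listed for zenodo_esg_deposit.zip. The sketch below uses only the Python standard library; the helper names `file_md5` and `verify_archive` are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# Checksum as listed for zenodo_esg_deposit.zip in this record.
EXPECTED_MD5 = "a905c820b03947d7abaf61f5021722fe"


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest matches the expected hex string."""
    return file_md5(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("zenodo_esg_deposit.zip")  # assumed download location
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a guarantee against deliberate tampering.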
Additional details
Dates
- Submitted: 2026-05-23. Submitted to arXiv; currently under moderation review.