Published May 30, 2026 | Version 1.0.0
Preprint Open

Auditable Climate Risk Intelligence from Fragmented ESG Data: Deterministic Orchestration and Imbalance-Aware Learning for Scope 1-3 Validation

  • 1. University of Kent

Description

This record accompanies the preprint "Auditable Climate Risk Intelligence from Fragmented ESG Data: Deterministic Orchestration and Imbalance-Aware Learning for Scope 1-3 Validation" by Karan Sehgal and Khawar Naveed Bhatti (2026).

It provides the synthetic ESG validation benchmark and deterministic generation script described in the paper. The benchmark models heterogeneous Scope 1-3 disclosure records with injected governance failure modes (provenance conflict, stale reporting, climate mismatch, null inflation, transition divergence, audit inconsistency). Marginal distributions, anomaly prevalence (4.7%), and missingness structure (12.3%) are calibrated against publicly reported characteristics of the GHG Protocol, PCAF, and ISSB reporting standards.
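The injection of failure modes and missingness at the stated rates can be illustrated with a minimal sketch. This is not the logic of generate_benchmark.py: the field names, the log-normal marginals, and the choice to blank fields independently are all assumptions made for illustration. Only the six failure-mode names and the two rates (4.7% and 12.3%) come from the description above.

```python
import random

# Failure modes named in the record description; the rest is hypothetical.
FAILURE_MODES = [
    "provenance_conflict",
    "stale_reporting",
    "climate_mismatch",
    "null_inflation",
    "transition_divergence",
    "audit_inconsistency",
]

ANOMALY_RATE = 0.047  # anomaly prevalence stated in the description
MISSING_RATE = 0.123  # missingness level stated in the description
EMISSION_FIELDS = ("scope1_tco2e", "scope2_tco2e", "scope3_tco2e")  # assumed names


def make_record(rng, record_id):
    """Build one synthetic disclosure record (hypothetical schema)."""
    record = {
        "record_id": record_id,
        # Assumed log-normal marginals; the real calibration is in the paper.
        "scope1_tco2e": round(rng.lognormvariate(8.0, 1.0), 2),
        "scope2_tco2e": round(rng.lognormvariate(7.5, 1.0), 2),
        "scope3_tco2e": round(rng.lognormvariate(10.0, 1.2), 2),
        "failure_mode": None,
    }
    # Label a small fraction of records with one injected failure mode.
    if rng.random() < ANOMALY_RATE:
        record["failure_mode"] = rng.choice(FAILURE_MODES)
    # Blank each emissions field independently (an assumption; the real
    # benchmark may use a structured missingness pattern instead).
    for field in EMISSION_FIELDS:
        if rng.random() < MISSING_RATE:
            record[field] = None
    return record


def generate(n_records, seed=0):
    """Generate n_records records from a private, seeded RNG."""
    rng = random.Random(seed)
    return [make_record(rng, i) for i in range(n_records)]
```

With a few thousand records, the share of labeled anomalies and of blanked emission fields should land close to the two target rates, which gives a quick sanity check on any reimplementation.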

The benchmark is released as a deterministic generator rather than as a frozen CSV dump. A single seed-controlled script produces a bit-for-bit identical dataset across runs and machines, together with a SHA-256-hashed manifest. Researchers can regenerate the dataset at any size, from a few hundred records to the full ~68,000 used in the paper.
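The idea of a seed-controlled generator paired with a hashed manifest can be sketched as follows. The function names, the two-column schema, and the manifest fields are hypothetical and do not reflect the actual interface of generate_benchmark.py. The sketch only shows why fixing the seed, the line endings, and the number formatting is enough to make the SHA-256 digest reproducible.

```python
import csv
import hashlib
import io
import random


def generate_csv_bytes(n_records, seed):
    """Render n_records synthetic rows to CSV bytes, driven only by the seed."""
    rng = random.Random(seed)  # private RNG, so global random state is irrelevant
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")  # fixed line endings
    writer.writerow(["record_id", "scope1_tco2e"])  # hypothetical columns
    for i in range(n_records):
        # Fixed-precision formatting keeps the serialized bytes stable.
        writer.writerow([i, f"{rng.uniform(0.0, 1e5):.2f}"])
    return buffer.getvalue().encode("utf-8")


def build_manifest(n_records, seed):
    """Return a small manifest (hypothetical fields) with the data's SHA-256."""
    data = generate_csv_bytes(n_records, seed)
    return {
        "seed": seed,
        "n_records": n_records,
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

Calling `build_manifest(500, seed=42)` twice yields the same digest, while changing the seed or the record count changes it. A reader can therefore compare a locally computed digest against a published manifest to confirm that a regenerated dataset matches.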

Contents: generate_benchmark.py (generator), README.md (schema, failure modes, reproducibility instructions), LICENSE (CC BY 4.0), and the companion paper PDF.

Materials are released to support reproducible research into provenance-aware ESG validation, imbalance-aware anomaly detection, and governance-oriented auditability under fragmented Scope 1-3 reporting conditions.

Files

Files (7.7 MB)

zenodo_esg_deposit.zip (7.7 MB)
md5:a905c820b03947d7abaf61f5021722fe

Additional details

Dates

Submitted
2026-05-23
Submitted to arXiv; currently under moderation review.

Software

Programming language
Python
Development Status
Active