ExioML: Global Eco-economic Scope 3 Emission Machine Learning Dataset
Description
🙋‍♂️ Introduction
ExioML is the first ML-ready benchmark dataset in eco-economic research, designed for global sectoral sustainability analysis. It is built on EXIOBASE 3.8.2, a high-quality, open-source environmentally extended multi-regional input-output (EE-MRIO) dataset, and covers 163 sectors across 49 regions from 1995 to 2022, addressing the data inaccessibility that has left a significant research gap. The dataset provides both factor accounting in tabular format and footprint networks in graph structure.
We demonstrate a GHG emission regression task on the factor accounting table, comparing the performance of shallow and deep models. The models estimate sectoral GHG emissions from value added, employment, and energy consumption, and their low Mean Squared Error (MSE) validates the dataset's usability. The footprint network in ExioML, which is inherent in the multi-dimensional MRIO framework, enables tracking resource flows between international sectors.
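As an illustration of the MSE metric mentioned above, the following minimal sketch computes it for a trivial mean-prediction baseline. It is not the regression model used for validation; the function name and the toy target values are assumptions made for this example and are not taken from ExioML.

```python
from statistics import fmean


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


# Placeholder target values standing in for sectoral GHG emissions.
# These numbers are invented for illustration only.
y_true = [2.0, 4.0, 6.0]

# Trivial baseline: always predict the mean of the targets.
baseline_pred = [fmean(y_true)] * len(y_true)

baseline_mse = mean_squared_error(y_true, baseline_pred)
```

A learned model is only informative if its MSE on held-out data falls clearly below that of such a baseline.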
ExioML offers promising research opportunities, such as predicting embodied emissions through international trade, estimating regional sustainability transitions, and analyzing the topological changes in global trading networks over time. It reduces barriers and intensive data pre-processing for ML researchers, facilitates the integration of ML and eco-economic research, and provides new perspectives for sound climate policy and global sustainable development.
📊 Dataset
ExioML supports graph and tabular structure learning algorithms through the Footprint Network and the Factor Accounting table. The dataset includes the following factors in both the product-by-product (PxP) and industry-by-industry (IxI) classifications:
- Region (Categorical feature)
- Sector (Categorical feature)
- Value Added [M.EUR] (Numerical feature)
- Employment [1000 p.] (Numerical feature)
- GHG emissions [kg CO2 eq.] (Numerical feature)
- Energy Carrier Net Total [TJ] (Numerical feature)
- Year (Numerical feature)
☁️ Factor Accounting
The Factor Accounting table shares its features with the Footprint Network and summarizes the sector-level totals of these heterogeneous characteristics.
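A minimal sketch of reading such a table with the Python standard library is given here. It assumes the CSV header uses the feature names exactly as listed above, which may not match the real files; the sample rows, region codes, sector names, and helper functions are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Toy rows with placeholder values. The header is ASSUMED to mirror the
# feature list and may differ from the header in the published CSV files.
SAMPLE_CSV = """Region,Sector,Value Added [M.EUR],Employment [1000 p.],GHG emissions [kg CO2 eq.],Energy Carrier Net Total [TJ],Year
AA,Sector 1,10.0,2.0,1000.0,5.0,1995
AA,Sector 2,20.0,4.0,3000.0,15.0,1995
BB,Sector 1,30.0,6.0,2000.0,10.0,1995
"""

GHG = "GHG emissions [kg CO2 eq.]"
NUMERIC_COLUMNS = [
    "Value Added [M.EUR]",
    "Employment [1000 p.]",
    GHG,
    "Energy Carrier Net Total [TJ]",
]


def load_factor_accounting(handle):
    """Read rows from a file-like object and convert numeric columns."""
    rows = []
    for record in csv.DictReader(handle):
        for column in NUMERIC_COLUMNS:
            record[column] = float(record[column])
        record["Year"] = int(record["Year"])
        rows.append(record)
    return rows


def ghg_by_region(rows):
    """Sum GHG emissions over all sectors of each region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Region"]] += row[GHG]
    return dict(totals)


rows = load_factor_accounting(io.StringIO(SAMPLE_CSV))
totals = ghg_by_region(rows)
```

To apply the same functions to a downloaded file such as ExioML_factor_accounting_IxI.csv, pass an open file handle instead of the in-memory sample, after checking that the header matches the assumed column names.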
🚞 Footprint Network
The Footprint Network models the high-dimensional global trading network, capturing its economic, social, and environmental impacts. It is structured as a directed graph in which edge direction represents sectoral input-output relationships, distinguishing sectors by their roles as sources (exporting) and targets (importing). The basic element of the network is an international trade flow between sectors, carrying features such as value added, emission amount, and energy input. This structure helps identify critical sectors and paths for sustainability management and optimization. The Footprint Network is hosted on Zenodo.
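The directed-graph structure described above can be sketched with plain Python containers. This is an illustrative representation only, not the storage format of the published files; the node labels, the "emission" feature key, and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical edge records: (source node, target node, edge features).
# A node is a (region, sector) pair; all values are placeholders.
EDGES = [
    (("AA", "Sector 1"), ("BB", "Sector 1"), {"emission": 5.0}),
    (("AA", "Sector 1"), ("BB", "Sector 2"), {"emission": 3.0}),
    (("BB", "Sector 2"), ("AA", "Sector 1"), {"emission": 2.0}),
]


def build_adjacency(edges):
    """Map each source node to its targets and the features of each edge."""
    adjacency = defaultdict(dict)
    for source, target, features in edges:
        adjacency[source][target] = features
    return adjacency


def node_strength(edges, feature):
    """Total outgoing and incoming amount of one feature per node."""
    outgoing = defaultdict(float)
    incoming = defaultdict(float)
    for source, target, features in edges:
        outgoing[source] += features[feature]
        incoming[target] += features[feature]
    return dict(outgoing), dict(incoming)


adjacency = build_adjacency(EDGES)
out_emission, in_emission = node_strength(EDGES, "emission")

# The node exporting the largest embodied emission in this toy graph.
top_source = max(out_emission, key=out_emission.get)
```

Ranking nodes by outgoing or incoming totals is one simple way to flag candidate critical sectors; path-level analysis would need a graph library or a dedicated traversal.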
🔗 Code and Data Availability
The ExioML development toolkit in Python and the regression model used for validation are available in the GitHub repository (https://github.com/YVNMINC/ExioML). The complete ExioML dataset is hosted on Zenodo (https://zenodo.org/records/10604610).
💡 Additional Information
More details about the dataset are available in our paper, *ExioML: Eco-economic dataset for Machine Learning in Global Sectoral Sustainability*, accepted at the ICLR 2024 Climate Change AI workshop (https://arxiv.org/abs/2406.09046).
📄 Citation
@inproceedings{guo2024exioml,
title={ExioML: Eco-economic dataset for Machine Learning in Global Sectoral Sustainability},
author={Guo, Yanming and Ma, Jin},
booktitle={ICLR 2024 Workshop on Tackling Climate Change with Machine Learning},
year={2024}
}
🌟 Reference
Stadler, Konstantin, et al. "EXIOBASE 3." Zenodo (2021). Retrieved March 22, 2023.
Files
ExioML_factor_accounting_IxI.csv
Additional details
Related works
- Is derived from
- Dataset: 10.5281/zenodo.3583070 (DOI)
Software
- Repository URL
- https://github.com/Yvnminc/ExioML
References
- EXIOBASE 3 (Stadler et al. 2018 DOI: 10.5281/zenodo.3583070)