Finding the right XAI Method --- Dataset
Creators
- 1. TU Berlin; ATB Potsdam
- 2. Institute for Meteorology, University of Leipzig, Leipzig, Germany; Department of Meteorology, University of Reading, Reading, UK
- 3. ATB Potsdam; Institute of Computer Science - University of Potsdam, Potsdam, Germany; BIFOLD – Berlin Institute for the Foundations of Learning and Data, Berlin, Germany; Machine Learning Group, UiT the Arctic University of Norway, Tromsø, Norway
Description
This dataset provides the complementary preprocessed data for training the neural networks used in Bommer et al., along with the accompanying source code (https://github.com/philine-bommer/Climate_X_Quantus). In the publication, we introduce XAI evaluation in the context of climate research and assess several desired explanation properties, namely robustness, faithfulness, randomization, complexity, and localization. To this end, we build upon previous work (Labe and Barnes, 2021) and train a multi-layer perceptron (MLP) and a convolutional neural network (CNN) to predict the decade based on annual-mean temperature maps.
Following Labe and Barnes (2021), we use data simulated by the Community Earth System Model version 1 (CESM1; Hurrell et al., 2013). We use the global 2-m air temperature (T2m) maps from 1920 to 2080. The data consist of 40 ensemble members, each generated by varying the atmospheric initial conditions under fixed external forcing: historical forcings are imposed from 1920 to 2005 and Representative Concentration Pathway 8.5 for the following years (Kay et al., 2015).
Following Labe and Barnes (2021), we compute annual averages and apply a bilinear interpolation. This results in T = 161 temperature maps per member, with v = 144 longitude grid cells and h = 95 latitude grid cells, given the 1.9° sampling in latitude and 2.5° sampling in longitude. The temperature maps are finally standardized by removing the multi-year (1920 to 2080) mean and then dividing by the corresponding standard deviation.
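The standardization step can be sketched with NumPy as follows (a minimal sketch: the array shapes follow the dimensions stated above, but the variable names and random stand-in data are illustrative, not taken from the repository):

```python
import numpy as np

# Illustrative stand-in for the 40-member ensemble of annual-mean
# T2m maps: shape (members, years, lat, lon) = (40, 161, 95, 144).
rng = np.random.default_rng(0)
t2m = rng.normal(loc=285.0, scale=5.0, size=(40, 161, 95, 144))

# Standardize with the multi-year (1920-2080) statistics: remove the
# mean over members and years, then divide by the corresponding
# standard deviation, per grid cell.
mean = t2m.mean(axis=(0, 1), keepdims=True)
std = t2m.std(axis=(0, 1), keepdims=True)
t2m_std = (t2m - mean) / std
```

After this step every grid cell has zero mean and unit variance across members and years, so no single region dominates the network input purely through its absolute temperature scale.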
Unlike the flattened input used for the MLP (each temperature map is flattened into a vector), the CNN input retains the longitude-latitude grid of the temperature maps. As in Labe and Barnes (2021), we use the model data described above for training, validation, and testing. For both the MLP and the CNN, we hold out 20% of the data as a test set and split the remaining 80% into a training (64%) and a validation (16%) set. We train both networks on a fuzzy classification problem that combines classification and regression. In the classification setting, the network assigns each map to one of 20 classes, where each class corresponds to one decade between 1900 and 2100 (a class division required for the later regression step, as in Labe and Barnes, 2021). The network output is thus a probability vector containing one probability per class.
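The 20%/64%/16% split can be reproduced with a simple index permutation (a minimal sketch under the assumption of one sample per ensemble member and year; the repository may use a different splitting routine):

```python
import numpy as np

n_samples = 40 * 161  # hypothetical: one sample per member and year
rng = np.random.default_rng(0)
idx = rng.permutation(n_samples)

# 20% test; the remaining 80% becomes 64% training and 16% validation,
# all expressed as fractions of the full dataset.
n_test = int(0.20 * n_samples)
n_val = int(0.16 * n_samples)
test_idx = idx[:n_test]
val_idx = idx[n_test:n_test + n_val]
train_idx = idx[n_test + n_val:]
```

Drawing all three subsets from one permutation guarantees they are disjoint and together cover every sample exactly once.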
To assess the network performance, we use the monthly 2-m air temperature from the 20th Century Reanalysis dataset, version 3 (20CRv3; Slivinski et al., 2019), from 1920 to 2015.
The dataset includes two compressed .npz files and a Readme.md. A full description of the data contained in this dataset and instructions for its use are provided in the Readme file.
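The .npz archives can be inspected with NumPy. A minimal sketch (the keys "X" and "y" below are placeholders; consult the Readme.md in this dataset for the actual array names and shapes):

```python
import numpy as np

# An .npz file is a zip archive of named arrays. Here we create a tiny
# example archive; the dataset's real files are opened the same way.
np.savez_compressed("example.npz", X=np.zeros((4, 95, 144)), y=np.arange(4))

data = np.load("example.npz")
print(data.files)                  # names of the stored arrays
X, y = data["X"], data["y"]        # arrays are loaded lazily on access
```

`np.load` returns a lazy mapping, so individual arrays are only decompressed when accessed by key.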
Notes
Files
Preprocessed_data_CNN_CESM1_obs_20CRv3.zip
Additional details
Related works
- Cites
- Journal article: 10.1029/2021ms002464 (DOI)
- Journal article: 10.1002/qj.3598 (DOI)
- Journal article: 10.1175/bams-d-13-00255.1 (DOI)
- Has part
- Journal article: 10.1175/BAMS-D-13-00255.1 (DOI)
- Is supplement to
- Preprint: 10.48550/arXiv.2303.00652 (DOI)