A Benchmark Suite for Systematically Evaluating Reasoning Shortcuts

Samuele Bortolotti; Emanuele Marconato; Tommaso Carraro; Paolo Morettin; Emile van Krieken; Antonio Vergari; Stefano Teso; Andrea Passerini

doi:10.5281/zenodo.11612556

Published June 12, 2024 | Version v1

Dataset Open

1. University of Trento
2. Fondazione Bruno Kessler
3. University of Edinburgh

Contributors

Hosting institution:

SML Group¹

1. University of Trento

Codebase [Github] | Dataset [Zenodo]

Abstract

The advent of powerful neural classifiers has increased interest in problems that require both learning and reasoning. These problems are critical for understanding important properties of models, such as trustworthiness, generalization, interpretability, and compliance with safety and structural constraints. However, recent research has observed that tasks requiring both learning and reasoning on background knowledge often suffer from reasoning shortcuts (RSs): predictors can solve the downstream reasoning task without associating the correct concepts with the high-dimensional data. To address this issue, we introduce rsbench, a comprehensive benchmark suite designed to systematically evaluate the impact of RSs on models by providing easy access to highly customizable tasks affected by RSs. Furthermore, rsbench implements common metrics for evaluating concept quality and introduces novel formal verification procedures for assessing the presence of RSs in learning tasks. Using rsbench, we highlight that obtaining high-quality concepts in both purely neural and neuro-symbolic models is a far-from-solved problem. rsbench is available on GitHub.

Usage

We recommend visiting the official code website for instructions on how to use the dataset and the accompanying software.

License

All ready-made and generated datasets are distributed under the CC-BY-SA 4.0 license, with the exception of Kand-Logic, which is derived from Kandinsky-patterns and is therefore distributed under the GPL-3.0 license.

Datasets Overview

CLIP-embeddings. This folder contains the saved activations from a pretrained CLIP model applied to the tested dataset. It includes embeddings that represent the dataset in a format suitable for further analysis and experimentation.
BDD_OIA-original-dataset. This directory holds the original files from the X-OIA project by Xu et al. [1]. They have been made publicly available for ease of access and further research. If you use them, please cite the original authors.
kand-logic-3k. This folder contains all images generated for the Kand-Logic project. Each image is accompanied by annotations for both concepts and labels.
bbox-kand-logic-3k. In this directory, you will find images from the Kand-Logic project that have undergone a preprocessing step. These images are extracted based on bounding boxes, rescaled, and include annotations for concepts and labels.
sdd-oia. This folder includes all images and labels generated using rsbench.
sdd-oia-embeddings. This directory contains 512-dimensional embeddings of the sdd-oia dataset, extracted with a ResNet18 model pretrained on ImageNet.
BDD-OIA-preprocessed. This folder contains data preprocessed following the methodology of Sawada and Nakamura [2]: 2048-dimensional embeddings extracted with a Faster-RCNN model pretrained on the BDD-100k dataset.

The original BDD datasets can be downloaded from the following Google Drive link: [Download BDD Dataset].

References

[1] Xu et al., *Explainable Object-Induced Action Decision for Autonomous Vehicles*, CVPR 2020.

[2] Sawada and Nakamura, *Concept Bottleneck Model With Additional Unsupervised Concepts*, IEEE Access 2022.

Files

Files (2.0 GB)

Name	MD5	Size
bbox-kand-logic.zip	8e267b4e4275fc7febe1de7297c2bb88	53.8 MB
BDD-OIA-original-dataset.zip	5d34469f888aaffb358e838d864b688e	780.0 MB
BDD-OIA-preprocessed.zip	cb6749f789fbd3b63ea12dc8ee6bc639	53.4 MB
CLIP-embeddings.zip	61c7a960d58098cdeefb7cf99810ad2f	205.5 MB
kand-logic-3k.zip	4648a201146c40a8a8c2cbdb0e0aa408	16.8 MB
sdd-oia-embeddings.zip	fd474380c802f2034c338204e831265b	49.8 MB
sdd-oia.zip	0a27356800b00ba274d800c9d934f089	888.9 MB
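
The MD5 digests listed with each archive can be used to check that a download completed without corruption. The following is a minimal sketch, not part of rsbench: the helper names `md5_of` and `verify_archives` are hypothetical, and it assumes the archives were saved under their original file names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing on this page.
EXPECTED_MD5 = {
    "bbox-kand-logic.zip": "8e267b4e4275fc7febe1de7297c2bb88",
    "BDD-OIA-original-dataset.zip": "5d34469f888aaffb358e838d864b688e",
    "BDD-OIA-preprocessed.zip": "cb6749f789fbd3b63ea12dc8ee6bc639",
    "CLIP-embeddings.zip": "61c7a960d58098cdeefb7cf99810ad2f",
    "kand-logic-3k.zip": "4648a201146c40a8a8c2cbdb0e0aa408",
    "sdd-oia-embeddings.zip": "fd474380c802f2034c338204e831265b",
    "sdd-oia.zip": "0a27356800b00ba274d800c9d934f089",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archives(download_dir):
    """Map each listed archive to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(download_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Hypothetical usage: archives are assumed to sit in the current directory.
    for name, status in verify_archives(".").items():
        print(f"{status:>8}  {name}")
```

Reading in chunks keeps memory use flat even for the larger archives (sdd-oia.zip is close to 900 MB). An archive reported as "mismatch" should be downloaded again before extraction.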

Additional details

Funding

UK Research and Innovation
TANGO: It takes two to tango: a synergistic approach to human-machine decision making (grant no. 10082598)
European Commission
PFV-4-PTAI - Probabilistic Formal Verification for Provably Trustworthy AI (grant no. 101110960)
UK Research and Innovation
UNREAL: A Unified Reasoning Layer for Trustworthy ML (grant no. EP/Y023838/1)
UK Research and Innovation
Turing AI Fellowship TEAMER: Teaching Machines To Reason Like Humans (grant no. EP/W002876/1)
