Published June 12, 2024 | Version v1
Dataset Open

A Benchmark Suite for Systematically Evaluating Reasoning Shortcuts

  • 1. University of Trento
  • 2. Fondazione Bruno Kessler
  • 3. University of Edinburgh

Contributors

Hosting institution:

  • 1. University of Trento

Description

Codebase [GitHub] | Dataset [Zenodo]

 

Abstract

The advent of powerful neural classifiers has increased interest in problems that require both learning and reasoning. These problems are critical for understanding important properties of models, such as trustworthiness, generalization, interpretability, and compliance with safety and structural constraints. However, recent research has observed that tasks requiring both learning and reasoning on background knowledge often suffer from reasoning shortcuts (RSs): predictors can solve the downstream reasoning task without associating the correct concepts to the high-dimensional data. To address this issue, we introduce rsbench, a comprehensive benchmark suite designed to systematically evaluate the impact of RSs on models by providing easy access to highly customizable tasks affected by RSs. Furthermore, rsbench implements common metrics for evaluating concept quality and introduces novel formal verification procedures for assessing the presence of RSs in learning tasks. Using rsbench, we highlight that obtaining high-quality concepts in both purely neural and neuro-symbolic models is a far-from-solved problem. rsbench is available on GitHub.
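
To make the notion of a reasoning shortcut concrete, the following toy sketch may help. It is not taken from rsbench: the XOR knowledge, the two concept extractors, and all function names are hypothetical, and the "inputs" are simply the ground-truth concept pairs rather than high-dimensional data. It shows a predictor that gets every label right while getting every concept wrong.

```python
from itertools import product

def knowledge(c1, c2):
    # Assumed background knowledge for this toy task: the label is c1 XOR c2.
    return c1 ^ c2

def intended_extractor(a, b):
    # Recovers the ground-truth concepts exactly.
    return a, b

def shortcut_extractor(a, b):
    # Flips both concepts; XOR is unchanged, so the labels stay correct.
    return 1 - a, 1 - b

def evaluate(extractor):
    """Return (label accuracy, concept accuracy) over all four inputs."""
    inputs = list(product([0, 1], repeat=2))
    label_hits = 0
    concept_hits = 0
    for a, b in inputs:
        c1, c2 = extractor(a, b)
        label_hits += int(knowledge(c1, c2) == knowledge(a, b))
        concept_hits += int(c1 == a) + int(c2 == b)
    return label_hits / len(inputs), concept_hits / (2 * len(inputs))

if __name__ == "__main__":
    print("intended:", evaluate(intended_extractor))
    print("shortcut:", evaluate(shortcut_extractor))
```

Both extractors reach perfect label accuracy, so label supervision alone cannot tell them apart; only a concept-level metric exposes the difference.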

 

Usage

We recommend visiting the official code website for instructions on how to use the dataset and accompanying software.

 

License

All ready-made and generated datasets are distributed under the CC-BY-SA 4.0 license, with the exception of Kand-Logic, which is derived from Kandinsky-patterns and is therefore distributed under the GPL-3.0 license.

 

Datasets Overview

  • CLIP-embeddings. This folder contains the saved activations from a pretrained CLIP model applied to the tested dataset. It includes embeddings that represent the dataset in a format suitable for further analysis and experimentation.
  • BDD_OIA-original-dataset. This directory holds the original files from the X-OIA project by Xu et al. [1]. These datasets have been made publicly available for ease of access and further research. If you use them, please cite the original authors.
  • kand-logic-3k. This folder contains all images generated for the Kand-Logic project. Each image is accompanied by annotations for both concepts and labels.
  • bbox-kand-logic-3k. This directory contains images from the Kand-Logic project that have undergone a preprocessing step. The images are extracted based on bounding boxes and rescaled, and they include annotations for concepts and labels.
  • sdd-oia. This folder includes all images and labels generated using rsbench.
  • sdd-oia-embeddings. This directory contains 512-dimensional embeddings of the sdd-oia dataset, extracted with a ResNet18 model pretrained on ImageNet.
  • BDD-OIA-preprocessed. This folder contains preprocessed data that follow the methodology outlined by Sawada and Nakamura [2]: 2048-dimensional embeddings extracted with a Faster-RCNN model pretrained on the BDD-100k dataset.

The original BDD datasets can be downloaded from the following Google Drive link: [Download BDD Dataset].

 

References

[1] Xu et al., *Explainable Object-Induced Action Decision for Autonomous Vehicles*, CVPR 2020.

[2] Sawada and Nakamura, *Concept Bottleneck Model With Additional Unsupervised Concepts*, IEEE Access 2022.

 

Files

bbox-kand-logic.zip

Files (2.0 GB)

MD5 checksum                        Size
8e267b4e4275fc7febe1de7297c2bb88    53.8 MB
5d34469f888aaffb358e838d864b688e    780.0 MB
cb6749f789fbd3b63ea12dc8ee6bc639    53.4 MB
61c7a960d58098cdeefb7cf99810ad2f    205.5 MB
4648a201146c40a8a8c2cbdb0e0aa408    16.8 MB
fd474380c802f2034c338204e831265b    49.8 MB
0a27356800b00ba274d800c9d934f089    888.9 MB
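
The MD5 checksums listed above can be used to confirm that an archive downloaded intact. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are illustrative, and the archive path in the usage comment is a placeholder, since which checksum belongs to which file has to be read from the record page.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, expected_md5):
    """Compare a file's digest with a listed value, with or without 'md5:'."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected

# Hypothetical usage; replace the path with the archive you downloaded
# and the checksum with the value listed for that file:
# matches_checksum("path/to/archive.zip", "md5:<checksum from the listing>")
```

Reading in chunks keeps memory use flat even for the larger archives in this record.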

Additional details

Funding

  • TANGO: It takes two to tango: a synergistic approach to human-machine decision making (grant 10082598), UK Research and Innovation
  • PFV-4-PTAI – Probabilistic Formal Verification for Provably Trustworthy AI (grant 101110960), European Commission
  • UNREAL: A Unified Reasoning Layer for Trustworthy ML (grant EP/Y023838/1), UK Research and Innovation
  • Turing AI Fellowship TEAMER: Teaching Machines To Reason Like Humans (grant EP/W002876/1), UK Research and Innovation