A Benchmark Suite for Systematically Evaluating Reasoning Shortcuts
Creators
Description
Codebase [GitHub] | Dataset [Zenodo]
Abstract
The advent of powerful neural classifiers has increased interest in problems that require both learning and reasoning. These problems are critical for understanding important properties of models, such as trustworthiness, generalization, interpretability, and compliance with safety and structural constraints. However, recent research has observed that tasks requiring both learning and reasoning on background knowledge often suffer from reasoning shortcuts (RSs): predictors can solve the downstream reasoning task without associating the correct concepts with the high-dimensional data. To address this issue, we introduce rsbench, a comprehensive benchmark suite designed to systematically evaluate the impact of RSs on models by providing easy access to highly customizable tasks affected by RSs. Furthermore, rsbench implements common metrics for evaluating concept quality and introduces novel formal verification procedures for assessing the presence of RSs in learning tasks. Using rsbench, we highlight that obtaining high-quality concepts in both purely neural and neuro-symbolic models is a far-from-solved problem. rsbench is available on GitHub.
Usage
We recommend visiting the official code website for instructions on how to use the dataset and the accompanying software.
License
All ready-made and generated datasets are distributed under the CC-BY-SA 4.0 license, with the exception of Kand-Logic, which is derived from Kandinsky-patterns and as such is distributed under the GPL-3.0 license.
Datasets Overview
- CLIP-embeddings. This folder contains the saved activations from a pretrained CLIP model applied to the tested dataset. It includes embeddings that represent the dataset in a format suitable for further analysis and experimentation.
- BDD_OIA-original-dataset. This directory holds the original files from the X-OIA project by Xu et al. [1]. These datasets have been made publicly available for ease of access and further research. If you are going to use it, please consider citing the original authors.
- kand-logic-3k. This folder contains all images generated for the Kand-Logic project. Each image is accompanied by annotations for both concepts and labels.
- bbox-kand-logic-3k. In this directory, you will find images from the Kand-Logic project that have undergone a preprocessing step. The images are extracted based on bounding boxes and rescaled, and they include annotations for concepts and labels.
- sdd-oia. This folder includes all images and labels generated using rsbench.
- sdd-oia-embeddings. This directory contains 512-dimensional embeddings extracted with a ResNet18 model pretrained on ImageNet. The embeddings are derived from the sdd-oia dataset.
- BDD-OIA-preprocessed. Here you will find preprocessed data that follow the methodology outlined by Sawada and Nakamura [2]. The folder contains 2048-dimensional embeddings extracted from a pretrained Faster-RCNN model on the BDD-100k dataset.
The original BDD datasets can be downloaded from the following Google Drive link: [Download BDD Dataset].
References
[1] Xu et al., *Explainable Object-Induced Action Decision for Autonomous Vehicles*, CVPR 2020.
[2] Sawada and Nakamura, *Concept Bottleneck Model With Additional Unsupervised Concepts*, IEEE 2022.
Files
(2.0 GB)
MD5 checksum | Size |
---|---|
md5:8e267b4e4275fc7febe1de7297c2bb88 | 53.8 MB |
md5:5d34469f888aaffb358e838d864b688e | 780.0 MB |
md5:cb6749f789fbd3b63ea12dc8ee6bc639 | 53.4 MB |
md5:61c7a960d58098cdeefb7cf99810ad2f | 205.5 MB |
md5:4648a201146c40a8a8c2cbdb0e0aa408 | 16.8 MB |
md5:fd474380c802f2034c338204e831265b | 49.8 MB |
md5:0a27356800b00ba274d800c9d934f089 | 888.9 MB |
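Since each archive is listed with an MD5 checksum, a download can be checked against it before unpacking. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_listed_checksum` are illustrative and are not part of rsbench.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large archives are never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum given as 'md5:<hex>' or as bare hex."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum(local_archive, "md5:8e267b4e4275fc7febe1de7297c2bb88")` should return `True` when `local_archive` (a hypothetical local path) points to an intact copy of the archive listed with that checksum.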
Additional details
Funding
- TANGO: It takes two to tango: a synergistic approach to human-machine decision making 10082598
- UK Research and Innovation
- PFV-4-PTAI – Probabilistic Formal Verification for Provably Trustworthy AI 101110960
- European Commission
- UNREAL: A Unified Reasoning Layer for Trustworthy ML EP/Y023838/1
- UK Research and Innovation
- Turing AI Fellowship TEAMER: Teaching Machines To Reason Like Humans EP/W002876/1
- UK Research and Innovation