ImageNet Mechanistic Interpretability

Zimmermann, Roland S.; Klein, Thomas; Brendel, Wieland

doi:10.5281/zenodo.8131197

Published July 12, 2023 | Version 1.0

Dataset Open

1. Max Planck Institute for Intelligent Systems
2. University of Tübingen

To enable research on automated alignment/interpretability evaluations, we release the experimental results of our paper "Scale Alone Does not Improve Mechanistic Interpretability in Vision Models" as a separate dataset.

This is the first dataset containing interpretability measurements obtained through psychophysical experiments for multiple explanation methods and models. It contains more than 120,000 anonymized human responses, each consisting of the final choice, a confidence score, and a reaction time. Of these, more than 69,000 passed all of our quality assertions; they form the main dataset (see responses_main.csv). The remaining responses failed at least one quality assertion and may be of lower quality, so they should be used with care (see responses_lower_quality.csv); we provide them for development and debugging purposes. The dataset also contains the query images used in the experiments and the generated explanations for more than 760 units across nine models.
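As a minimal sketch of how the response files could be read, the following Python snippet pulls one CSV out of a downloaded archive using only the standard library. It assumes that responses_main.csv and responses_lower_quality.csv are stored inside human_responses.zip and are UTF-8 encoded; neither is stated on this page, and the folder layout and column names inside the archive are not assumed at all. The function name `read_responses` is hypothetical.

```python
import csv
import io
import zipfile


def read_responses(zip_path, csv_name="responses_main.csv"):
    """Return the rows of `csv_name` found inside the archive as a list of dicts."""
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the folder layout inside the archive is unknown,
        # so match on the bare file name only.
        matches = [n for n in archive.namelist() if n.split("/")[-1] == csv_name]
        if not matches:
            raise FileNotFoundError(f"{csv_name} not found in {zip_path}")
        with archive.open(matches[0]) as raw:
            # Assumption: the CSV is UTF-8 encoded with a header row.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

Calling `read_responses("human_responses.zip")` would return the main responses, and passing `csv_name="responses_lower_quality.csv"` would return the lower-quality ones. Keeping the two sets in separate variables makes it harder to mix them by accident.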

The dataset is a collection of labels and meta-information; it does not come with a fixed set of features that are predictive of a unit's interpretability. Finding and constructing features that predict the recorded labels is one of the open challenges posed by this line of research.

Files

Files (1.7 GB)

Name	MD5	Size
human_responses.zip	f886fc48a87baf51f2beb834924c8b62	61.9 MB
image_data.zip	47c364fd92752d3412f1c08f8cd6d793	1.6 GB
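The MD5 checksums listed for the two archives can be used to check a download for corruption. The snippet below is a small sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `verify_download` are hypothetical, and the local file paths are whatever you chose when downloading.

```python
import hashlib

# Checksums as listed for the two archives in this record.
EXPECTED_MD5 = {
    "human_responses.zip": "f886fc48a87baf51f2beb834924c8b62",
    "image_data.zip": "47c364fd92752d3412f1c08f8cd6d793",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, name):
    """Return True if the file at `path` matches the listed checksum for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify_download("image_data.zip", "image_data.zip")` should return True for an intact copy. Reading in chunks avoids loading the 1.6 GB image archive into memory at once.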

	All versions	This version
Views	363	358
Downloads	77	76
Data volume	75.6 GB	74.0 GB
