Dataset Open Access

ImageNet Mechanistic Interpretability

Zimmermann, Roland S.; Klein, Thomas; Brendel, Wieland

To enable research on automated alignment/interpretability evaluations, we release the experimental results of our paper "Scale Alone Does not Improve Mechanistic Interpretability in Vision Models" as a separate dataset.

This is the first dataset containing interpretability measurements obtained through psychophysical experiments for multiple explanation methods and models. It contains more than 120,000 anonymized human responses, each consisting of the final choice, a confidence score, and a reaction time. Of these, more than 69,000 passed all of our quality assertions; they form the main dataset (see responses_main.csv). The remaining responses failed one or more quality assertions and may be of lower quality; we provide them for development and debugging purposes, and they should be used with care (see responses_lower_quality.csv). The dataset also contains the query images used in the experiments and the generated explanations for more than 760 units across nine models.
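As a rough illustration of working with the response files, the sketch below aggregates responses per unit into an accuracy, a mean confidence, and a mean reaction time. The column names (`model`, `unit`, `correct`, `confidence`, `reaction_time`) are assumptions made for this example and are not taken from the dataset; check the header of responses_main.csv and adjust them before use. The sample rows are made up.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical column names: verify against the actual CSV header.
COL_MODEL = "model"
COL_UNIT = "unit"
COL_CORRECT = "correct"        # assumed: 1 if the final choice was correct, else 0
COL_CONFIDENCE = "confidence"
COL_RT = "reaction_time"


def summarize_responses(csv_file):
    """Aggregate responses per (model, unit) pair.

    Returns a dict mapping (model, unit) to the number of responses,
    the fraction of correct choices, the mean confidence, and the
    mean reaction time.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[(row[COL_MODEL], row[COL_UNIT])].append(row)

    summary = {}
    for key, rows in groups.items():
        summary[key] = {
            "n": len(rows),
            "accuracy": mean(int(r[COL_CORRECT]) for r in rows),
            "mean_confidence": mean(float(r[COL_CONFIDENCE]) for r in rows),
            "mean_reaction_time": mean(float(r[COL_RT]) for r in rows),
        }
    return summary


# Made-up rows standing in for the real file, so the sketch runs on its own.
SAMPLE = """model,unit,correct,confidence,reaction_time
resnet50,12,1,3,1.5
resnet50,12,0,1,2.5
vit_b32,7,1,2,1.0
"""

summary = summarize_responses(io.StringIO(SAMPLE))
for (model, unit), stats in sorted(summary.items()):
    print(model, unit, stats)
```

With the real data, the same function could be called on a file opened with `open("responses_main.csv", newline="")`, once the column names have been matched to the actual header.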

The dataset itself is a collection of labels and meta-information; it does not come with a fixed set of features that are predictive of a unit's interpretability. Finding and constructing such features is one of the open challenges posed by this line of research.

Files (1.7 GB)
Size
61.9 MB
1.6 GB