Dataset Open Access
Zimmermann, Roland S.; Klein, Thomas; Brendel, Wieland
To enable research on automated alignment/interpretability evaluations, we release the experimental results of our paper "Scale Alone Does not Improve Mechanistic Interpretability in Vision Models" as a separate dataset.
Note that this is the first dataset containing interpretability measurements obtained through psychophysical experiments for multiple explanation methods and models. It contains more than 120,000 anonymized human responses, each consisting of the final choice, a confidence score, and a reaction time. Of these, more than 69,000 responses passed all our quality assertions and constitute the main dataset (see responses_main.csv). The remaining responses failed some of the quality assertions, may be of lower quality, and should be used with care (see responses_lower_quality.csv); we provide them primarily for development and debugging purposes. Furthermore, the dataset contains the query images used in the experiments as well as the generated explanations for more than 760 units across nine models.
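The response files are plain CSVs and can be loaded directly, e.g. with pandas. Below is a minimal sketch; the column names `choice`, `confidence`, and `reaction_time` are only illustrative assumptions for the per-response fields described above (final choice, confidence score, reaction time), and the file path depends on where human_responses.zip was extracted — check the actual CSV header first.

```python
import pandas as pd

# Load the main (quality-checked) responses; responses_lower_quality.csv
# can be loaded the same way. Adjust the path to your extraction directory.
responses = pd.read_csv("responses_main.csv")

# Inspect the available columns - the names used below ("choice",
# "confidence", "reaction_time") are assumptions, not guaranteed names.
print(responses.columns.tolist())
print(responses[["choice", "confidence", "reaction_time"]].describe())
```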
The dataset itself is a collection of labels and meta-information; it does not include fixed features that are predictive of a unit's interpretability. Finding and constructing features that predict the recorded labels is one of the open challenges posed by this line of research.
Name | MD5 | Size
---|---|---
human_responses.zip | f886fc48a87baf51f2beb834924c8b62 | 61.9 MB
image_data.zip | 47c364fd92752d3412f1c08f8cd6d793 | 1.6 GB
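After downloading, the archives can be checked against the MD5 checksums listed above, for example with a short script like this (file names and checksums are taken from the table; chunked reading keeps memory usage low for the 1.6 GB archive):

```python
import hashlib

# Expected checksums as listed in the file table above.
EXPECTED_MD5 = {
    "human_responses.zip": "f886fc48a87baf51f2beb834924c8b62",
    "image_data.zip": "47c364fd92752d3412f1c08f8cd6d793",
}

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

for name, expected in EXPECTED_MD5.items():
    actual = md5sum(name)
    status = "OK" if actual == expected else "MISMATCH"
    print(f"{name}: {status} ({actual})")
```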