ImageNet Mechanistic Interpretability
- 1. Max Planck Institute for Intelligent Systems
- 2. University of Tübingen
Description
To enable research on automated alignment/interpretability evaluations, we release the experimental results of our paper "Scale Alone Does not Improve Mechanistic Interpretability in Vision Models" as a separate dataset.
Note that this is the first dataset containing interpretability measurements obtained through psychophysical experiments for multiple explanation methods and models. The dataset contains more than 120,000 anonymized human responses, each consisting of a final choice, a confidence score, and a reaction time. Of these, more than 69,000 passed all our quality checks; these form the main data (see responses_main.csv). The remaining responses failed at least one quality check and may be of lower quality, so they should be used with care (see responses_lower_quality.csv). We consider the former the main dataset and provide the latter for development and debugging purposes. In addition, the dataset contains the query images used in the experiments as well as the generated explanations for more than 760 units across nine models.
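To illustrate the data layout, here is a minimal loading sketch in Python. The column names used below ("unit", "confidence", "reaction_time") are assumptions inferred from the description above, not confirmed headers, so inspect the CSV before relying on them.

```python
# Minimal sketch for loading the responses. The column names
# ("unit", "confidence", "reaction_time") are assumptions inferred
# from the dataset description, not confirmed headers.
import pandas as pd

# Responses that passed all quality checks (the main dataset).
main = pd.read_csv("responses_main.csv")
print(main.shape)    # expect >69,000 rows
print(main.columns)  # check the actual column names first

# Hypothetical per-unit aggregation of confidence and reaction time.
summary = main.groupby("unit").agg(
    mean_confidence=("confidence", "mean"),
    mean_reaction_time=("reaction_time", "mean"),
)
print(summary.head())
```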
The dataset itself is a collection of labels and meta-information; it deliberately contains no fixed features that are meant to be predictive of a unit's interpretability. Finding and constructing features that are predictive of the recorded labels is one of the open challenges posed by this line of research.
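As a rough sketch of what such a prediction task could look like, the snippet below derives a per-unit label from the responses and fits a simple regressor. The "unit" and "confidence" column names and the label construction are assumptions, and the random features are placeholders for the features a researcher would actually have to construct.

```python
# Hypothetical baseline for the open challenge: predict a per-unit
# label from constructed features. Column names ("unit", "confidence")
# and the label definition are assumptions; the random features stand
# in for features a researcher would construct.
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

main = pd.read_csv("responses_main.csv")
labels = main.groupby("unit")["confidence"].mean()  # one label per unit

rng = np.random.default_rng(0)
X = rng.normal(size=(len(labels), 8))  # placeholder features
y = labels.to_numpy()

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = Ridge().fit(X_tr, y_tr)
print(r2_score(y_te, model.predict(X_te)))  # near zero for random features
```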
Files (1.7 GB)

Name | MD5 checksum | Size
---|---|---
human_responses.zip | md5:f886fc48a87baf51f2beb834924c8b62 | 61.9 MB
 | md5:47c364fd92752d3412f1c08f8cd6d793 | 1.6 GB
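After downloading, the archives can be verified against the MD5 checksums listed above; a minimal sketch (only the archive whose name is preserved in this listing is included):

```python
# Verify a downloaded archive against its listed MD5 checksum.
import hashlib

expected = {
    "human_responses.zip": "f886fc48a87baf51f2beb834924c8b62",
    # The second archive (md5:47c364fd92752d3412f1c08f8cd6d793) can be
    # added here once its filename is known.
}

for name, md5 in expected.items():
    h = hashlib.md5()
    with open(name, "rb") as f:
        # Read in 1 MiB chunks to avoid loading large archives into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    assert h.hexdigest() == md5, f"{name}: checksum mismatch"
    print(f"{name}: OK")
```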