Published July 1, 2023 | Version v1
Report | Open Access

An Empirical Study on Low- and High-Level Explanations of Deep Learning Misbehaviours

Description

Background: Most quality assessment approaches for Deep Learning (DL) focus on finding misbehaviour-inducing inputs. However, it is difficult to clearly understand the causes of misbehaviours, due to the opaqueness of DL software. Recent research has proposed different techniques to explain DL misbehaviours, producing input explanations either at a “low level” (raw input elements) or at a “high level” (input features).

Aims: We aim to compare the similarity between different explanations and to assess to what extent they are understandable.

Method: We conducted an empirical study involving 3 state-of-the-art techniques for DL explanation in 13 configurations, applied to 2 different DL tasks. We also collected answers from 48 questionnaires submitted to SE experts.

Results: Low- and high-level techniques provide dissimilar explanations for the same inputs. However, in 28% of the cases, experts deemed none of the explanations useful.

Conclusion: Despite the complementarity of existing explanations, further research is needed to produce better ones.

Files

TR-Precrime-2023-09.pdf (1.7 MB)
md5:10895aadc6093deadb41841694131d75

Additional details

Related works

Is published in
10.1109/ESEM56168.2023.10304866 (DOI)

Funding

European Commission
PRECRIME - Self-assessment Oracles for Anticipatory Testing (Grant No. 787703)