An Empirical Evaluation of Mutation Operators for Deep Learning Systems
Deep Learning (DL) is increasingly adopted to solve complex tasks such as image recognition or autonomous driving. Companies are considering the inclusion of DL components in production systems, but one of their main concerns is how to assess the quality of such systems. Mutation testing is a technique to inject artificial faults into a system, under the assumption that the capability to expose (kill) such artificial faults translates into the capability to expose also real faults.
Researchers have proposed approaches and tools (e.g., DeepMutation and muNN) that make mutation testing applicable to deep learning systems. However, existing definitions of mutation killing, based on accuracy drop, do not take into account the stochastic nature of the training process (accuracy may drop even when re-training the un-mutated system). Moreover, the same mutation operator might be effective or might be trivial/impossible to kill, depending on its hyper-parameter configuration. We conducted an empirical evaluation of existing operators, showing that mutation killing requires a stochastic definition and identifying the subset of effective mutation operators together with the associated most effective configurations.
- Is obsoleted by
- 10.1109/ICST46399.2020.00018 (DOI)