Exploiting one-class classification optimization objectives for increasing adversarial robustness
Description
This work examines the problem of increasing the robustness of deep neural network-based image classification systems to adversarial attacks, without changing the neural architecture or employing adversarial examples during training. We attribute the well-known lack of robustness of such systems to the geometric properties of their deep embedding spaces, which arise from standard optimization choices and allow minor changes in intermediate activation values to trigger dramatic changes in the decision values of the final layer. To counteract this effect, we explore optimization criteria that supervise the distribution of the intermediate embedding spaces on a class-specific basis, by introducing and leveraging one-class classification objectives. The proposed learning procedure compares favorably to recently proposed training schemes for adversarial robustness in black-box adversarial attack settings.
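A minimal sketch of how such a class-specific one-class objective might be combined with a standard classification loss on an intermediate embedding. The Deep-SVDD-style per-class centre term, the weighting `lambda_oc`, and all names below are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch: cross-entropy on the final layer plus a
# one-class penalty that pulls each sample's intermediate embedding
# towards a learnable centre for its own class, compacting the
# embedding space on a class-specific basis.
class OneClassRegularizedLoss(nn.Module):
    def __init__(self, num_classes: int, embed_dim: int, lambda_oc: float = 0.1):
        super().__init__()
        # One learnable centre per class in the embedding space
        # (assumed parameterization; trained jointly with the network).
        self.centres = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.lambda_oc = lambda_oc

    def forward(self, logits, embeddings, targets):
        # Standard classification objective on the decision values.
        ce = F.cross_entropy(logits, targets)
        # One-class term: squared distance of each intermediate
        # embedding to the centre of its ground-truth class.
        oc = ((embeddings - self.centres[targets]) ** 2).sum(dim=1).mean()
        return ce + self.lambda_oc * oc
```

In this sketch the network would be assumed to return both the final logits and the chosen intermediate embedding, and the criterion's parameters (the class centres) would be passed to the optimizer alongside the model's, so that the centres are learned jointly with the embedding itself.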
Files
Exploiting_one_class_classification_for_increasing_deep_learning_model_robustness_to_adversarial_attacks.pdf (209.7 kB, md5:572c81b6c37e7f3cc655fc524f8adcc0)