Report Open Access
Mygdalis Vasileios; Ioannis Pitas
This work addresses the problem of adversarial robustness in deep neural network classification from an optimal class boundary estimation perspective. It is argued that increased model robustness to adversarial attacks can be achieved when the feature learning process is monitored by geometrically-inspired optimization criteria. To this end, we propose to learn hyperspherical class prototypes in the neural feature embedding space, along with training the network parameters. Three concurrent optimization functions for the intermediate hidden layer training data activations are devised, requiring items of the same class to be enclosed by the corresponding class prototype boundaries, to have minimum distance from their class prototype vector (i.e., hypersphere center) and to have maximum distance from the remainder hypersphere centers. Our experiments show that training standard classification model architectures with the proposed objectives, significantly increases their robustness to white-box adversarial attacks, without adverse (if not beneficial) effects to their classification accuracy.
Hypespherical class prototypes for adversarial robustness_.pdf