Report Open Access

# Introducing K-anonymity principles to adversarial attacks for privacy protection in image classification problems

Mygdalis Vasileios; Tefas Anastasios; Pitas Ioannis

The output activation values of a network for a given input can be sorted to produce a ranking of the classes. Adversarial attacks typically generate the smallest perturbation required to change the classifier label. In that sense, the generated perturbation only affects the first position of the sorted output ranking. We argue that meaningful information about the adversarial examples, i.e., their original labels, is still encoded in the network output ranking and could potentially be extracted using rule-based reasoning. To this end, we introduce a novel adversarial attack methodology inspired by K-anonymity principles, which generates adversarial examples that are not only misclassified by the neural network classifier, but are also uniformly spread along K different positions in the sorted output ranking. To regulate the perturbation that arises from the strength of the proposed optimization objectives, an additional visual similarity-based loss function is introduced, guiding the adversarial examples towards directions that maintain visual similarity according to some objective metric, such as CW-SSIM. Experimental results indicate that the proposed approach achieves the optimization goals inspired by K-anonymity, while also introducing reduced perturbation.
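The ranking argument can be illustrated with a small sketch. It is not the authors' implementation: the helper names `label_rank` and `rank_histogram` and the toy activation values are assumptions made for illustration. The sketch shows how a minimal attack can leave the original label in the second ranking position, and how one might count the positions that original labels occupy across a batch.

```python
import numpy as np


def label_rank(logits, label):
    """Return the 1-based position of `label` in the descending ranking of `logits`."""
    order = np.argsort(-np.asarray(logits), kind="stable")  # class indices, highest activation first
    return int(np.where(order == label)[0][0]) + 1


def rank_histogram(logits_batch, labels, k):
    """Count how often the original label lands in each of the first k ranking positions."""
    counts = np.zeros(k, dtype=int)
    for logits, label in zip(logits_batch, labels):
        rank = label_rank(logits, label)
        if rank <= k:
            counts[rank - 1] += 1
    return counts


# Toy activations for a 4-class problem (hypothetical values).
clean = np.array([2.0, 5.0, 1.0, 0.5])  # class 1 is ranked first
adv = np.array([2.0, 5.0, 5.2, 0.5])    # minimal attack: class 2 barely overtakes class 1

print(label_rank(clean, 1))  # position of the original label before the attack
print(label_rank(adv, 1))    # position of the original label after the attack
```

In this toy case the original label drops only to the second position, so a simple rule such as "take the runner-up class" would recover it. Under the assumptions of this sketch, an attack in the spirit of the abstract would instead aim for a roughly flat `rank_histogram` over K positions, so that no single position reveals the original label.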