Recent studies have demonstrated that deep neural network classifiers (e.g., CNNs) are quite vulnerable to adversarial attacks that add only quasi-imperceptible perturbations to the input data yet completely change the classifiers' predictions. To defend classifiers against such attacks, we focus on white-box adversarial defense, where attackers are granted full access to not only the classifiers but also the defenders, so that they can produce as strong an attack as possible. We argue that a successful white-box defender should prevent the attacker from not only calculating gradients directly but also approximating them. We therefore propose viewing the defense from the perspective of a functional, a higher-order function that takes other functions as input and returns a new function as the defender. Such a design makes the defender a hidden function whose gradients are hard to estimate without knowing the prior. To this end, we propose a novel Robust Iterative Data Estimation (RIDE) algorithm that works as a defender by estimating the true underlying data from each individual adversarial observation. Specifically, the RIDE algorithm takes a randomly initialized neural network as input and returns a parameterized defense model through self-supervised optimization. To the best of our knowledge, we are the first to propose self-supervised data estimation for white-box adversarial defense by viewing defenders as functionals.
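The idea of a defender as a functional can be illustrated with a toy sketch. It is not the RIDE implementation and not part of the released software. It only shows the shape of the idea: a higher-order function receives randomly initialized parameters and a single observation, runs a self-supervised fit on that observation alone, and returns a new function that acts as the defender. Several things here are assumptions made for illustration: the names `neighbor_patches`, `make_defender`, and `defend` are hypothetical; the "network" is replaced by an 8-weight linear predictor; the self-supervised objective is a blind-spot loss (predict each pixel from its 8 neighbors), which the text above does not specify; and uniform random noise stands in for an adversarial perturbation.

```python
import numpy as np


def neighbor_patches(image):
    """Return an (H*W, 8) array with the 8 neighbors of every pixel (edge-padded)."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    columns = []
    for dy in range(3):
        for dx in range(3):
            if dy == 1 and dx == 1:
                continue  # blind spot: the center pixel is never used as an input
            columns.append(padded[dy:dy + h, dx:dx + w].ravel())
    return np.stack(columns, axis=1)


def make_defender(init_weights, observation, steps=300, lr=0.05):
    """Toy functional: (random init, one observation) -> defense function.

    Assumes pixel values in [0, 1]; the step size is chosen for that range.
    """
    features = neighbor_patches(observation)
    target = observation.ravel()
    weights = init_weights.copy()
    for _ in range(steps):
        # Gradient descent on the mean squared blind-spot prediction error.
        residual = features @ weights - target
        weights -= lr * 2.0 * features.T @ residual / target.size

    def defender(image):
        # The returned function re-estimates every pixel from its neighbors.
        return (neighbor_patches(image) @ weights).reshape(image.shape)

    return defender


def defend(observation, rng):
    """Build a fresh defender for this observation and apply it to the same input."""
    init_weights = rng.normal(scale=0.1, size=8)  # hypothetical random initialization
    return make_defender(init_weights, observation)(observation)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    axis = np.sin(np.pi * np.linspace(0.0, 1.0, 28))
    clean = 0.2 + 0.6 * np.outer(axis, axis)  # smooth synthetic 28x28 image
    observed = clean + rng.uniform(-0.2, 0.2, size=clean.shape)  # stand-in perturbation
    estimate = defend(observed, rng)
    print("MSE of observation:", np.mean((observed - clean) ** 2))
    print("MSE of estimate:   ", np.mean((estimate - clean) ** 2))
```

Because `make_defender` is re-run for every input with a fresh random initialization, the function applied to one observation differs from the function applied to the next. That is the property the functional view relies on, although this toy makes no claim about robustness against a real white-box attack.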
The released code implements our RIDE algorithm for adversarial defense. As a demonstration, we show qualitative results of the defense against a 10-iteration white-box attack (PGD attack with BPDA) on the MNIST dataset using (a) median filtering, (b) total-variance minimization, and (c) the proposed RIDE algorithm. The code accompanies our arXiv submission “White-Box Adversarial Defense via Self-Supervised Data Estimation”.
- "White-Box Adversarial Defense via Self-Supervised Data Estimation", arXiv, September 2019.
To download the software, please preview and agree to MERL's research-only licensing terms.