Attacking the Madry defense model with L1-based adversarial examples
Abstract
The Madry Lab recently hosted a competition designed to test the robustness of their adversarially trained MNIST model. Attacks were constrained to perturb each pixel of the input image by a scaled maximal L∞ distortion ε = 0.3. This constraint discourages the use of attacks that are not optimized for the L∞ distortion metric. Our experimental results demonstrate that, by relaxing the L∞ constraint of the competition, the elastic-net attack to deep neural networks (EAD) can generate transferable adversarial examples which, despite their high average L∞ distortion, have minimal visual distortion. These results call into question the use of L∞ as a sole measure of visual distortion, and further demonstrate the power of EAD at generating robust adversarial examples.
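To make the distortion metrics concrete, the following minimal sketch computes the L1, L2, and L∞ distortion between an image and a perturbed copy, and checks the perturbation against an L∞ budget of 0.3. The function names (`distortions`, `within_linf_budget`) and the four-pixel toy images are illustrative assumptions and are not taken from the competition code or from EAD. The sketch only shows how a sparse perturbation and a dense one can share the same L1 distortion while differing sharply in L∞.

```python
def distortions(original, adversarial):
    """Return the (L1, L2, Linf) distortion between two flattened images."""
    diffs = [abs(a - o) for o, a in zip(original, adversarial)]
    l1 = sum(diffs)                        # total absolute change
    l2 = sum(d * d for d in diffs) ** 0.5  # Euclidean norm of the change
    linf = max(diffs)                      # largest change to any one pixel
    return l1, l2, linf


def within_linf_budget(original, adversarial, epsilon=0.3):
    """Check whether every pixel changed by at most epsilon."""
    _, _, linf = distortions(original, adversarial)
    return linf <= epsilon


# Hypothetical 4-pixel "images" with pixel values scaled to [0, 1].
original = [0.0, 0.5, 1.0, 0.25]
sparse = [0.0, 0.5, 0.0, 0.25]   # one pixel changed by 1.0
dense = [0.25, 0.75, 0.75, 0.5]  # every pixel changed by 0.25

print(distortions(original, sparse), within_linf_budget(original, sparse))
print(distortions(original, dense), within_linf_budget(original, dense))
```

In this toy case both perturbations have an L1 distortion of 1.0, but only the dense one satisfies the L∞ budget; the sparse one concentrates its change in a single pixel and so exceeds it.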