IJCAI 2022
Conference paper

Learning to Generate Image Source-Agnostic Universal Adversarial Perturbations


Adversarial perturbations are critical for training robust deep learning models. Typically, adversarial perturbations for a given victim model are generated separately for each image. An alternative approach is to generate ``universal adversarial perturbations'' (UAPs) that can simultaneously attack multiple images, and thus offer a more unified threat model without requiring an image-wise attack generation algorithm. However, existing UAP generation is underdeveloped when images are drawn from different image sources (e.g., with different image resolutions). Towards authentic universality across image sources, in this paper we take a novel view of UAP generation as an instance of ``meta-learning'', and leverage bi-level optimization and learning-to-optimize (L2O) techniques to generate UAPs with improved attack success rate (ASR). Specifically, we use the popular model-agnostic meta-learning (MAML) framework to meta-learn a UAP generator over just few-shot image classification tasks. However, we find that the MAML framework alone does not directly yield a universal attack across image sources, which requires us to combine it with another meta-learning framework, L2O. The resulting scheme for meta-learning a UAP generator (i) achieves 50% higher ASR than baselines such as Projected Gradient Descent (PGD), which do not leverage meta-learning, (ii) is 37% faster than the vanilla L2O and MAML frameworks (when applicable), and (iii) is able to simultaneously handle UAP generation for different victim models and their corresponding image data sources.
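To make the notion of a single perturbation shared across images concrete, the following is a minimal sketch of a PGD-style universal attack, i.e., the kind of non-meta-learned baseline mentioned above. It is not the proposed MAML/L2O scheme. The victim here is assumed to be a toy binary logistic classifier with hand-picked weights, and all names (`avg_loss_and_grad`, `universal_pgd`, `eps`, `alpha`) as well as the data are hypothetical and chosen only for illustration.

```python
import math

def avg_loss_and_grad(w, b, xs, ys, delta):
    """Mean logistic loss over all inputs, and its gradient w.r.t. the shared delta."""
    n, d = len(xs), len(w)
    loss, grad = 0.0, [0.0] * d
    for x, y in zip(xs, ys):
        # Logit of the perturbed input x + delta (labels y are in {-1, +1}).
        z = sum(wi * (xi + di) for wi, xi, di in zip(w, x, delta)) + b
        m = y * z
        loss += math.log1p(math.exp(-m))
        coeff = -y / (1.0 + math.exp(m))  # derivative of the loss w.r.t. z
        for i in range(d):
            grad[i] += coeff * w[i]
    return loss / n, [g / n for g in grad]

def universal_pgd(w, b, xs, ys, eps=0.5, alpha=0.1, steps=20):
    """Sign-gradient ascent on the mean loss, projected onto the l_inf ball of radius eps."""
    delta = [0.0] * len(w)
    for _ in range(steps):
        _, grad = avg_loss_and_grad(w, b, xs, ys, delta)
        for i, g in enumerate(grad):
            step = alpha * ((g > 0) - (g < 0))  # alpha * sign(g)
            delta[i] = max(-eps, min(eps, delta[i] + step))  # projection (clipping)
    return delta

# Hypothetical toy victim model and data (assumptions, not from the paper).
w, b = [1.0, -2.0, 0.5], 0.0
xs = [[1.0, -0.5, 0.2], [0.8, -0.1, 0.4], [-0.6, 0.9, -0.3]]
ys = [1, 1, -1]

delta = universal_pgd(w, b, xs, ys)
loss_clean, _ = avg_loss_and_grad(w, b, xs, ys, [0.0] * len(w))
loss_adv, _ = avg_loss_and_grad(w, b, xs, ys, delta)
print("universal perturbation:", delta)
print("mean loss clean vs. perturbed:", loss_clean, loss_adv)
```

The same `delta` is added to every input, so it must raise the loss averaged over the whole set rather than on any single image. Note that this sketch assumes all inputs share one dimensionality; handling image sources with different resolutions is precisely the setting this simple formulation does not cover.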