Attention-guided transformation-invariant attack for black-box adversarial examples

Jiaqi Zhu; Feng Dai; Lingyun Yu; Hongtao Xie; Lidong Wang; Bo Wu; Yongdong Zhang

doi:10.1002/int.22808

International Journal of Intelligent Systems

Paper

11 Jan 2022

Abstract

With the development of media convergence, information acquisition is no longer limited to traditional media such as newspapers and television; it increasingly comes from digital media on the Internet, where content should be supervised by platforms. At present, the media content analysis technology of Internet platforms relies on deep neural networks (DNNs). However, DNNs are vulnerable to adversarial examples, which creates security risks. It is therefore necessary to study the internal mechanism of adversarial examples thoroughly in order to build more effective supervision models. In practical applications, supervision models mostly face black-box attacks, for which the cross-model transferability of adversarial examples has attracted increasing attention. In this paper, to improve the transferability of adversarial examples, we propose an attention-guided transformation-invariant adversarial attack method, which incorporates an attention mechanism to disrupt the most distinctive features while keeping the attack invariant under different transformations. Specifically, we dynamically weight the latent features according to an attention mechanism and disrupt them accordingly. Because low-level features lack semantics, high-level semantics are introduced as spatial guidance so that low-level feature perturbations concentrate on the most discriminative regions. Moreover, since attention heatmaps may vary significantly across models, a transformation-invariant aggregated attack strategy is proposed to alleviate overfitting to the attention of the proxy model. Comprehensive experimental results show that the proposed method can significantly improve the transferability of adversarial examples.
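The two core ideas in the abstract, attention-weighted feature disruption and gradient aggregation over transformations, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the proxy model is a toy linear feature layer `W` with a ReLU scoring head `v`, the attention weights are taken as the gradient of the class score with respect to the features, the transformation set is a handful of circular shifts, and the names `attention`, `agg_loss`, `agg_grad`, and `attack` are hypothetical. The high-level spatial guidance of low-level features described in the abstract is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_feat = 32, 16
W = rng.normal(size=(n_feat, n_in))  # toy feature layer (assumption): f(x) = W @ x
v = rng.normal(size=n_feat)          # toy scoring head (assumption): score = v . relu(f)
x = rng.normal(size=n_in)            # clean input, standing in for a flattened image
shifts = [0, 1, 2, 3]                # hypothetical transformation set: circular shifts


def attention(x):
    """Gradient of the class score w.r.t. the features, used as feature weights."""
    f = W @ x
    return v * (f > 0)


def agg_loss(x_adv, a):
    """Attention-weighted feature response, averaged over all transformations."""
    return np.mean([a @ (W @ np.roll(x_adv, k)) for k in shifts])


def agg_grad(a):
    """Gradient of agg_loss w.r.t. x_adv (constant, because the toy layer is linear)."""
    c = W.T @ a
    # Chain rule through np.roll(x, k): shift the gradient back by -k.
    return np.mean([np.roll(c, -k) for k in shifts], axis=0)


def attack(x, eps=0.05, alpha=0.01, steps=10):
    """Iterative sign-gradient descent on the aggregated loss inside an L-inf ball."""
    a = attention(x)  # weights are computed once, on the clean input
    g = agg_grad(a)
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv - alpha * np.sign(g)        # suppress the attended features
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within the perturbation budget
    return x_adv, a


x_adv, a = attack(x)
print("aggregated loss before:", agg_loss(x, a))
print("aggregated loss after: ", agg_loss(x_adv, a))
```

Averaging the gradient over several transformed copies of the input means the perturbation is not tuned to a single view of the proxy model, which is the intuition behind using aggregation to reduce overfitting to one model's attention. In a real setting the features and gradients would come from a deep network via automatic differentiation rather than from a closed-form linear layer.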
