Like any technology, deep learning should be as hacker-proof as possible.
In research terms, hacker-proofing a deep neural network means improving its adversarial robustness. Our recent MIT-IBM paper, accepted at this year’s NeurIPS (the largest global AI conference), deals with exactly that.[1] We focus on practical solutions to evaluate and improve adversarial robustness.
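The paper cited here [1] works on certification for randomized smoothing, in which a base classifier's prediction is taken by majority vote over Gaussian-noised copies of the input, so that the smoothed prediction provably cannot change within some L2 radius. As a minimal sketch of the standard first-order certificate that such work builds on (the one of Cohen et al., 2019, not the higher-order method of [1]), the code below estimates the top-class probability by Monte Carlo and returns the radius σ·Φ⁻¹(p_lower). The function names are hypothetical, and for brevity it uses a Hoeffding lower confidence bound where Cohen et al. use a Clopper-Pearson bound.

```python
import math
import random
from statistics import NormalDist


def sample_counts(base_classifier, x, sigma, num, rng):
    """Count the base classifier's labels on `num` Gaussian-noised copies of x."""
    counts = {}
    for _ in range(num):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        label = base_classifier(noisy)
        counts[label] = counts.get(label, 0) + 1
    return counts


def certify(base_classifier, x, sigma, n0=100, n=1000, alpha=0.001, seed=0):
    """Return (class, certified L2 radius) for the smoothed classifier, or (None, 0.0)."""
    rng = random.Random(seed)
    # Step 1: guess the smoothed classifier's top class from a small sample.
    selection = sample_counts(base_classifier, x, sigma, n0, rng)
    top_class = max(selection, key=selection.get)
    # Step 2: lower-bound that class's probability on a fresh sample
    # (one-sided Hoeffding bound, valid with probability at least 1 - alpha).
    estimation = sample_counts(base_classifier, x, sigma, n, rng)
    p_hat = estimation.get(top_class, 0) / n
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # abstain: no certificate at this confidence level
    # Step 3: radius sigma * Phi^{-1}(p_lower), with Phi the standard normal CDF.
    return top_class, sigma * NormalDist().inv_cdf(p_lower)
```

For example, with the toy linear base classifier `lambda v: 1 if v[0] + v[1] > 0 else 0`, certifying the point `[1.0, 1.0]` at `sigma=0.5` should return class 1 with a radius below √2, the point's actual distance to that classifier's decision boundary; a point on the boundary, such as `[0.0, 0.0]`, makes the smoothed classifier abstain.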
Practicality matters because, when it comes to security and robustness, AI models are no different from, say, car models. When developing and deploying AI, we should follow the same kind of process used for cars: collision testing, safety certification, standard warranties, and so on. That is what we have done in our research, developing several testing techniques to identify ‘bugs’ in modern AI and machine learning systems across different neural network models and data modalities.
Many industries are now undergoing digital transformation and adopting AI for their products or services. However, they may not be sufficiently aware of the reputational, revenue, legal, and governance risks they face if their AI-powered products have not undergone comprehensive robustness testing, adversarial effect detection, model hardening and repair, and model certification.
For defense, we have developed several patches that can be added to a trained neural network to strengthen its robustness. These patches can be applied at different phases of the AI life cycle, mitigating attack threats in both the training phase and the testing (deployment) phase. We have also developed advanced, practical approaches that enable deep learning model hardening as a service.
Our previously published techniques include ‘model inspection’ for detecting and assessing model vulnerabilities,[2] ‘model wash’ for mitigating adversarial effects in a model,[3] and ‘model strengthening’ for improving the adversarial robustness of a given deep learning model or system.
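The ‘model wash’ work by Zhao et al. [3] relies on mode connectivity: two trained models can often be joined by a curve in weight space along which the loss stays low, and the paper reports that training such a curve with a small clean dataset can yield models that shed tampering present at the end points. The sketch below illustrates only the path-training idea under our own toy assumptions (logistic-regression "models", hypothetical function names, plain SGD): it trains the control point θ_c of a quadratic Bezier curve θ(t) = (1 − t)²θ₀ + 2t(1 − t)θ_c + t²θ₁ between two fixed end points θ₀ and θ₁. It is not the paper's full procedure.

```python
import numpy as np


def bezier(theta0, theta_c, theta1, t):
    """Point at t in [0, 1] on the quadratic Bezier curve joining theta0 and theta1."""
    return (1 - t) ** 2 * theta0 + 2 * t * (1 - t) * theta_c + t ** 2 * theta1


def logistic_loss_grad(theta, X, y):
    """Mean logistic loss and its gradient for a linear model with labels in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    eps = 1e-12  # guards against log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad


def train_control_point(theta0, theta1, X, y, steps=500, lr=0.5, seed=0):
    """Fit the curve's control point with SGD on the loss at random t in [0, 1]."""
    rng = np.random.default_rng(seed)
    theta_c = 0.5 * (theta0 + theta1)  # start at the straight-line midpoint
    for _ in range(steps):
        t = rng.uniform(0.0, 1.0)
        _, grad = logistic_loss_grad(bezier(theta0, theta_c, theta1, t), X, y)
        # Chain rule: d theta(t) / d theta_c = 2 t (1 - t), a scalar.
        theta_c -= lr * 2 * t * (1 - t) * grad
    return theta_c
```

Only the control point is trained and the end points stay fixed, which mirrors the setting where the end-point models are given and a limited clean dataset is available. With two poor end points, the model at the middle of the trained curve should fit the clean data better than the straight-line midpoint does.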
We have also worked on certification, developing quantifiable metrics that efficiently certify how robust a model is against attacks. We have considered attack types including small, imperceptible perturbations[4] and semantic changes such as color shifting and image rotation.[5] We have developed a model-agnostic robustness benchmarking score called CLEVER,[6] efficient robustness verification tools, and new training methods that make the resulting model more certifiable. We are now expanding our certification tools to cover other types of adversarial attacks.
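CLEVER [6] estimates robustness as the classification margin divided by a local Lipschitz constant of the margin function g(x) = f_c(x) − f_j(x), where f_c is the top-class logit and f_j a competing class's logit; the Lipschitz constant is estimated from gradient norms sampled in a ball of radius R around the input, and the score is capped at R. CLEVER fits a reverse Weibull distribution (extreme value theory) to batch maxima of those norms. The minimal sketch below, with hypothetical function names, simply takes the largest sampled norm instead, so it is a cruder estimate and, like CLEVER itself, not a formal certificate.

```python
import numpy as np


def sample_l2_ball(center, radius, num, rng):
    """Draw `num` points uniformly from the L2 ball of `radius` around `center`."""
    d = center.size
    directions = rng.normal(size=(num, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(num, 1)) ** (1.0 / d)
    return center + directions * radii


def clever_like_score(logits_fn, grad_fn, x0, radius=1.0, num_samples=500, seed=0):
    """Simplified CLEVER-style estimate of untargeted L2 robustness at x0.

    logits_fn(x) returns the class logits; grad_fn(x, k) returns the gradient
    of logit k with respect to x.
    """
    rng = np.random.default_rng(seed)
    logits = logits_fn(x0)
    c = int(np.argmax(logits))
    points = sample_l2_ball(x0, radius, num_samples, rng)
    scores = []
    for j in range(logits.size):
        if j == c:
            continue
        margin = logits[c] - logits[j]
        # Local Lipschitz estimate of g = f_c - f_j: the largest sampled gradient
        # norm (CLEVER instead fits a reverse Weibull to batch maxima).
        lipschitz = max(np.linalg.norm(grad_fn(p, c) - grad_fn(p, j)) for p in points)
        scores.append(min(margin / lipschitz, radius))
    # Untargeted score: the closest competing class determines robustness.
    return min(scores)
```

For a linear model the gradients are constant, so the estimate coincides with the exact L2 distance to the nearest decision boundary. For example, with logit weights `W = [[1, 0], [0, 1], [-1, -1]]` and `x0 = [2.0, 0.5]`, the score is 1.5/√2 ≈ 1.06 when R = 3. For a nonlinear network it remains an estimate.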
Finally, we propose a novel approach for reprogramming machine learning models to learn new tasks. The same approach also works for optimizing molecules to obtain desired chemical properties, which could help boost AI for scientific discovery. The technique is particularly appealing as a cost-effective way to transfer learning to a new domain where data labels are scarce and expensive to acquire.
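Model reprogramming, as generally described in the literature, keeps a pretrained source model frozen and learns only an input transformation that embeds target-task data into the source model's input space, together with an output label mapping from source classes to target classes. The sketch below is a toy, white-box version under our own assumptions, not the authors' exact method: a random two-layer NumPy network stands in for the pretrained model, the trainable "program" is an additive perturbation restricted to padded input coordinates, each target class averages a fixed group of source logits, and training uses exact gradients. All names are hypothetical.

```python
import numpy as np

# Frozen "source" model (hypothetical stand-in for a pretrained network):
# 4 input features -> 16 tanh hidden units -> 4 source-class logits.
rng = np.random.default_rng(0)
A = 0.5 * rng.normal(size=(16, 4))
a = 0.5 * rng.normal(size=16)
B = 0.5 * rng.normal(size=(4, 16))

# Many-to-one output label mapping: each target class averages a group of source logits.
LABEL_MAP = [[0, 1], [2, 3]]
M = np.zeros((2, 4))
for k, sources in enumerate(LABEL_MAP):
    M[k, sources] = 1.0 / len(sources)

MASK = np.array([0.0, 0.0, 1.0, 1.0])  # the program may only edit the padded inputs


def embed(x_target):
    """Place 2-D target inputs into the first 2 of the source model's 4 inputs."""
    return np.hstack([x_target, np.zeros((len(x_target), 2))])


def target_logits(x_target, delta):
    """Run the frozen model on the reprogrammed input and map to target logits."""
    x = embed(x_target) + MASK * delta
    h = np.tanh(x @ A.T + a)
    return (h @ B.T) @ M.T, h


def loss_and_grad(x_target, y, delta):
    """Mean cross-entropy on target labels and its gradient w.r.t. the program."""
    logits, h = target_logits(x_target, delta)
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = len(y)
    loss = -np.mean(np.log(probs[np.arange(n), y]))
    d_logits = probs.copy()
    d_logits[np.arange(n), y] -= 1.0
    d_logits /= n
    d_h = (d_logits @ M) @ B      # back through the label map and output layer
    d_pre = d_h * (1.0 - h ** 2)  # back through tanh
    d_x = d_pre @ A               # back to the embedded input
    return loss, MASK * d_x.sum(axis=0)


def reprogram(x_target, y, steps=300, lr=0.05):
    """Gradient descent on the program only; the source model stays frozen."""
    delta = np.zeros(4)
    for _ in range(steps):
        _, grad = loss_and_grad(x_target, y, delta)
        delta -= lr * grad
    return delta
```

Because the mask blocks updates to the coordinates that carry target data, only the padded "program" coordinates change, and the training loss on the target task should fall below its value with no program. How well a frozen model can be repurposed this way depends on the model; this random stand-in makes no claim about accuracy.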
We should continue to raise awareness of the importance of security in AI, and as researchers, we will keep breaking new ground in making AI more robust, for the benefit of all. In the meantime, businesses around the world are welcome to use the open-source Adversarial Robustness Toolbox (ART) and AI FactSheets, both developed by IBM. The more robust our AI models are, the better prepared they are to face the world and deal with hacks.
1. Mohapatra, J. et al. Higher-Order Certification for Randomized Smoothing. arXiv:2010.06651 [cs, stat] (2020).
2. Wang, R. et al. Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases. arXiv:2007.15802 [cs, stat] (2020).
3. Zhao, P., Chen, P.-Y., Das, P., Ramamurthy, K. N. & Lin, X. Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness. arXiv:2005.00060 [cs, stat] (2020).
4. Boopathy, A., Weng, T.-W., Chen, P.-Y., Liu, S. & Daniel, L. CNN-Cert: An Efficient Framework for Certifying Robustness of Convolutional Neural Networks. arXiv:1811.12395 [cs, stat] (2018).
5. Mohapatra, J. et al. Towards Verifying Robustness of Neural Networks Against Semantic Perturbations. arXiv:1912.09533 [cs, stat] (2020).
6. Weng, T.-W. et al. Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach. arXiv:1801.10578 [cs, stat] (2018).