Adversarial Robustness and Privacy
Even advanced AI systems can be vulnerable to adversarial attacks. We're building tools to protect AI and certify its robustness, from quantifying the vulnerability of neural networks to designing new attacks that inform better defenses. We're also helping AI systems adhere to privacy requirements.
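As a concrete illustration of what probing a model's vulnerability looks like in practice, below is a minimal sketch using the open-source Adversarial Robustness Toolbox (ART), which IBM originated. The tiny untrained model, random inputs, and eps budget of 0.1 are illustrative placeholders, not artifacts from any of the works listed on this page.

```python
# Minimal sketch: crafting adversarial examples with the Adversarial
# Robustness Toolbox (ART) to measure how fragile a classifier is.
# The model and data below are hypothetical stand-ins for illustration.
import numpy as np
import torch
import torch.nn as nn

from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import PyTorchClassifier

# A toy stand-in classifier (not a real production model).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Wrap the model so ART's attacks and defenses can drive it.
classifier = PyTorchClassifier(
    model=model,
    loss=criterion,
    optimizer=optimizer,
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

# Placeholder inputs; in practice these would be real test images.
x_test = np.random.rand(16, 1, 28, 28).astype(np.float32)

# Fast Gradient Sign Method: perturb each pixel by at most eps
# in the direction that increases the model's loss.
attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_adv = attack.generate(x=x_test)

# Comparing predictions on clean vs. perturbed inputs quantifies
# the model's vulnerability under this attack budget.
clean_preds = classifier.predict(x_test).argmax(axis=1)
adv_preds = classifier.predict(x_adv).argmax(axis=1)
print(f"Prediction flips under FGSM (eps=0.1): {(clean_preds != adv_preds).mean():.0%}")
```

The fraction of flipped predictions at a given eps is one simple robustness measurement; sweeping eps and retraining on the generated adversarial examples is the standard route from new attacks to better defenses.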
Our work
- What is red teaming for generative AI? (Explainer by Kim Martineau)
- An open-source toolkit for debugging AI models of all data types (Technical note by Kevin Eykholt and Taesung Lee)
- Did an AI write that? If so, which one? Introducing the new field of AI forensics (Explainer by Kim Martineau)
- Manipulating stock prices with an adversarial tweet (Research by Kim Martineau)
- Securing AI systems with adversarial robustness (Deep Dive by Pin-Yu Chen)
- Researchers develop defenses against deep learning hack attacks (Release by Ambrish Rawat, Killian Levacher, and Mathieu Sinn)

See more of our work on Adversarial Robustness and Privacy
Publications
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI
- Ambrish Rawat, Stefan Schoepf, et al. (NeurIPS 2024)

Privacy without Noisy Gradients: Slicing Mechanism for Generative Model Training
- Kristjan Greenewald, Yuancheng Yu, et al. (NeurIPS 2024)

Membership Inference Attacks Against Time-Series Models
- Noam Koren, Abigail Goldsteen, et al. (ACML 2024)

MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks
- (AIES 2024)

On Robustness-Accuracy Characterization of Language Models using Synthetic Datasets
- Ching-yun Ko, Pin-Yu Chen, et al. (COLM 2024)

Be Your Own Neighborhood: Detecting Adversarial Examples by the Neighborhood Relations Built on Self-Supervised Learning
- Zhiyuan He, Yijun Yang, et al. (ICML 2024)