
DARPA and IBM are ensuring that anyone can protect their AI systems from hackers

As part of DARPA’s GARD project, IBM researchers have been building red and blue team tools that detect sophisticated attacks on AI models and help protect against them. And now, IBM's toolkit supports Hugging Face models.

There’s a story that’s becoming commonplace in the warnings about our sci-fi future. Imagine you’re in a self-driving car, pulling up to a stop sign, but the car accelerates through the sign and crashes. In this scenario, the car’s AI software misinterpreted a stop sign because hackers covered up small sections of the sign.

While this may seem like a far-off thing to be concerned about, there are real attacks that can affect AI models right now, from inserting malicious code into training data to altering the objects that a system is designed to recognize. And as AI use has proliferated, work to defend against these sorts of attacks has ramped up. This field, called adversarial robustness, has been a key area of focus for researchers at IBM, who developed what they call the Adversarial Robustness Toolbox (ART) in 2018 to help defend against such attacks.

Over the last four years, IBM has been working with DARPA, the US Department of Defense’s (DoD) research and development arm, along with a group of other companies and university partners, to tackle adversarial AI issues. Led by the Principal Investigator (PI) Nathalie Baracaldo, and the co-PI Mark Purcell, IBM’s team has been part of a project called Guaranteeing AI Robustness Against Deception (GARD), which aimed to build defenses capable of handling emerging threats, develop fundamental theory to enable provably robust systems, and develop tools that can reliably evaluate algorithms’ defenses. As part of the project, researchers have updated ART for use cases that the US military might encounter — as well as anyone else developing AI systems that operate in the world.

ART can be applied to real-world use cases today. As part of DARPA’s mission to prevent national strategic surprises, these use cases are broad and go beyond military applications to include critical infrastructure and systems used throughout the government where there may be vulnerable AI algorithms. Security, of course, is an issue for everyone — not just the DoD. This was partly the inspiration for ensuring ART is openly available.

Back in 2020, IBM donated ART to the Linux Foundation to encourage more AI practitioners to work together on tools that could help secure real-world AI deployments. ART has its own GitHub repository and supports several major machine-learning frameworks, such as TensorFlow and PyTorch. And now, IBM has put the updated toolbox on Hugging Face for the same reason: to meet AI practitioners where they are.

Hugging Face has quickly become one of the most popular places on the internet for discovering and implementing AI models. Several IBM projects, including the recent geospatial model designed with NASA, have been open sourced on Hugging Face. The version of ART on Hugging Face is specifically designed to be used with models hosted there. It includes examples of attacks and defenses for evasion and poisoning threats, and shows how to integrate the toolbox with timm, a library used to write models for Hugging Face.

Bringing together the community with standardized benchmarks

Before GARD, the adversarial AI community was fragmented and nascent. Researchers were primarily concerned with digital attacks, such as adding minor perturbations to images — but these weren’t the most practical problems to tackle. In the real world, the issues are primarily around physical attacks, such as putting a sticker on a stop sign to confuse an autonomous vehicle’s AI model, and attacks where someone would poison training data.
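To make the idea of a "minor perturbation" concrete, here is a minimal sketch of a digital evasion attack in the style of the fast gradient sign method. It is a toy illustration and not ART's implementation: the four-pixel "image", the hand-picked weights, and the helper names (`predict`, `fgsm_perturb`) are all assumptions made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of class 1 under a tiny logistic model (hypothetical)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_perturb(weights, bias, x, label, eps):
    """Take one signed-gradient step that increases the loss for `label`.

    For logistic loss, d(loss)/dx_i = (p - label) * w_i.
    Each pixel moves by at most eps and is clipped to [0, 1].
    """
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

# Assumed toy model and input, chosen only for illustration.
weights = [2.0, -1.5, 1.0, -2.5]
bias = 0.0
x = [0.7, 0.3, 0.6, 0.4]   # clean input, true label 1
label = 1

clean_prob = predict(weights, bias, x)
x_adv = fgsm_perturb(weights, bias, x, label, eps=0.1)
adv_prob = predict(weights, bias, x_adv)

print(f"clean P(class 1) = {clean_prob:.3f}")
print(f"adversarial P(class 1) = {adv_prob:.3f}")
```

In this sketch no pixel changes by more than 0.1, yet the model's prediction crosses the 0.5 decision threshold. Physical attacks such as the stop-sign sticker are harder to model because the perturbation has to survive changes in lighting, distance, and viewing angle.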

In this fragmented community, researchers would evaluate the defenses they built on their own benchmarks. ART was the first unified toolbox for a range of practical attacks, and now has thousands of stars on GitHub. This highlights how the community is coming together to work on common goals for securing their AI workflows. Machine-learning models, although advanced, are still very brittle: they are susceptible both to dedicated attacks and to the natural noise that occurs in the world.

Prior to ART, the research and AI security community didn't have a single place to share code for attacks and defenses to advance the field. ART provides a platform to make this happen, and allows teams to focus on more specific tasks. As part of GARD, the team has developed tools for both red and blue teams to analyze and assess the effectiveness of specific machine learning models against a diverse set of attacks, including poisoning and evasion attacks. The practical defenses against those attacks are what’s encompassed in ART. After four years, the project is ending this spring, but that doesn’t mean the work stops.
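The red-and-blue-team loop for poisoning can be sketched in a few lines. The example below is a toy under stated assumptions, not ART's API: it uses a one-dimensional nearest-centroid classifier, an attacker who injects two mislabeled training points, and a simple median-based filter as the defense. All data values and function names are hypothetical.

```python
from statistics import median

def group_by_label(points):
    groups = {}
    for value, label in points:
        groups.setdefault(label, []).append(value)
    return groups

def fit_centroids(points):
    """Train: one centroid (mean value) per label."""
    return {y: sum(v) / len(v) for y, v in group_by_label(points).items()}

def classify(centroids, value):
    return min(centroids, key=lambda y: abs(value - centroids[y]))

def accuracy(centroids, test_points):
    hits = sum(classify(centroids, v) == y for v, y in test_points)
    return hits / len(test_points)

def filter_outliers(points, max_dist=1.0):
    """Blue team: drop training points far from their label's median."""
    medians = {y: median(v) for y, v in group_by_label(points).items()}
    return [(v, y) for v, y in points if abs(v - medians[y]) <= max_dist]

# Assumed toy data: class 0 clusters near 0, class 1 near 1.
train = [(0.0, 0), (0.1, 0), (0.2, 0), (0.3, 0),
         (0.7, 1), (0.8, 1), (0.9, 1), (1.0, 1)]
test = [(0.05, 0), (0.25, 0), (0.35, 0), (0.6, 1), (0.75, 1), (0.95, 1)]

# Red team: inject two far-away points carrying the wrong label.
poisoned_train = train + [(2.5, 0), (2.5, 0)]

clean_acc = accuracy(fit_centroids(train), test)
poisoned_acc = accuracy(fit_centroids(poisoned_train), test)
defended_acc = accuracy(fit_centroids(filter_outliers(poisoned_train)), test)

print(clean_acc, poisoned_acc, defended_acc)
```

The same pattern (measure clean accuracy, attack, measure again, apply a defense, measure once more) is what a shared toolbox lets different teams run against the same models, so that results are comparable.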

As an open-source project, ART is available for the entire community. It includes tools for a variety of modalities and tasks, such as image classification, object detection, object tracking, and speech recognition. Models from Hugging Face, TensorFlow and Keras are also supported. And now that ART is out in the open, it’s available for anyone to find new ways to protect the models they build and run.

Learn more about poisoning attacks and defenses as well as evasion counterparts for Hugging Face models. If you’re interested in diving into coding examples, you can check out our Jupyter notebooks on red-blue teaming for poisoning here, and try out ART with the GitHub repo here. For all of the GARD tools available for use, you can visit GARD’s website.