The latest AI safety method is a throwback to our maritime past

IBM's Kush Varshney explains the origins of the phrase 'AI governance' and how IBM is adapting its Trust 360 toolkits for the age of generative AI.

Big data was all the rage when Kush Varshney left graduate school for IBM Research in 2010. Powerful algorithms turned loose on mountains of data were turning up surprising insights into everything from climate change to consumer behavior.

For enterprises, the data deluge brought innovation — and profits. Varshney used data science techniques to help U.S. insurers adapt to President Obama’s expansion of government-sponsored health care under the Affordable Care Act. Internally, he helped IBM identify and retain high-performing employees.

But reducing the minutiae of daily life to numbers also brought new risks, with life-changing consequences for many people. From policing to lending, complaints about bias, lack of transparency, and the misuse of personal data grew. Varshney and his team at IBM shifted gears. Led by his then-boss, Saška Mojsilović, they set out to build the tech industry’s first tools for software developers to mitigate algorithmic bias and harm. Their work was so new it didn’t have a name, so they branded it Trustworthy AI.

Today, IBM’s open-source tools are as relevant as they were when they were first released in 2018. Generative AI may have eclipsed big data as a buzzword, but many of the core issues, such as fairness, trust, and transparency, remain. Varshney and his team are now designing complementary tools to address the unique threats posed by large language models (LLMs) and other content-generating foundation models.

In less than a year, Varshney and his colleagues at IBM Research have devised new methods to prevent chatbots from leaking personal or proprietary data, ‘hallucinating’ wrong information, or insulting their human interlocutors. Embedded in watsonx, IBM’s platform for enterprise AI, these expanded capabilities fall under IBM’s governance rubric.

We caught up with Varshney to talk about governance and how IBM is addressing generative AI’s potential risks and harms.

How do the dangers posed by LLM-powered chatbots differ from traditional machine-learning models?

Some risks are the same: fairness, transparency, robustness to attacks. But some are new. You’re interacting with a system that may behave in toxic, harmful, or bullying ways. It may hallucinate answers that sound plausible but are factually inaccurate. We’ve identified 39 bad behaviors and have indicated whether they carry over from traditional machine learning or have been introduced or amplified by generative AI.

Bad behavior can be mitigated with data curation, fine-tuning, prompt-tuning, and prompt-engineering. But creating a taxonomy of common harms is only a starting point. Governments, enterprises, and other organizations will have to identify additional constraints to build into their LLMs so that they follow applicable laws, industry standards, social norms, and so on. It’s important to enable customers to define their own desired and undesired behaviors so they can be incorporated and tested.

So, there’s no one-size-fits-all approach. What else makes safety guarantees with generative AI so hard?

Interventions must be scalable because we’re dealing with such large models and datasets. We had to devise an approximate version of our bias mitigation algorithm Fair IJ, for example, because the full version was too computationally intensive. Understanding the models also requires a different approach. We no longer need to know exactly how the model made its predictions, which is what we used to call explainability. Instead, we must be able to trace an LLM’s generative output to a user’s prompt or its training data. Source attribution is the new explainability. We now also have unique harms and risks that people may not know how to articulate.
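To make the idea of source attribution a little more concrete, here is a deliberately simple Python sketch that ranks candidate source passages by how many word trigrams they share with a generated answer. Trigram overlap is a stand-in chosen for illustration, not the attribution method IBM uses, and the `attribute` function and the tiny example corpus are hypothetical.

```python
import re


def ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in the text, ignoring punctuation."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def attribute(output: str, sources: dict[str, str], n: int = 3) -> list[tuple[str, float]]:
    """Rank sources by the fraction of the output's n-grams that each one contains.

    Hypothetical helper: a crude overlap score, not a real attribution method.
    """
    out_grams = ngrams(output, n)
    if not out_grams:
        return []
    scores = []
    for name, text in sources.items():
        overlap = len(out_grams & ngrams(text, n)) / len(out_grams)
        scores.append((name, overlap))
    # Highest overlap first: the most likely source of the generated text.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical "training data" and a generated answer to attribute.
sources = {
    "policy_manual": "refunds are issued within 30 days of purchase with a receipt",
    "faq_page": "store hours are 8 am to 10 pm every day",
}
answer = "Refunds are issued within 30 days of purchase."
print(attribute(answer, sources))
```

Real systems have to work at the scale Varshney describes, so exhaustive pairwise comparison like this would give way to indexing or model-internal methods; the sketch only shows what "tracing output back to data" means.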

How, then, do you know what controls to build into models?

You can ask, but people may not be able to define exactly what they want. We can guide them and give them a structure to express their opinions. This is what we call usage governance. Once a customer has defined the context in which a chatbot will be deployed, that is, its use cases, we can help them reason through the relevant risks. We also take existing policy documents, such as laws, corporate policies, or other rules, and use them to instruct the model how to behave.

What do you mean by governance?

The word itself means control. In a steam engine, the governor is the part that regulates the flow of steam. It’s the controlling device that keeps the system safe. When we say governance, it’s a set of practices to keep an AI system under control so that it remains safe. It includes things like respecting regulations, curating data, and creating fact sheets that explain how the system works. Interestingly, the words “governance” and “cybernetics,” an older term for AI, both come from “kubernetes,” the ancient Greek word for ship pilot.

At what point in the AI pipeline is governance added?

Transparency and governance happen throughout. Governance comes in when IBM curates the data to pre-train our watsonx models, throwing out copyrighted, toxic, or other problematic content. It also comes in during the initial alignment, when we teach our models how to follow instructions. My team is developing tools to enable customers to add controls after alignment. Our governance loops are further iterations of instruction tuning. We give the models examples of hateful speech, social biases, and so on, to teach them what not to do. A grocery store chain, for example, may not want its chatbot to mention poisonous foods. A bank may want to avoid controversial topics.
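As a rough illustration of how a customer's restriction, such as the grocery chain's, might be turned into instruction-tuning data, here is a minimal Python sketch. The `UsageRule` type, the `build_tuning_examples` function, the sample prompts, and the refusal wording are all hypothetical and are not part of watsonx; a real governance loop would use far more, and more varied, examples.

```python
from dataclasses import dataclass


@dataclass
class UsageRule:
    """A hypothetical customer-defined restriction on what the chatbot may discuss."""
    topic: str                # topic the deployer wants the model to avoid
    sample_prompts: list[str]  # user requests that touch on that topic


def build_tuning_examples(rules: list[UsageRule], refusal_template: str) -> list[dict[str, str]]:
    """Turn each rule into (prompt, target) pairs that teach the model to decline."""
    examples = []
    for rule in rules:
        for prompt in rule.sample_prompts:
            examples.append({
                "prompt": prompt,
                "target": refusal_template.format(topic=rule.topic),
            })
    return examples


# The grocery-chain example from the interview, expressed as a hypothetical rule.
grocery_rule = UsageRule(
    topic="poisonous foods",
    sample_prompts=[
        "Which wild mushrooms are deadly?",
        "What household foods are toxic if eaten raw?",
    ],
)
examples = build_tuning_examples(
    [grocery_rule],
    "I'm sorry, but I can't help with questions about {topic}.",
)
for ex in examples:
    print(ex["prompt"], "->", ex["target"])
```

The resulting pairs would then feed another round of instruction tuning; keeping rules as data rather than hard-coded prompts is one way to let each customer supply their own.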

Governance is the final stage of fine-tuning?

Yes, but that's not the end, because problems can still slip through. There are specific mitigators for some behaviors, like the Fair IJ algorithm that I mentioned before. For a given social bias, we can try removing it after alignment. We also have ways of detecting bad behavior when the model is deployed. If the output is problematic, we can stop it.

So, interventions also happen during inferencing?

Yes, if you have a response that's about to come out, and it contains hate speech, or some other issue, we have a way to flag it and stop it. In effect, we help the model ‘think’ before it speaks.
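The flag-and-stop step can be pictured as a gate between the model and the user. The minimal Python sketch below assumes a hypothetical `detect_risks` function; here it is a toy stand-in (a regular expression for e-mail addresses as a crude PII check, plus a placeholder blocklist), whereas the detectors Varshney describes are trained classifiers for hate speech, PII, and similar risks. The `generate` callable stands in for any LLM call.

```python
import re
from typing import Callable

# Toy stand-ins for trained risk detectors (hypothetical; not IBM's detectors).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BLOCKLIST = {"slur_example"}  # placeholder tokens standing in for hateful terms


def detect_risks(text: str) -> list[str]:
    """Return the names of the risk categories flagged in the text."""
    flags = []
    if EMAIL_PATTERN.search(text):
        flags.append("pii")
    if any(word in BLOCKLIST for word in text.lower().split()):
        flags.append("hate_speech")
    return flags


def guarded_generate(generate: Callable[[str], str], prompt: str) -> str:
    """Produce a draft response, but withhold it if any detector fires."""
    draft = generate(prompt)
    flags = detect_risks(draft)
    if flags:
        return f"[response withheld: {', '.join(flags)}]"
    return draft


# Usage with a fake model that would leak an e-mail address.
def fake_model(prompt: str) -> str:
    return "You can reach Jane at jane.doe@example.com."


print(guarded_generate(fake_model, "How do I contact Jane?"))
```

Because the check runs on the draft before anything reaches the user, this is the sense in which the model "thinks" before it speaks; a production system might regenerate or redact instead of simply withholding.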

What governance capabilities has IBM Research added to watsonx?

On December 1, tools to monitor different risk dimensions will be generally available. These include hate speech, personally identifiable information, implicit hate, and so forth. Early next year, we'll add capabilities for source attribution and for letting users define their own values and principles, which is the usage governance that I mentioned earlier. We will also have updated AI Fact Sheets that provide transparency throughout the foundation-model lifecycle.

Why is AI trust and safety so critical for enterprises?

We're creating AI models for a lot of serious business use cases like HR, software development, and banking. Safety is primary. Whenever we talk to organizations, they bring forward all the potential problems. That helps us think about the entire life cycle and the different sorts of interventions that are needed.

Some have called for a halt on AI development. Do you think we have the power to control it?

Yes, I'm optimistic because of IBM’s culture. Trust and transparency have been our guiding light for a long time. We are encouraged to do the work we do. It's important to implement AI safely and responsibly and to avoid shortcuts. When AI is used in consequential applications, we must have the right governance in place. We can bring the Wild West under control.

When you look at the history of commercial aviation, we spent the first 50 to 60 years just getting the planes to fly. But since the Boeing 707 was introduced, planes fly in essentially the same way. The focus shifted to safety: the fatality rate today is hundreds of times lower than it was in the 1970s. Now that AI works, it’s time to shift to trustworthiness and safety.
