Trustworthy AI
Our trust in technology relies on understanding how it works. It’s important to understand why AI makes the decisions it does. We’re developing tools to make AI more explainable, fair, robust, private, and transparent.
Overview
Artificial intelligence systems have become increasingly prevalent in everyday life and enterprise settings, and they’re now often being used to support human decision-making. These systems have grown increasingly complex and efficient, and AI holds the promise of uncovering valuable insights across a wide range of applications. But broad adoption of AI systems will require humans to trust their output.
When we understand how a technology works and can assess that it’s safe and reliable, we’re far more inclined to trust it. Many AI systems to date have been black boxes, where data is fed in and results come out. To trust a decision made by an algorithm, we need to know that it is fair, that it’s reliable and can be accounted for, and that it will cause no harm. We need assurances that AI cannot be tampered with and that the system itself is secure. We need to be able to look inside AI systems, to understand the rationale behind an algorithmic outcome, and even to ask a system how it came to its decision.
At IBM Research, we’re working on a range of approaches to ensure that AI systems built in the future are fair, robust, explainable, accountable, and aligned with the values of the society they’re designed for. We’re working to ensure that future AI applications are as fair as they are efficient across their entire lifecycle.
Our work
Why we’re teaching LLMs to forget things
- Explainer
- Kim Martineau
A toxic language filter built for speed
- News
- Kim Martineau
Teaching AI models to improve themselves
- Research
- Peter Hess
IBM and RPI researchers demystify in-context learning in large language models
- News
- Peter Hess
IBM reaffirms its commitment to the Rome Call for AI ethics
- News
- Mike Murphy
Tiny benchmarks for large language models
- News
- Kim Martineau
See more of our work on Trustworthy AI
Topics
- AI Testing: We’re designing tools to help ensure that AI systems are trustworthy, reliable and can optimize business processes.
- Adversarial Robustness and Privacy: We’re making tools to protect AI and certify its robustness, and helping AI systems adhere to privacy requirements.
- Explainable AI: We’re creating tools to help AI systems explain why they made the decisions they did.
- Fairness, Accountability, Transparency: We’re developing technologies to increase the end-to-end transparency and fairness of AI systems.
- Trustworthy Generation: We’re developing theoretical and algorithmic frameworks for generative AI to accelerate future scientific discoveries.
- Uncertainty Quantification: We’re developing ways for AI to communicate when it’s unsure of a decision across the AI application development lifecycle.
Publications
Adaptive PII Mitigation Framework for Large Language Models
- 2025
- AAAI 2025
Sequential Uncertainty Quantification with Contextual Tensors for Social Targeting
- Ide-San Ide
- Keerthiram Murugesan
- et al.
- 2024
- KAIS
Optimal Transport for Efficient, Unsupervised Anomaly Detection on Industrial Data
- Abigail Langbridge
- Fearghal O'Donncha
- et al.
- 2024
- Big Data 2024
Future Workload and Cloud Resource Usage: Insights from an Interpretable Forecasting Model
- 2024
- Big Data 2024
Better Bias Benchmarking of Language Models via Multi-factor Analysis
- Hannah Powers
- Ioana Baldini Soares
- et al.
- 2024
- NeurIPS 2024
Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods
- Dennis Wei
- Inkit Padhi
- et al.
- 2024
- NeurIPS 2024
Building trustworthy AI with Watson
Our research is regularly integrated into Watson solutions to make IBM’s AI for business more transparent, explainable, robust, private, and fair.