New research helps make AI fairer in decision-making

Our team developed the first practical procedures and tools for achieving Individual Fairness in machine learning (ML) and artificial intelligence (AI) systems.
“My best friend is gay.”

The phrase seems innocent enough — and can be neutral or positive. But an artificial intelligence model trained on data of real human interactions — say, from online conversations — can easily interpret this sentence as toxic.

To tackle bias in AI, our IBM Research team, in collaboration with the University of Michigan, has developed practical procedures and tools to help machine learning and AI achieve Individual Fairness. The key idea of Individual Fairness is to treat similar individuals similarly — achieving fairness for everyone.

In this toy example, points on the same horizontal line are considered similar. On the left, a vanilla neural network's decisions vary along horizontal lines, and so are unfair. On the right, a neural network trained with our method, SenSeI, achieves individually fair predictions.

Our results, described in a series of papers presented at the ICLR 2021 conference, are the first-ever practical algorithms for training individually fair AI [1, 2, 3] and procedures for auditing AI for violations of individual fairness [4].

Keeping bias at bay

Humans are often biased, consciously or not. When it comes to high-stakes decision making, such bias can have life-altering consequences.

To automate or assist with decision making, AI and machine learning are routinely used in fields like criminal justice, education, finance, the job market and healthcare. It might seem logical that using algorithms to make decisions should alleviate human biases. In real life, though, AI and machine learning systems often perpetuate — or even exacerbate — the human biases baked into the data the systems are trained on.

For example, an AI should treat the CVs of two applicants that differ only in name and gender exactly the same way — but that's often not the case. Or take a judge who opts to use an algorithm to assess a defendant’s chance of re-offending. If the AI wasn't trained on provably fair data, the judge has a problem.

To deal with the issue, we’ve turned to the mathematical theory of distributionally robust optimization. The theory requires that an algorithm perform well not only on the training data, but also on all similar datasets.

Based on the theory, we developed an approach for training machine learning and AI systems that remain fair despite biases in the training data — because such biases never fully disappear.

Individual fairness in ranking.

To do so, we proposed the Distributional Individual Fairness (DIF) model, which measures differences in decisions between the original data and all possible similar datasets. For example, consider a dataset of CVs and similar datasets of the same CVs but with altered gender pronouns and names. To achieve fairness, we want DIF to be small.
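The idea can be illustrated with a toy sketch — this is our simplified stand-in, not the paper's implementation: perturb each CV by swapping a few gendered words (the word list and scoring model here are hypothetical) and measure the worst prediction gap between a CV and its perturbed twin.

```python
# Minimal sketch of a DIF-style measurement: the largest change in a
# model's score when a CV is rewritten with swapped gender terms.

def swap_gender_terms(text):
    """Toy perturbation: swap a few gendered words (hypothetical list)."""
    pairs = {"he": "she", "she": "he", "his": "her", "her": "his"}
    return " ".join(pairs.get(w, w) for w in text.lower().split())

def dif_score(model, cvs):
    """Worst-case prediction gap between each CV and its perturbed twin."""
    return max(abs(model(cv) - model(swap_gender_terms(cv))) for cv in cvs)

# A deliberately biased toy "model" that rewards the word "he".
biased = lambda cv: 1.0 if "he" in cv.lower().split() else 0.0

cvs = ["He led the team.", "She led the team."]
print(dif_score(biased, cvs))  # → 1.0; a fair model would give 0.0
```

A small `dif_score` on such perturbations is exactly the property the training procedure below tries to enforce.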

The goal of our approach is to train a system that is simultaneously accurate and has a small DIF value. Our training procedure has two steps. The first provides an accurate classification; the second audits those classification decisions — essentially measuring DIF and penalizing large values — and makes the first step harder whenever it detects unfair behavior.
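As a hedged sketch of this two-step idea (a simplified stand-in, not the actual SenSeI code), consider logistic regression where an "auditor" flips a protected feature to build similar inputs and the training gradient combines an accuracy term with a penalty on the resulting prediction gap:

```python
# Two-step training sketch: step 1 computes the usual accuracy gradient;
# step 2 (the "auditor") penalizes prediction gaps between each input
# and a similar input with the protected attribute flipped.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
X[:, 0] = rng.integers(0, 2, n)            # column 0: protected attribute
y = (X[:, 1] + X[:, 2] > 0).astype(float)  # true label ignores it

sigmoid = lambda z: 1 / (1 + np.exp(-z))
w = np.zeros(d)

for _ in range(500):
    p = sigmoid(X @ w)
    grad_acc = X.T @ (p - y) / n                     # step 1: accuracy
    X_sim = X.copy()
    X_sim[:, 0] = 1 - X_sim[:, 0]                    # auditor's similar inputs
    gap = p - sigmoid(X_sim @ w)                     # DIF-style gap
    grad_fair = (X - X_sim).T @ (gap * p * (1 - p)) / n  # step 2: penalty
    w -= 0.5 * (grad_acc + 10.0 * grad_fair)

print(abs(w[0]))  # weight on the protected attribute is driven toward 0
```

With the fairness penalty active, the learned weight on the protected attribute shrinks toward zero, so similar individuals receive similar predictions.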

Dealing with ‘toxic’ comments

Our work provides mathematical guarantees that this approach trains individually fair machine learning and AI systems, and we verify the results in practice. When training their algorithms, users can measure the final DIF value. A small DIF value on the training data implies fairness that will continue to hold when the system is in use.

In the paper, we describe how our methods could be applied to “toxic comment” detection. Natural language processing (NLP) experts have noticed that models learn to associate certain identities — such as, say, “gay” — with toxicity, because those identities are frequent targets of abuse in online conversations. Such models may then label a neutral or positive sentence like “my best friend is gay” as toxic.

Individual fairness, though, requires an NLP system to produce the same output on comparable sentences — say, “my best friend is gay” and “my best friend is straight.” We show that a language model trained with our algorithm can accurately identify toxic comments while significantly improving individual fairness.
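A simple audit along these lines — our illustrative assumption of how such a check could look, not the paper's procedure — swaps an identity term in each sentence and flags any sentence whose toxicity score moves noticeably:

```python
# Toy invariance audit: compare a classifier's toxicity scores on
# sentence pairs that differ only in an identity term.

def identity_swap(sentence, a="gay", b="straight"):
    """Swap one identity term for a comparable one (hypothetical pair)."""
    return sentence.replace(a, b) if a in sentence else sentence.replace(b, a)

def audit(model, sentences, tol=0.1):
    """Flag sentences whose score moves more than `tol` under the swap."""
    return [s for s in sentences
            if abs(model(s) - model(identity_swap(s))) > tol]

# A deliberately biased toy model that spikes on the word "gay".
toy = lambda s: 0.9 if "gay" in s else 0.1

print(audit(toy, ["my best friend is gay", "the weather is nice"]))
# → ['my best friend is gay']
```

An individually fair model would pass this audit with an empty flagged list, since comparable sentences would receive (nearly) identical scores.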

The results could help facilitate inclusive discussions online, enable fair ranking and fair classification for loan decision making, and support fairness audits of models used in criminal justice.

As a next step, we are developing fairness algorithms that can work with private demographic data, to enable algorithmic fairness in applications subject to legal privacy regulations, such as banking. Another direction we are exploring is post-processing methods suitable for the large pre-trained models often used in computer vision and NLP.

We hope that our research could help make AI fairer — and benefit society in an unbiased and inclusive manner.

Review IBM’s papers and presentations at ICLR 2021.

  1. M. Yurochkin and Y. Sun. SenSeI: Sensitive Set Invariance for Enforcing Individual Fairness. ICLR, 2021.

  2. A. Vargo, F. Zhang, M. Yurochkin, and Y. Sun. Individually Fair Gradient Boosting. ICLR, 2021 (Spotlight).

  3. A. Bower, H. Eftekhari, M. Yurochkin, and Y. Sun. Individually Fair Ranking. ICLR, 2021.

  4. S. Maity, S. Xue, M. Yurochkin, and Y. Sun. Statistical Inference for Individual Fairness. ICLR, 2021.