Publication: NeurIPS 2023 (workshop paper)

FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs

Abstract

Training large language models (LLMs) is a costly endeavour in terms of time and computational resources. The large amount of training data used during the unsupervised pre-training phase makes it difficult to verify all data and, unfortunately, undesirable data may be ingested during training. Re-training from scratch is impractical and has led to the creation of the unlearning discipline, where models are modified to "unlearn" undesirable information without retraining. However, any such modification can alter the behaviour of LLMs, especially on key dimensions such as fairness. This is the first work that examines the interplay between unlearning and fairness for LLMs. In particular, we focus on a popular unlearning framework known as SISA [Bourtoule et al., 2021], which creates an ensemble of models trained on disjoint shards of the training data. We evaluate the performance-fairness trade-off for SISA and empirically demonstrate that SISA can indeed reduce fairness in LLMs. To remedy this, we propose post-processing bias mitigation techniques for the ensemble models produced by SISA. Through experimental results, we demonstrate the efficacy of our post-processing framework, called FairSISA.
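For context, the SISA framework referenced above trains constituent models on disjoint data shards and aggregates their predictions, so that unlearning a training point only requires retraining the shard containing that point. Below is a minimal sketch of this idea under assumed simplifications: the function names, the logistic-regression constituent models, and the majority-vote aggregation are illustrative choices, and SISA's slicing/checkpointing within each shard is omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def train_sisa(X, y, num_shards=4, seed=0):
    """Train one constituent model per disjoint shard (SISA-style sketch)."""
    rng = np.random.default_rng(seed)
    # Randomly assign each training point to exactly one shard.
    shard_ids = rng.integers(0, num_shards, size=len(X))
    models = []
    for s in range(num_shards):
        mask = shard_ids == s
        models.append(LogisticRegression(max_iter=1000).fit(X[mask], y[mask]))
    return models, shard_ids


def predict_ensemble(models, X):
    """Aggregate constituent predictions by majority vote (binary labels assumed)."""
    votes = np.stack([m.predict(X) for m in models])  # shape: (num_shards, n)
    return (votes.mean(axis=0) >= 0.5).astype(int)


def unlearn(models, shard_ids, X, y, forget_idx):
    """Unlearn one point by retraining only the shard that contains it."""
    s = shard_ids[forget_idx]
    keep = shard_ids == s
    keep[forget_idx] = False  # drop the point to be forgotten
    models[s] = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    return models
```

In a setup like this, post-processing bias mitigation of the kind proposed for FairSISA would operate on the aggregated ensemble outputs rather than on any individual constituent model.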