An End-to-end Framework for Privacy Risk Assessment of AI Models
- Abigail Goldsteen
- Shlomit Shachor
- et al.
- 2022
- SYSTOR 2022
Privacy has always been a concern when developing trustworthy AI solutions, even with conventional machine learning and deep learning models. Today, with the prevalence of large language models, which serve as foundation models, this concern becomes even more acute. Language models have an inherent tendency to memorize, and even reproduce in their outputs, text sequences learned during training, whether that training takes the form of pre-training, fine-tuning, or prompt-tuning. If the training data contains sensitive or personal information, such reproduction can result in a major privacy breach.
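To make the memorization risk concrete, the sketch below probes a causal language model with the prefix of a candidate training sequence and checks whether greedy decoding reproduces the rest verbatim. The model name, the prefix split, and the candidate string are illustrative assumptions, not part of the framework described here; the sketch uses the Hugging Face transformers API.

```python
# Minimal sketch of a verbatim-memorization probe for a causal LM.
# "gpt2" and the candidate string below are placeholder assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in for the fine-tuned model under test
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def reproduces_verbatim(candidate: str, prefix_frac: float = 0.5) -> bool:
    """Prompt with a prefix of `candidate`; return True if greedy
    decoding emits the remaining tokens exactly."""
    ids = tokenizer(candidate, return_tensors="pt").input_ids[0]
    split = int(len(ids) * prefix_frac)
    prefix, target = ids[:split], ids[split:]
    out = model.generate(
        prefix.unsqueeze(0),
        max_new_tokens=len(target),
        do_sample=False,  # greedy: the model's most likely continuation
        pad_token_id=tokenizer.eos_token_id,
    )
    # Compare only the newly generated tokens against the held-out suffix.
    return out[0][split:].tolist() == target.tolist()

# Hypothetical record that may have appeared in the training data.
print(reproduces_verbatim("John Doe's credit card number is 4111 1111 1111 1111"))
```

Greedy decoding is used deliberately: verbatim reproduction under the model's single most likely continuation is the strongest memorization signal, whereas sampled decoding would understate it.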
IBM is currently researching and developing methods to assess the privacy risk of large foundation models, adapted to cover these new and evolving attack vectors and able to scale to these huge model sizes. Moreover, we are investigating potential mitigation strategies that can make large language models more resistant to these kinds of attacks.
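One common building block for such an assessment is a membership inference attack, which measures how reliably an adversary can tell whether a given record was part of the training set. The hedged sketch below runs such an attack using IBM's open-source Adversarial Robustness Toolbox (ART) against a small scikit-learn model; the choice of library, target model, and dataset are assumptions for illustration, not the specific framework presented in this paper.

```python
# Hedged sketch: black-box membership inference with ART against an
# illustrative scikit-learn target model (not the paper's framework).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from art.estimators.classification import SklearnClassifier
from art.attacks.inference.membership_inference import MembershipInferenceBlackBox

X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

target = RandomForestClassifier().fit(x_train, y_train)
classifier = SklearnClassifier(model=target)

# Fit the attack model on half of the members (train set) and
# non-members (test set), then evaluate on the held-out halves.
attack = MembershipInferenceBlackBox(classifier, attack_model_type="rf")
n, m = len(x_train) // 2, len(x_test) // 2
attack.fit(x_train[:n], y_train[:n], x_test[:m], y_test[:m])

inferred_members = attack.infer(x_train[n:], y_train[n:])
inferred_nonmembers = attack.infer(x_test[m:], y_test[m:])

# Balanced attack accuracy: ~0.5 means little membership leakage,
# while values approaching 1.0 indicate high privacy risk.
acc = (inferred_members.mean() + (1 - inferred_nonmembers.mean())) / 2
print(f"membership inference attack accuracy: {acc:.2f}")
```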