Foundation Model Privacy

Privacy has always been a concern when developing trustworthy AI solutions, even with conventional machine learning and deep learning models. Today, with the prevalence of large language models serving as foundation models, this concern has become even more acute. Language models have an inherent tendency to memorize, and even reproduce in their outputs, text sequences learned during training, whether that training is pre-training, fine-tuning, or prompt-tuning. If the training data contains sensitive or personal information, this can result in a major privacy breach.

IBM is currently researching and developing methods to assess the privacy risk of large foundation models, adapted to cover these new and evolving attack vectors and able to scale to such large model sizes. Moreover, we are investigating mitigation strategies that can make large language models more resistant to these kinds of attacks.
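To illustrate the kind of attack such an assessment must cover, a common baseline is a loss-threshold membership inference attack: sequences the model assigns unusually low loss to are flagged as likely training members. The sketch below is a toy illustration only, using a smoothed unigram language model in place of a real foundation model; all names, data, and the threshold heuristic are assumptions, not IBM's actual method.

```python
import math
from collections import Counter

def train_unigram(corpus):
    """Fit a unigram language model with add-one smoothing (a stand-in
    for a trained foundation model in this toy example)."""
    counts = Counter(w for s in corpus for w in s.split())
    total = sum(counts.values())
    vocab = len(counts)
    def prob(w):
        # unseen words still get a small nonzero probability
        return (counts[w] + 1) / (total + vocab + 1)
    return prob

def avg_nll(model, sentence):
    """Average negative log-likelihood of a sentence under the model;
    unusually low values suggest the sequence was seen in training."""
    words = sentence.split()
    return -sum(math.log(model(w)) for w in words) / len(words)

# Toy "training set" that includes a sensitive record (hypothetical data)
train = [
    "the quick brown fox jumps over the lazy dog",
    "patient 4711 was diagnosed with a rare condition",
]
model = train_unigram(train)

member = "patient 4711 was diagnosed with a rare condition"
non_member = "completely unrelated holiday itinerary for vienna"

loss_member = avg_nll(model, member)
loss_non_member = avg_nll(model, non_member)

# The attack flags the lower-loss input as a likely training member,
# signalling a potential leak of the sensitive record
is_member = loss_member < loss_non_member
```

In a real assessment the same loss-gap signal is measured at scale, typically calibrated against reference (shadow) models rather than a single fixed threshold.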


  • P202204424 - A System and Method for Privacy Risk Assessment of ML model incorporating shadow models and data in combination with user model and data as part of automated process of selecting and invoking attacks
  • P202204171 - Selecting statistical queries for synthetic data generation for ML model training
  • P202204169 - Machine Unlearning Using Model Meta-Editing
  • P202202702 - Analysis of privacy risk of machine learning features
  • P202201528 - Explainability guided greedy data minimization method
  • P202201526 - Using local model explainability to find decision boundaries for data minimization