The software then creates an anonymized version of the training data, later used to retrain the model — resulting in an anonymized version of the model free from any data processing restrictions. This makes the model less prone to inference attacks, as we show in our paper, Anonymizing Machine Learning Models.1
Having tested our technology on publicly available datasets, we’ve obtained promising results. With relatively high values of k and large sets of quasi-identifiers, we created anonymized machine learning models with very little accuracy loss. (See image)
Next, we aim to run our models on real-life data and to see if the results hold. And we plan to extend our method from just tabular data to different kinds of data, including images.
You can try out our open source toolkit on GitHub.
Goldsteen, A., Ezov, G., Shmelkin, R., Moffie, M. & Farkash, A. Anonymizing Machine Learning Models. arXiv:2007.13086 [cs] (2021). ↩