NN-EMD: Efficiently Training Neural Networks Using Encrypted Multi-Sourced Datasets

Runhua Xu; James Joshi; Chao Li

doi:10.1109/TDSC.2021.3074439

IEEE TDSC


31 Dec 2021


Abstract

Training complex neural network models on third-party, cloud-based infrastructure using data from multiple sources is a promising approach among existing machine learning solutions. However, privacy concerns about large-scale data collection, together with recent regulations, have restricted the availability and use of privacy-sensitive data on third-party infrastructure. A promising emerging approach to these privacy issues is to train a neural network model over an encrypted dataset. Specifically, the model training process can be outsourced to a third party, such as a cloud service backed by significant computing power, while encryption keeps the training data confidential from that third party. Compared to training a traditional machine learning model over encrypted data, however, training a deep neural network (DNN) model over encrypted data is extremely challenging for two reasons: first, it requires large-scale computation over huge datasets; second, existing solutions for computation over encrypted data, such as homomorphic encryption, are inefficient. Further, to improve the performance of a DNN model, we also need huge training datasets composed of data from multiple sources that may not have pre-established trust relationships with each other. We propose a novel framework, NN-EMD, to train DNNs over encrypted datasets collected from multiple sources. Toward this, we propose a set of secure computation protocols using hybrid functional encryption schemes. We evaluate the framework's training time and model accuracy on the MNIST dataset. We show that, compared to other existing frameworks, NN-EMD significantly reduces training time while providing comparable model accuracy and privacy guarantees, as well as supporting multiple data sources. Furthermore, the depth and complexity of the neural network do not affect the training time, despite the added privacy-preserving NN-EMD setting.
