Prune, Permute and Expand: Efficient Machine Learning under Non-Client-Aided Homomorphic Encryption

Privacy-preserving neural network (NN) inference under homomorphic encryption (HE) has recently gained significant traction, with several solutions offering different latency-bandwidth trade-offs. Pruning the parameters of an NN model is a well-known approach to improving inference latency. However, pruning methods that are effective in the plaintext setting may yield almost no improvement in the HE setting. In this work, we propose a novel set of pruning methods that reduce both latency and memory requirements under HE, thus bringing the effectiveness of plaintext pruning methods to HE. We evaluate our methods on an autoencoder architecture on MNIST and show that our best method prunes ~2× more ciphertexts than our adaptation of a state-of-the-art scheme called Hunter, with a negligible increase in mean-square reconstruction loss.
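
The gap between plaintext and HE pruning can be illustrated with a small counting experiment. The sketch below is not the method proposed in this work. It assumes a packed setting in which each ciphertext holds a contiguous tile of weights and can be skipped only when every weight in its tile is pruned. The tile size, layer size, sparsity level, and the helper names `count_zero_tiles` and `demo` are all hypothetical choices made for illustration.

```python
import random


def count_zero_tiles(mask, tile):
    """Count contiguous tiles of `tile` weights that are entirely pruned.

    mask[i] is True when weight i is kept and False when it is pruned.
    Assumption: one tile corresponds to one ciphertext.
    """
    tiles = [mask[i:i + tile] for i in range(0, len(mask), tile)]
    return sum(1 for t in tiles if not any(t))


def demo(n_weights=4096, tile=64, sparsity=0.9, seed=0):
    """Compare scattered and grouped pruning masks at the same sparsity."""
    rng = random.Random(seed)
    n_pruned = int(n_weights * sparsity)

    # Unstructured pruning: pruned positions are scattered at random.
    mask = [False] * n_pruned + [True] * (n_weights - n_pruned)
    rng.shuffle(mask)
    scattered = count_zero_tiles(mask, tile)

    # Idealised regrouping: the same number of pruned weights, placed
    # next to each other (False sorts before True).
    grouped = count_zero_tiles(sorted(mask), tile)

    return scattered, grouped, n_weights // tile


if __name__ == "__main__":
    scattered, grouped, total = demo()
    print(f"prunable tiles, scattered mask: {scattered} of {total}")
    print(f"prunable tiles, grouped mask:   {grouped} of {total}")
```

With these hypothetical defaults, the grouped mask frees 57 of the 64 tiles, whereas the scattered mask can never free more than that and typically frees far fewer, even though both masks prune the same number of weights. Under the stated assumptions, the number of fully pruned ciphertexts, rather than the number of pruned weights, is the quantity that matters in this setting.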