Privacy-preserving machine learning (PPML) solutions are gaining widespread popularity. Among these, many rely on homomorphic encryption (HE), which offers confidentiality of the model and the data, but at the cost of large latency and memory requirements. Pruning neural network (NN) parameters improves latency and memory in plaintext ML but has little impact if directly applied to HE-based PPML. We introduce a framework called HE-PEx that comprises new pruning methods, on top of a packing technique called tile tensors, for reducing the latency and memory of PPML inference. HE-PEx uses permutations to prune additional ciphertexts, and expansion to recover inference loss. We demonstrate the effectiveness of our methods for pruning fully-connected and convolutional layers in NNs on PPML tasks, namely, image compression, denoising, and classification, with autoencoders, multi-layer perceptrons (MLPs), and convolutional neural networks (CNNs). We implement and deploy our networks atop a framework called HElayers, and observe a 10–35% improvement in inference speed and a 17–35% decrease in memory requirement over the unpruned network, corresponding to 33–65% fewer ciphertexts, within a 2.5% degradation in inference accuracy over the unpruned network. Compared to the state-of-the-art pruning technique for PPML, our techniques generate networks with 70% fewer ciphertexts, on average, for the same degradation limit.
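
As a toy illustration of why plaintext-style pruning has little effect under HE, and why permutations can help, consider the minimal sketch below. It rests on assumptions that are not part of HE-PEx or the HElayers API: each ciphertext is modeled as a fixed-shape tile of a weight matrix, and a ciphertext can be dropped only when every weight in its tile is zero. The function names (`count_zero_tiles`, `permute`), the 4x4 matrix, and the hand-picked permutation are all hypothetical; the sketch does not reproduce how HE-PEx actually searches for permutations.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def count_zero_tiles(w: Matrix, tile_rows: int, tile_cols: int) -> int:
    """Count tiles (stand-ins for ciphertexts) whose entries are all zero."""
    n_rows, n_cols = len(w), len(w[0])
    zero_tiles = 0
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            # Collect every weight that falls inside this tile.
            tile = [
                w[r][c]
                for r in range(r0, min(r0 + tile_rows, n_rows))
                for c in range(c0, min(c0 + tile_cols, n_cols))
            ]
            if all(v == 0 for v in tile):
                zero_tiles += 1
    return zero_tiles


def permute(w: Matrix, row_perm: Sequence[int], col_perm: Sequence[int]) -> Matrix:
    """Reorder the rows and columns of w according to the given index lists."""
    return [[w[r][c] for c in col_perm] for r in row_perm]


# Hypothetical 4x4 weight matrix after unstructured pruning (50% zeros).
w: Matrix = [
    [0, 5, 0, 3],
    [2, 0, 4, 0],
    [0, 7, 0, 1],
    [6, 0, 8, 0],
]

# Hand-picked permutation that groups the zeros into whole 2x2 tiles.
w_perm = permute(w, row_perm=[0, 2, 1, 3], col_perm=[0, 2, 1, 3])

zero_before = count_zero_tiles(w, 2, 2)       # no tile is entirely zero
zero_after = count_zero_tiles(w_perm, 2, 2)   # two of the four tiles are zero
print(zero_before, zero_after)
```

In this toy case, half of the weights are zero, yet no tile can be removed until the rows and columns are reordered. In an actual network, such a reordering would also have to be applied consistently to the neighboring layers so that the computed function is unchanged.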