Pruning Deep Neural Networks with $\ell_0$-constrained Optimization
Abstract
Deep neural networks (DNNs) achieve state-of-the-art accuracy on many tasks, but they can demand large amounts of memory, high energy consumption, and long inference times. Modern DNNs can have hundreds of millions of parameters, which makes them difficult to deploy in low-resource environments. Pruning redundant connections without sacrificing accuracy is one of the most popular approaches to overcoming these limitations. We propose two $\ell_0$-constrained optimization models for pruning deep neural networks layer by layer. The first model handles a general activation function, while the second is tailored to the ReLU. We introduce an efficient cutting plane algorithm that solves the latter to optimality. Our experiments show that the proposed approach achieves compression rates competitive with several state-of-the-art baseline methods.
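As an illustrative sketch (not necessarily the exact models proposed here), a layer-wise $\ell_0$-constrained pruning problem can be written, for a layer with input activations $X$, dense weights $W^{0}$, activation function $\sigma$, and sparsity budget $k$, as
$$\min_{W}\; \bigl\|\sigma(XW) - \sigma(XW^{0})\bigr\|_{F}^{2} \quad \text{subject to} \quad \|W\|_{0} \le k,$$
where $\|W\|_{0}$ counts the nonzero entries of $W$; taking $\sigma$ to be the ReLU corresponds to the setting of the ReLU-specific model.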