True Gradient-Based Training of Deep Binary Activated Neural Networks Via Continuous Binarization
With the ever-growing popularity of deep learning, the tremendous complexity of deep neural networks is becoming problematic when one considers inference on resource-constrained platforms. Binary networks have emerged as a potential solution; however, they exhibit a fundamental limitation in realizing gradient-based learning, as their activations are non-differentiable. Work to date has relied on approximating gradients via the straight-through estimator (STE) in order to use the back-propagation algorithm. Such approximations harm the quality of the training procedure, causing a noticeable gap in accuracy between binary neural networks and their full-precision baselines. We present a novel method to train binary activated neural networks using true gradient-based learning. Our idea is motivated by the similarities between clipping and binary activation functions. We show that our method incurs minimal accuracy degradation with respect to the full-precision baseline. Finally, we evaluate our method on three benchmark datasets: MNIST, CIFAR-10, and SVHN. On each benchmark, continuous binarization with true gradient-based learning achieves accuracy within 1.5% of the floating-point baseline, compared to accuracy drops as high as 6% when training the same binary activated network with the STE.
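The similarity between clipping and binary activations that motivates the method can be sketched as follows. This is an illustrative NumPy example, not the paper's exact formulation: `binary_act`, `clip_act`, `clip_grad`, `scaled_clip`, and the scale parameter `delta` are names chosen here for exposition.

```python
import numpy as np

# The binary activation sign(x) has zero derivative almost everywhere,
# so standard back-propagation cannot flow gradients through it.
def binary_act(x):
    return np.where(x >= 0.0, 1.0, -1.0)

# The STE sidesteps this by using the derivative of a clipping function
# clip(x, -1, 1) in the backward pass while keeping sign(x) forward.
def clip_act(x):
    return np.clip(x, -1.0, 1.0)

def clip_grad(x):
    # Derivative of clip_act: 1 inside (-1, 1), 0 outside.
    return ((x > -1.0) & (x < 1.0)).astype(float)

# A scaled clipping activation clip(x / delta, -1, 1) interpolates
# between the two: for delta = 1 it is the ordinary clip, and as
# delta -> 0 it converges pointwise to sign(x) for x != 0. This is the
# resemblance the abstract refers to; it remains piecewise
# differentiable for every delta > 0, permitting true gradients.
def scaled_clip(x, delta):
    return np.clip(x / delta, -1.0, 1.0)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(binary_act(x))            # hard binary output
print(scaled_clip(x, 1.0))      # soft, fully differentiable inside (-1, 1)
print(scaled_clip(x, 0.1))      # nearly indistinguishable from sign(x)
```

Shrinking `delta` during training (one plausible reading of "continuous binarization") lets the network be optimized with exact gradients of a continuous function while its activations gradually approach binary values.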