About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
IJCAI 2018
Conference paper
On the convergence properties of a k-step averaging stochastic gradient descent algorithm for nonconvex optimization
Abstract
We adopt and analyze a synchronous K-step averaging stochastic gradient descent algorithm which we call K-AVG for solving large scale machine learning problems. We establish the convergence results of K-AVG for nonconvex objectives. Our analysis of K-AVG applies to many existing variants of synchronous SGD. We explain why the K-step delay is necessary and leads to better performance than traditional parallel stochastic gradient descent which is equivalent to K-AVG with K = 1. We also show that K-AVG scales better with the number of learners than asynchronous stochastic gradient descent (ASGD). Another advantage of K-AVG over ASGD is that it allows larger stepsizes and facilitates faster convergence. On a cluster of 128 GPUs, K-AVG is faster than ASGD implementations and achieves better accuracies and faster convergence for training with the CIFAR-10 dataset.