On the convergence properties of a k-step averaging stochastic gradient descent algorithm for nonconvex optimization

Fan Zhou; Guojing Cong

doi:10.24963/ijcai.2018/447

IJCAI 2018

Conference paper

13 Jul 2018



Abstract

We adopt and analyze a synchronous K-step averaging stochastic gradient descent algorithm, which we call K-AVG, for solving large-scale machine learning problems. We establish convergence results for K-AVG on nonconvex objectives. Our analysis of K-AVG applies to many existing variants of synchronous SGD. We explain why the K-step delay is necessary and leads to better performance than traditional parallel stochastic gradient descent, which is equivalent to K-AVG with K = 1. We also show that K-AVG scales better with the number of learners than asynchronous stochastic gradient descent (ASGD). Another advantage of K-AVG over ASGD is that it allows larger stepsizes and facilitates faster convergence. On a cluster of 128 GPUs, K-AVG is faster than ASGD implementations and achieves better accuracies and faster convergence for training with the CIFAR-10 dataset.
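The averaging scheme described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names (`k_avg`, `noisy_quadratic_grad`), the toy quadratic objective, the noise model, and all hyperparameter values are assumptions chosen for illustration, and the learners are simulated sequentially rather than run on separate GPUs. Each simulated learner starts from the shared parameters, takes K local SGD steps, and the learners' parameters are then averaged. Setting `k=1` gives the traditional parallel SGD case mentioned above.

```python
import numpy as np

def k_avg(grad_fn, w0, num_learners, k, rounds, lr, rng):
    """Illustrative K-step averaging SGD (simulated in one process).

    grad_fn(w, rng) returns a stochastic gradient at w (assumed interface).
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(rounds):
        local_params = []
        for _ in range(num_learners):
            w_j = w.copy()                    # every learner starts from the shared iterate
            for _ in range(k):                # k local SGD steps with no communication
                w_j -= lr * grad_fn(w_j, rng)
            local_params.append(w_j)
        w = np.mean(local_params, axis=0)     # synchronous averaging step
    return w

def noisy_quadratic_grad(w, rng, noise=0.1):
    # Hypothetical toy objective f(w) = 0.5 * ||w||^2 with Gaussian gradient noise.
    # It is convex and only serves to exercise the loop; the paper targets nonconvex objectives.
    return w + noise * rng.standard_normal(w.shape)

rng = np.random.default_rng(0)
w_final = k_avg(noisy_quadratic_grad, np.ones(5),
                num_learners=4, k=8, rounds=50, lr=0.1, rng=rng)
print(np.linalg.norm(w_final))
```

In this sketch, communication happens once per round rather than once per gradient step, which is the source of the reduced synchronization cost for K > 1.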
