Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes

Nhuong Nguyen; Toan Nguyen; Phuong Ha Nguyen; Lam Nguyen; Marten van Dijk

INFORMS 2021

Talk

24 Oct 2021

Abstract

We consider big data analysis where training data is distributed among local data sets in a heterogeneous way, and we wish to move SGD computations to the local compute nodes where the local data resides. The results of these local SGD computations are aggregated by a central "aggregator" that mimics Hogwild!. We show how local compute nodes can start with small mini-batch sizes that increase to larger ones in order to reduce communication cost. We improve on the state-of-the-art literature and show O(K^{0.5}) communication rounds for heterogeneous data for strongly convex problems, where K is the total number of gradient computations across all local compute nodes. For our scheme, we provide a tight, novel, and non-trivial convergence analysis for strongly convex problems with heterogeneous data that does not rely on the bounded gradient assumption made in many existing publications.
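Simple arithmetic illustrates why linearly increasing mini-batch sizes are consistent with an O(K^{0.5}) round count: if every node uses mini-batch size s0 + a*i in round i, then T rounds spend about n*a*T^2/2 gradient computations in total, so spending K computations takes on the order of sqrt(2K/(n*a)) rounds, whereas a fixed size s needs K/(n*s) rounds. The Python sketch below just counts rounds under that assumption. The parameters `s0`, `a`, `n_nodes` and the helper names are hypothetical illustrations, not the paper's exact schedule, and the sketch models neither the aggregator nor the convergence analysis.

```python
import math


def rounds_linear(K, s0=1, a=1, n_nodes=1):
    """Communication rounds until K gradient computations are spent in total.

    Hypothetical schedule (illustrative assumption): in round i (0-indexed)
    each of the n_nodes local compute nodes processes a mini-batch of size
    s0 + a*i, then communicates once with the aggregator.
    """
    spent, rounds = 0, 0
    while spent < K:
        spent += n_nodes * (s0 + a * rounds)  # gradient computations in this round
        rounds += 1
    return rounds


def rounds_constant(K, s=1, n_nodes=1):
    """Communication rounds when every node uses a fixed mini-batch size s."""
    return math.ceil(K / (n_nodes * s))


if __name__ == "__main__":
    for K in (10**4, 10**6, 10**8):
        lin = rounds_linear(K)
        const = rounds_constant(K, s=10)
        # With s0 = a = 1 and one node, linear growth needs about sqrt(2K) rounds.
        print(f"K={K:>11,}  linear: {lin:>6}  constant(s=10): {const:>9}  "
              f"sqrt(2K)={math.sqrt(2 * K):.0f}")
```

With the default schedule (s0 = a = 1, one node), the linear schedule needs 141, 1,414 and 14,142 rounds for K = 10^4, 10^6 and 10^8, growing like sqrt(2K), while the fixed size s = 10 needs K/10 rounds.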
