Wildfire: Approximate synchronization of parameters in distributed deep learning
Abstract
In distributed deep learning, parameter updates computed by multiple learners are gathered at periodic intervals and combined to update the weights of the network. Gathering these contributions at a centralized location, as in common synchronous parameter-server approaches, creates a bottleneck in two ways. First, the parameter server must wait until gradients from all learners have been received, and second, the traffic pattern of gradients flowing between the learners and the parameter server causes an imbalance in bandwidth utilization in most common networks. In this paper, we introduce a scheme called Wildfire, which communicates weights among subsets of parallel learners, each of which updates its model using only the information received from the other learners in its subset. Different subsets of learners communicate at different times, allowing learning to diffuse through the system. Wildfire reduces the communication overhead of distributed deep learning by allowing learners to communicate directly with one another rather than through a parameter server, and by limiting the time learners must wait before updating their models. We demonstrate the effectiveness of Wildfire on common deep learning benchmarks using the IBM Rudra deep learning framework.
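The following toy sketch illustrates the communication pattern described above: learners take local steps on their own models, and at each round a randomly chosen subset exchanges weights directly, with no parameter server, so information diffuses through the system over time. It is not the paper's implementation; the random subset selection, the use of a simple average as the merge rule, and the stand-in local update are assumptions made here purely for illustration.

```python
# Hypothetical sketch of subset-based weight exchange (not the paper's code).
# Assumes the "update using information received from the subset" is a plain
# average of the subset members' weights; the real merge rule may differ.
import numpy as np

rng = np.random.default_rng(0)

NUM_LEARNERS = 8   # parallel learners, each holding its own model replica
SUBSET_SIZE = 2    # learners exchange weights in small groups
DIM = 4            # toy model: a flat parameter vector per learner
STEPS = 100

# Each learner starts from a different random initialization.
weights = [rng.normal(size=DIM) for _ in range(NUM_LEARNERS)]

def local_gradient_step(w, lr=0.01):
    """Stand-in for a local SGD step; here just a noisy pull toward zero."""
    grad = w + rng.normal(scale=0.1, size=w.shape)
    return w - lr * grad

for step in range(STEPS):
    # 1) Every learner takes a local step on its own mini-batch of data.
    weights = [local_gradient_step(w) for w in weights]

    # 2) A randomly chosen subset of learners exchanges weights directly,
    #    without a central parameter server; the others keep computing.
    subset = rng.choice(NUM_LEARNERS, size=SUBSET_SIZE, replace=False)
    merged = np.mean([weights[i] for i in subset], axis=0)
    for i in subset:
        weights[i] = merged.copy()

# Because different subsets communicate at different times, parameter
# information gradually diffuses across all learners; the per-dimension
# spread of the replicas shrinks as the models approximately synchronize.
print("spread of learner weights:", np.ptp(np.stack(weights), axis=0))
```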