Network scheduling aware task placement in datacenters
To improve the performance of data-intensive applications, existing datacenter schedulers optimize either the placement of tasks or the scheduling of network flows. The task scheduler strives to place tasks close to their input data (i.e., maximize data locality) to minimize network traffic, while assuming fair sharing of the network. The network scheduler strives to finish flows as quickly as possible given the sources and destinations determined by the task scheduler; its decisions are based on flow properties (e.g., size, deadline, and correlation) and are not bound to fair sharing. These inconsistent assumptions of the two schedulers can compromise overall application performance. In this paper, we propose NEAT, a task scheduling framework that leverages information from the underlying network scheduler to make task placement decisions. The core of NEAT is a task completion time predictor that estimates the completion time of a task under a given network condition and a given network scheduling policy. NEAT leverages the predicted task completion times to minimize the average completion time of active tasks. Evaluation using ns2 simulations and a real testbed shows that NEAT improves application performance by up to 3.7x for the suboptimal network scheduling policies and up to 30% for the optimal network scheduling policy.
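To illustrate why placement should depend on the network scheduling policy, the following is a minimal sketch and not NEAT's actual predictor. It assumes a single bottleneck link per candidate node, no future flow arrivals, and two idealized policies: fair sharing and shortest-remaining-first (SRPT). All names (`predict_fct_fair`, `predict_fct_srpt`, `place_task`) are hypothetical, and the placement rule greedily minimizes only the new task's predicted time, which simplifies the objective of minimizing the average completion time of all active tasks.

```python
from typing import Callable, Dict, List

def predict_fct_fair(size: float, active: List[float], capacity: float) -> float:
    """Predicted completion time of a new flow under fair sharing.

    With equal shares, by the time the new flow has sent `size` units,
    each competing flow has sent min(its remaining size, size) units.
    """
    return (size + sum(min(r, size) for r in active)) / capacity

def predict_fct_srpt(size: float, active: List[float], capacity: float) -> float:
    """Predicted completion time under shortest-remaining-first.

    Only flows whose remaining size is no larger than the new flow
    are served ahead of it (ties go to the existing flows).
    """
    return (size + sum(r for r in active if r <= size)) / capacity

# Hypothetical policy registry for this sketch.
POLICIES: Dict[str, Callable[[float, List[float], float], float]] = {
    "fair": predict_fct_fair,
    "srpt": predict_fct_srpt,
}

def place_task(size: float, links: Dict[str, List[float]],
               policy: str, capacity: float = 1.0) -> str:
    """Pick the candidate node with the smallest predicted completion time.

    `links` maps each candidate node to the remaining sizes of the flows
    currently active on its (assumed single) bottleneck link.
    """
    predict = POLICIES[policy]
    return min(links, key=lambda node: predict(size, links[node], capacity))

# Node A carries one short flow; node B carries two long flows.
links = {"A": [1.0], "B": [10.0, 10.0]}
fair_choice = place_task(2.0, links, "fair")
srpt_choice = place_task(2.0, links, "srpt")
```

In this toy setting the two policies lead to different placements: under fair sharing the new 2-unit flow shares node B's link with both long flows, so node A is preferred, whereas under SRPT the new flow preempts the long flows on node B, which makes node B the better choice.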