Spark Streaming discretizes streams of data into micro-batches, each of which is subdivided into tasks and processed in parallel to improve job throughput. Previous work [2, 3] has lowered end-to-end latency in Spark Streaming. However, two causes of high tail latency remain unaddressed: 1) data is not load-balanced across tasks, and 2) straggler tasks can increase end-to-end latency to 8 times that of the median task on a production cluster [1]. We propose a feedback-control mechanism that allows frameworks to adaptively load-balance workloads across tasks according to their processing speeds. Task runtimes are thus equalized, lowering end-to-end tail latency. Further, this reduces load on machines with transient resource bottlenecks, helping to resolve the bottlenecks and preventing them from having an enduring impact on task runtimes.
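The feedback loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: all names, the proportional splitting rule, and the EWMA smoothing factor are assumptions made for clarity.

```python
# Hypothetical sketch of feedback-controlled load balancing:
# split each micro-batch across tasks in proportion to each worker's
# estimated processing rate, then update the estimates from observed
# task runtimes (an EWMA feedback step). All names are illustrative.

ALPHA = 0.5  # smoothing factor for rate estimates (assumed value)

def split_batch(num_records, rates):
    """Assign records to workers proportionally to their estimated rates."""
    total = sum(rates)
    shares = [int(num_records * r / total) for r in rates]
    shares[0] += num_records - sum(shares)  # rounding remainder goes to worker 0
    return shares

def update_rates(rates, shares, runtimes):
    """Feedback step: blend the observed rate (records/sec) into each estimate."""
    return [
        ALPHA * (s / t) + (1 - ALPHA) * r if t > 0 else r
        for r, s, t in zip(rates, shares, runtimes)
    ]

# Example: worker 1 suffers a transient bottleneck (runs at half speed).
rates = [100.0, 100.0, 100.0]
shares = split_batch(3000, rates)      # even split: [1000, 1000, 1000]
runtimes = [shares[0] / 100, shares[1] / 50, shares[2] / 100]
rates = update_rates(rates, shares, runtimes)
next_shares = split_batch(3000, rates)  # the slow worker now receives fewer records
```

On the next micro-batch the straggler is assigned a smaller share, so the per-task runtimes move toward equality rather than letting one slow machine dictate the batch's end-to-end latency.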
Bhuvan Urgaonkar, Giovanni Pacifici, et al.
SIGMETRICS 2005
Merve Unuvar, Yurdaer Doganata, et al.
CLOUD 2014
Chien-An Lai, Asser Tantawi, et al.
CLOUD 2016
Dinesh Kumar, Asser Tantawi, et al.
MASCOTS 2009