Reducing tail latencies in micro-batch streaming workloads

Faria Kalim; Asser Tantawi; Stefania Costache; Alaa Youssef

doi:10.1145/3127479.3134433

SoCC 2017

Conference paper

24 Sep 2017

Abstract

Spark Streaming discretizes streams of data into micro-batches, each of which is further subdivided into tasks that are processed in parallel to improve job throughput. Previous work [2, 3] has lowered end-to-end latency in Spark Streaming. However, two causes of high tail latency remain unaddressed: 1) data is not load-balanced across tasks, and 2) straggler tasks can increase end-to-end latency by as much as 8 times relative to the median task on a production cluster [1]. We propose a feedback-control mechanism that allows frameworks to adaptively load-balance workloads across tasks according to their processing speeds. Task runtimes are thus equalized, which lowers end-to-end tail latency. The mechanism also reduces load on machines that have transient resource bottlenecks, which resolves the bottlenecks and prevents them from having an enduring impact on task runtimes.
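
The feedback idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rebalance`, the `gain` parameter, and the proportional update rule are all assumptions chosen for illustration. The sketch assumes each task's speed can be estimated from the share of the previous micro-batch it received and the time it took, and it shifts the next batch's shares toward the split that would equalize runtimes if those speeds held.

```python
def rebalance(shares, runtimes, gain=0.5):
    """Return per-task data shares for the next micro-batch.

    shares   -- fraction of the previous batch given to each task (sums to 1)
    runtimes -- observed runtime of each task on that batch (all > 0)
    gain     -- hypothetical damping factor in (0, 1]; 1.0 jumps straight
                to the target, smaller values react more cautiously
    """
    if not shares or len(shares) != len(runtimes):
        raise ValueError("shares and runtimes must be non-empty and equal length")
    if any(t <= 0 for t in runtimes):
        raise ValueError("runtimes must be positive")

    # Estimated processing speed: share of the batch handled per unit time.
    speeds = [s / t for s, t in zip(shares, runtimes)]
    total_speed = sum(speeds)
    if total_speed <= 0:
        raise ValueError("at least one task must have a positive share")

    # Shares proportional to speed would equalize runtimes
    # if the measured speeds stayed constant.
    targets = [v / total_speed for v in speeds]

    # Feedback step: move only part of the way toward the target.
    updated = [s + gain * (t - s) for s, t in zip(shares, targets)]

    # Renormalize to guard against floating-point drift.
    norm = sum(updated)
    return [u / norm for u in updated]
```

For example, with two tasks that each received half of the batch and took 1 and 2 time units, `rebalance([0.5, 0.5], [1.0, 2.0], gain=1.0)` assigns roughly two thirds of the next batch to the faster task. When runtimes are already equal, the shares are left unchanged, so the update settles once the tasks finish together. A task whose share drops to zero yields no speed estimate under this rule, so a real controller would need a minimum share or a separate probing step.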