Placement of replicated tasks for distributed stream processing systems
Abstract
We propose an algorithm for placing the tasks of streaming data flows onto servers within a message-oriented middleware, where certain tasks can be replicated. Our work is centered on the observation that certain transformations are stateless and can therefore be replicated. Replication allows a workload to be partitioned among multiple machines, so that message processing can be parallelized and performance improved. For this purpose we propose a guided replication approach that iteratively computes the optimal placement of replicas, where each iteration of the algorithm takes as input the optimal solutions computed in the previous iteration. As a result, system performance improves consistently across iterations and eventually converges, as shown in simulation results. We demonstrate, through simulation experiments with both simple and complex task flow graphs and network topologies, that introducing our replication mechanism can lead to improvements in runtime performance. When system resources are scarce, the benefits of applying our replication mechanism are even greater. © 2010 ACM.
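The iterative idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the cost model or the optimization step, so the sketch substitutes a simple greedy heuristic. It assumes that a task's load is split evenly across its replicas, that network topology is ignored, and that performance is measured by the load of the busiest server. All names (`Task`, `guided_replication`, `bottleneck`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    load: float       # work per unit time (hypothetical unit)
    stateless: bool   # assumption: only stateless tasks may be replicated


def server_loads(placement, tasks, servers):
    """Load per server, splitting each task's load evenly over its replicas."""
    loads = {s: 0.0 for s in servers}
    for task in tasks:
        hosts = placement[task.name]
        for s in hosts:
            loads[s] += task.load / len(hosts)
    return loads


def bottleneck(placement, tasks, servers):
    """Assumed performance metric: load of the busiest server (lower is better)."""
    return max(server_loads(placement, tasks, servers).values())


def initial_placement(tasks, servers):
    """One replica per task: heaviest task first, onto the least-loaded server."""
    placement = {}
    loads = {s: 0.0 for s in servers}
    for task in sorted(tasks, key=lambda t: -t.load):
        target = min(servers, key=lambda s: loads[s])
        placement[task.name] = [target]
        loads[target] += task.load
    return placement


def guided_replication(tasks, servers, max_iters=100):
    """Each iteration starts from the previous placement and adds the single
    replica that lowers the bottleneck the most; stops when nothing improves."""
    placement = initial_placement(tasks, servers)
    history = [bottleneck(placement, tasks, servers)]
    for _ in range(max_iters):
        best, best_cost = None, history[-1]
        for task in tasks:
            if not task.stateless:
                continue  # stateful tasks keep exactly one replica
            for s in servers:
                if s in placement[task.name]:
                    continue
                candidate = dict(placement)
                candidate[task.name] = placement[task.name] + [s]
                cost = bottleneck(candidate, tasks, servers)
                if cost < best_cost - 1e-12:
                    best, best_cost = candidate, cost
        if best is None:
            break  # converged: no single extra replica helps
        placement = best
        history.append(best_cost)
    return placement, history


# Hypothetical three-task flow on three servers.
tasks = [Task("parse", 9.0, True), Task("join", 3.0, False), Task("filter", 2.0, True)]
servers = ["s1", "s2", "s3"]
placement, history = guided_replication(tasks, servers)
print(placement)
print(history)
```

In this toy setting the bottleneck load decreases at every accepted step and the loop stops once no additional replica helps, which mirrors the convergence behaviour the abstract reports. A greedy step of this kind can stall in a local minimum, so it should not be read as computing an optimal placement.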