Publication
ICS 2009
Conference paper
Load balancing using work-stealing for pipeline parallelism in emerging applications
Abstract
Parallel programming is a requirement in the multi-core era. One of the most promising techniques for making parallel programming accessible to general users is the use of parallel programming patterns. Functional pipeline parallelism is a well-suited pattern for many emerging applications, such as streaming and "Recognition, Mining and Synthesis" (RMS) workloads. In this paper we develop an analytical model for pipeline parallelism and use it to characterize and optimize two of the PARSEC benchmarks that use the parallel pipeline pattern, ferret and dedup. We identify two scalability limitations: load imbalance and I/O bottlenecks. We address load imbalance using two techniques: parallel pipeline stage collapsing and dynamic scheduling. We implemented these optimizations using the Pthreads and Threading Building Blocks (TBB) libraries. We compare predicted and measured performance of all these implementations on a large-scale SMP machine and note that the work-stealing TBB implementation outperforms all other variants.
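As a rough illustration of why load imbalance limits a pipeline, the sketch below uses a textbook bottleneck estimate: steady-state throughput is bounded by the slowest stage. This is not the analytical model developed in the paper, which the abstract does not spell out. The per-item stage times and thread counts are hypothetical, and the "pooled" figure is an idealized upper bound that assumes any thread can run any stage with no scheduling overhead and no serial I/O stage.

```python
from fractions import Fraction

def stage_rates(stage_times, threads):
    """Items per time unit each stage can sustain, assuming a stage
    scales perfectly with the threads statically assigned to it."""
    return [Fraction(n) / Fraction(t) for t, n in zip(stage_times, threads)]

def pipeline_throughput(stage_times, threads):
    """Bottleneck estimate: the slowest stage bounds the whole pipeline."""
    return min(stage_rates(stage_times, threads))

# Hypothetical three-stage pipeline: time units of work per item, per stage.
stage_times = [1, 4, 2]

# Static assignment: two threads pinned to every stage, regardless of cost.
static_threads = [2, 2, 2]
static = pipeline_throughput(stage_times, static_threads)

# Idealized shared pool (an assumption standing in for collapsed stages or
# perfect dynamic scheduling): all threads share all of the work per item.
pooled = Fraction(sum(static_threads), sum(stage_times))

print("static assignment:", static)   # limited by the 4-unit stage
print("idealized pool:   ", pooled)
```

With these made-up numbers the static assignment is held back by the most expensive stage while threads in the cheaper stages sit idle, which is the kind of imbalance that stage collapsing and dynamic scheduling are meant to reduce.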