Publication
IC2EW 2016
Conference paper
Dynamic block sizing for data stream processing systems
Abstract
Real-time processing of big data is becoming one of the core operations in various areas, such as social networks and anomaly detection. Thanks to the rich information contained in the data, multiple queries can be executed to analyse the data and discover a variety of business values. It is typical for a cluster infrastructure running a data stream processing system, such as Spark Streaming, to execute multiple queries simultaneously. To enable multiple queries to be answered concurrently from the same data, it is important to effectively allocate the CPU-cores of the underlying infrastructure to the queries while adhering to the latency constraints of the individual queries. In this paper, we consider the problem of allocating CPU-cores in a Spark Streaming infrastructure in the context of two types of queries, namely primary and optional, which are associated with high- and low-priority analysis, respectively. We develop a controller, iBLOC, that adjusts the block sizes of streaming jobs, and hence their level of parallelism, on the fly according to the input data rates and the query priorities. Our evaluation shows that, in comparison to a static block-sizing scheme, iBLOC achieves significant CPU-core savings on the primary query type, allowing multiple queries to run together without violating their latency constraints.
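The knob the paper builds on corresponds to Spark Streaming's block interval: received data is grouped into blocks, and each block of a batch is processed as one task, so a smaller block interval yields more tasks (and more CPU-cores) per job. The sketch below is only a minimal illustration of this static configuration, not the iBLOC controller itself; the 200 ms block interval, 1 s batch interval, and socket source are illustrative assumptions.

```scala
// Minimal sketch of Spark Streaming's block-interval knob.
// iBLOC, by contrast, adjusts this value on the fly per query.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BlockIntervalSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("block-interval-sketch")
      // Data received during each block interval forms one block,
      // which becomes one task of the batch's processing job.
      .set("spark.streaming.blockInterval", "200ms") // illustrative value

    // With a 1 s batch interval and a 200 ms block interval, each batch is
    // split into roughly 1000 / 200 = 5 blocks per receiver, i.e. about
    // 5 parallel tasks (CPU-cores) are used to process it.
    val ssc = new StreamingContext(conf, Seconds(1))

    // Illustrative source: text lines from a local socket.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```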