Adaptive topologic optimization for large-scale stream mining
Abstract
Real-time classification and identification of specific features in high-volume data streams are critical for many applications, including large-scale multimedia analysis, processing, and retrieval. Content of interest is filtered using a collection of binary classifiers deployed on distributed, resource-constrained infrastructure. In this paper, we focus on selecting the optimal topology (chain) of classifiers, and present algorithms for classifier ordering and configuration that trade off the accuracy of feature identification against filtering delay. The order selection depends on the data characteristics, the system resource constraints, and the performance and complexity characteristics of each classifier. We first develop centralized algorithms for joint ordering and individual classifier operating-point selection. We then propose a decentralized approach and use reinforcement learning methods to design a dynamic-routing-based order selection strategy. We investigate different learning strategies that lead to rapid convergence while requiring minimal coordination and message exchange. © 2010 IEEE.