Publication
ARES 2007
Conference paper
Failure recovery in cooperative data stream analysis
Abstract
We present a failure recovery framework for System S, a large-scale stream data analysis environment. System S is intended to support multiple sites, each with its own local administration and goals. It is nevertheless beneficial for these sites to cooperate with each other, especially in the presence of failures. Our ultimate goal is to support automatic, timely failure recovery through cooperation among sites. We identify the challenges unique to System S and present our initial design work. In particular, we consider the backup selection problem, which specifies where failed jobs should be recovered, and formulate it as an optimization problem. We present an approximation algorithm together with empirical results obtained through simulations. Our numerical evaluations show that the proposed approximation algorithm is efficient and that its solutions compare well with the optimal ones. It exhibits a promising empirical performance ratio, close to the theoretical limit of polynomial approximations for such a problem. © 2007 IEEE.