Predicting Misconfiguration-Induced Unsuccessful Executions of Jobs in Big Data Systems
Because workload scheduling and resource allocation mechanisms in big data systems are complex, programmers' configuration errors are among the most typical root causes of unsuccessful job termination, which can lead to performance deterioration, availability degradation, resource inefficiency, and user dissatisfaction. In this paper, we propose an approach called SD-Predictor that predicts misconfiguration-induced unsuccessful job executions before scheduling and execution by combining static job configurations with dynamic runtime system state, so as to save computing resources and scheduling overhead in big data systems. We implement SD-Predictor and incorporate it into YARN, a popular scheduling framework, to optimize job scheduling and avoid the negative impact of misconfigured jobs. Moreover, we explore correlations between job configurations and termination status and provide recommendations for configuration optimization. Experimental results show that our approach achieves 78% precision, 52% recall, and a 2% false positive rate in predicting unsuccessful jobs, with significantly better recall and false positive rate than related work.
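For readers less familiar with the three reported metrics, the following minimal sketch shows how precision, recall, and false positive rate are conventionally computed from a confusion matrix, treating an unsuccessful job as the positive class. The class name `ConfusionCounts`, the helper functions, and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper's implementation or its experimental data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts; 'positive' means a job predicted or observed as unsuccessful."""
    tp: int  # unsuccessful jobs correctly predicted as unsuccessful
    fp: int  # successful jobs wrongly predicted as unsuccessful
    fn: int  # unsuccessful jobs wrongly predicted as successful
    tn: int  # successful jobs correctly predicted as successful


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a class is absent from the sample.
    return numerator / denominator if denominator else 0.0


def precision(c: ConfusionCounts) -> float:
    """Share of predicted-unsuccessful jobs that really were unsuccessful."""
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Share of truly unsuccessful jobs that the predictor caught."""
    return _ratio(c.tp, c.tp + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    """Share of truly successful jobs that were wrongly flagged."""
    return _ratio(c.fp, c.fp + c.tn)


# Hypothetical counts for illustration only (not the paper's results).
counts = ConfusionCounts(tp=30, fp=10, fn=20, tn=440)
```

With these illustrative counts, `precision(counts)` is 30/40, `recall(counts)` is 30/50, and `false_positive_rate(counts)` is 10/450. A low false positive rate matters in this setting because every false positive is a correctly configured job that the scheduler would flag unnecessarily.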