Lynceus: Cost-efficient tuning and provisioning of data analytic jobs

Maria Casimiro; Diego Didona; Paolo Romano; Luís Rodrigues; Willy Zwaenepoel; David Garlan

doi:10.1109/ICDCS47774.2020.00047

ICDCS 2020

Conference paper

01 Nov 2020

Abstract

Modern data analytic and machine learning jobs find in the cloud a natural deployment platform to satisfy their notoriously large resource requirements. Yet, to achieve cost efficiency, it is crucial to identify a deployment configuration that satisfies user-defined QoS constraints (e.g., on execution time) while avoiding unnecessary over-provisioning. This paper introduces Lynceus, a new approach for the optimization of cloud-based data analytic jobs that improves over state-of-the-art approaches by enabling significant cost savings, both in the final recommended configuration and in the optimization process used to recommend configurations. Unlike existing solutions, Lynceus jointly optimizes the cloud-related parameters (i.e., which and how many machines to provision) and the application-level parameters (e.g., the hyper-parameters of a machine learning algorithm). This reduces the cost of recommended configurations by up to 3.7× at the 90th percentile with respect to existing approaches, which treat the optimization of cloud-related and application-level parameters as two independent problems. Further, Lynceus reduces the cost of the optimization process (i.e., the cloud cost incurred for testing configurations) by up to 11×. This improvement is achieved thanks to two mechanisms: i) a timeout approach that aborts the exploration of configurations deemed suboptimal while still extracting useful information to guide future explorations and to improve its predictive model, unlike recent works, which either incur the full cost of testing suboptimal configurations or are unable to extract any knowledge from aborted runs; ii) a long-sighted and budget-aware technique that determines which configurations to test by predicting the long-term impact of each exploration, unlike state-of-the-art approaches for the optimization of cloud jobs, which adopt greedy optimization methods.
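The timeout idea in mechanism i) can be illustrated with a minimal sketch. This is not the Lynceus implementation: the names `Observation` and `explore`, the simulated `run_cost` function, and the rule "abort once a test run has cost as much as the best configuration found so far" are all assumptions made for illustration. The sketch also visits configurations in a fixed order, so it does not model the long-sighted, budget-aware selection of mechanism ii). An aborted run is kept as a censored observation: its true cost is known to be at least the amount paid.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Observation:
    config: dict
    cost: float      # amount actually paid for this test run
    censored: bool   # True if the run was aborted: true cost >= cost


def explore(
    configs: List[dict],
    run_cost: Callable[[dict], float],  # hypothetical: full cost of a run
    budget: float,
) -> Tuple[Optional[Observation], List[Observation], float]:
    """Test configs in order, aborting runs that cannot beat the incumbent."""
    observations: List[Observation] = []
    best: Optional[Observation] = None
    spent = 0.0
    for config in configs:
        remaining = budget - spent
        if remaining <= 0:
            break  # exploration budget exhausted
        # Timeout: stop paying once the run costs as much as the best
        # complete run so far, or once the remaining budget is used up.
        limit = remaining if best is None else min(best.cost, remaining)
        true_cost = run_cost(config)   # simulated; a real run is observed live
        paid = min(true_cost, limit)
        obs = Observation(config, paid, censored=true_cost > limit)
        observations.append(obs)       # censored runs still inform a model
        spent += paid
        if not obs.censored and (best is None or obs.cost < best.cost):
            best = obs
    return best, observations, spent


# Hypothetical configurations and costs, for illustration only.
costs = {"a": 10.0, "b": 25.0, "c": 4.0, "d": 8.0}
configs = [{"name": name} for name in costs]
best, observations, spent = explore(configs, lambda c: costs[c["name"]], 100.0)
```

With these made-up costs, running every configuration to completion would cost 47 units, whereas the sketch pays 28, because runs "b" and "d" are cut off as soon as they can no longer beat the incumbent.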
