Resource elasticity for large-scale machine learning
Declarative large-scale machine learning (ML) aims at exible specification of ML algorithms and automatic generation of hybrid runtime plans ranging from single node, in-memory computations to distributed computations on MapReduce (MR) or similar frameworks. State-of-the-art compilers in this context are very sensitive to memory constraints of the master process and MR cluster configuration. Different memory configurations can lead to significant performance differences. Interestingly, resource negotiation frameworks like YARN allow us to explicitly request preferred resources including memory. This capability enables automatic resource elasticity, which is not just important for performance but also removes the need for a static cluster configuration, which is always a compromise in multi-tenancy environments. In this paper, we introduce a simple and robust approach to automatic resource elasticity for large-scale ML. This includes (1) a resource optimizer to find near-optimal memory configurations for a given ML program, and (2) dynamic plan migration to adapt memory configurations during runtime. These techniques adapt resources according to data, program, and cluster characteristics. Our experiments demonstrate significant improvements up to 21x without unnecessary over-provisioning and low optimization overhead.