Predicting cloud performance for HPC applications before deployment

Giovanni Mariani; Andreea Anghel; Rik Jongerius; Gero Dittmann

doi:10.1016/j.future.2017.10.048

Future Generation Computer Systems

Paper

01 Oct 2018

Predicting cloud performance for HPC applications before deployment

View publication

Abstract

To reduce the capital investment required to acquire and maintain a high performance computing cluster, today many HPC users are moving to cloud. When deploying an application in the cloud, the users may (a) fail to understand the interactions of the application with the software layers implementing the cloud system, (b) be unaware of some hardware details of the cloud system, and (c) fail to understand how sharing part of the cloud system with other users might degrade application performance. These misunderstandings may lead the users to select suboptimal cloud configurations in terms of cost or performance. In this work we propose a machine-learning methodology to support the user in the selection of the best cloud configuration to run the target workload before deploying it in the cloud. This enables the user to decide if and what to buy before facing the cost of porting and analyzing the application in the cloud. We couple a cloud-performance-prediction model (CP) on the cloud-provider side with a hardware-independent profile-prediction model (PP) on the user-side. PP captures the application-specific scaling behavior. The user profiles the target application while processing small datasets on small machines she (or he) owns, and applies machine learning to generate PP to predict the profiles for larger datasets to be processed in the cloud. CP is generated by the cloud provider to learn the relationships between the hardware-independent profile and cloud performance starting from the observations gathered by executing a set of training applications on a set of training cloud configurations. Since the profile data in use is hardware-independent the user and the provider can generate the prediction models independently possibly on heterogeneous machines. We apply the prediction models to Fortran-MPI benchmarks. The resulting relative error is below 12% for CP and 30% for PP. The optimal Pareto front of cloud configurations finally found when maximizing performance and minimizing execution cost on the prediction models is at most 25% away from the actual optimal solutions.

Paper