HelmGemm: Managing GPUs and FPGAs for transprecision GEMM workloads in containerized environments
Major global vendors, including Google, IBM, Facebook, and Amazon, have recently adopted containerized system configurations as a competitive alternative to traditional hypervisor-based virtualization, owing to their rapid deployment, efficiency, compatibility, and maintainability. As in traditional cloud environments, energy consumption still constitutes the lion's share of overall infrastructure operating expenses. Most public and private cloud providers have therefore equipped their datacenters with accelerators such as GPUs and FPGAs to improve the energy efficiency of their systems. However, managing such heterogeneous systems and sharing their resources in multi-tenant environments while improving energy efficiency remains a challenging task. To address this need, we propose HelmGemm, a system-level component that supports energy-efficient computing on CPU-GPU-FPGA heterogeneous architectures for container services. HelmGemm targets workloads built around the BLAS3 GEMM routine and allows the arithmetic precision to be selected as the computation progresses, a technique that has recently given rise to the term 'transprecision computing'. By evaluating HelmGemm on a POWER9 system with 4×V100 GPUs and 2×9V3 FPGAs, we improved the average energy efficiency by up to 2.3× in inter-scale containerized configurations across three representative GEMM-based cloud applications in the field of machine learning, namely speech recognition, language modeling, and deep neural networks.
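The core idea behind transprecision GEMM can be illustrated with a minimal sketch. The snippet below is not HelmGemm's implementation; it simply emulates precision selection by computing the same matrix product at a high and a low input precision (assumed here to be fp64 and fp16) and measuring the accuracy cost of the cheaper variant, which is the trade-off a transprecision runtime navigates.

```python
# Illustrative transprecision-GEMM sketch (hypothetical, not HelmGemm's code):
# run one GEMM at two input precisions and quantify the low-precision error.
import numpy as np

def gemm(a, b, dtype):
    """C = A @ B with inputs cast to `dtype`; result compared in float64."""
    return (a.astype(dtype).astype(np.float64)
            @ b.astype(dtype).astype(np.float64))

rng = np.random.default_rng(0)
a = rng.standard_normal((256, 256))
b = rng.standard_normal((256, 256))

ref = gemm(a, b, np.float64)   # high-precision reference
lo = gemm(a, b, np.float16)    # low-precision (cheaper) variant

rel_err = np.linalg.norm(ref - lo) / np.linalg.norm(ref)
print(f"relative error with fp16 inputs: {rel_err:.2e}")
# A transprecision scheduler would pick the cheapest precision whose
# error stays within the application's tolerance.
```

On accelerators, the low-precision path maps to hardware such as V100 tensor cores or narrow FPGA datapaths, which is where the energy savings come from.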