Training deep learning models often occupies entire compute clusters, built solely for this purpose, for days or even weeks at a time. A large body of work addresses improving training performance, with approaches ranging from novel algorithms to fully custom hardware accelerators. Offering compute capabilities of multiple teraflops (trillions of floating-point operations per second), graphics processing units (GPUs) have established themselves as the de facto standard for accelerating deep neural network training. As systems with up to 16 GPUs, each consuming up to 300 W, become available, efficient usage of these resources becomes imperative. We conduct a detailed analysis of deep learning workloads to characterize how efficiently they make use of GPU acceleration. We find that many deep learning workloads consume only a fraction of the available GPU resources, and we demonstrate how sharing GPU resources can improve throughput by a factor of 3, effectively turning a 4-GPU commodity cloud system into a high-end 12-GPU supercomputer. Using Watson workloads from three major areas that incorporate deep learning technology, namely language classification, visual recognition, and speech recognition, we document the effectiveness and scalability of our approach. We are working toward enabling GPU virtualization not only to reduce cost, but also to accelerate new breakthroughs in deep learning by increasing compute capacity without further hardware investments.
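The capacity arithmetic behind the headline claim can be sketched as follows. This is an idealized back-of-the-envelope model, not the paper's measurement methodology: the function names are ours, and the assumption that colocated jobs scale linearly until the GPU is saturated is a simplification. The 4-GPU and 12-GPU figures and the 3× sharing factor come from the abstract.

```python
import math

def max_colocated_jobs(gpu_utilization: float) -> int:
    """How many identical jobs could share one GPU, under the idealized
    assumption that throughput is limited only by GPU utilization
    (a simplification for illustration, not a measured result)."""
    return math.floor(1.0 / gpu_utilization)

def effective_gpu_count(physical_gpus: int, jobs_per_gpu: int) -> int:
    """Aggregate throughput of a shared system, expressed as an
    equivalent number of exclusively used GPUs."""
    return physical_gpus * jobs_per_gpu

# A workload that keeps a GPU only ~33% busy leaves room for 3 jobs per GPU,
# turning a 4-GPU system into the equivalent of a 12-GPU one, as in the abstract.
print(effective_gpu_count(4, max_colocated_jobs(0.33)))  # → 12
```

In this simplified model, the observed 3× throughput gain corresponds to workloads that individually keep a GPU only about a third busy, which is consistent with the abstract's finding that many workloads consume just a fraction of GPU resources.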