Amit Anil Nanavati, Nitendra Rajput, et al.
MobileHCI 2011
GPUs are becoming a scarce resource in high demand as many teams build and train increasingly advanced artificial intelligence workloads. As GPUs become more performant, they also consume more energy, with NVIDIA's A100 and H100 data-center GPUs drawing up to 400W and 700W of power, respectively. This paper characterizes how best to scale down a large modern GPU to suit workloads that cannot fully exploit an entire GPU. We measure six workloads, from 14-million-parameter image classifiers to 1.5-billion-parameter large language models, and find that partitioned GPUs with a mix of small, medium, and large partitions can deliver up to 33% lower energy demand and 9% higher training throughput from a single GPU. We find particularly high potential in fine-tuning existing models, with 55% faster training at 42% less energy. Our results suggest that multiplexing small workloads onto spatially partitioned GPUs can improve the efficiency of a single GPU while giving clients access to smaller slices of the latest GPUs that better suit their jobs' demands.
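As an illustrative sketch (not part of the paper's artifact): the "spatially partitioned GPUs" described above most plausibly correspond to NVIDIA's Multi-Instance GPU (MIG) feature on A100/H100, where each training job is pinned to one slice via CUDA_VISIBLE_DEVICES. The Python sketch below assumes MIG mode is already enabled and partitions have already been created (for example with `nvidia-smi mig -cgi ... -C`); train_job.py and its model flags are hypothetical placeholders, not the paper's workloads.

import os
import re
import subprocess

def list_mig_uuids():
    # Parse `nvidia-smi -L`; MIG lines look like:
    #   MIG 2g.10gb Device 0: (UUID: MIG-xxxxxxxx-....)
    out = subprocess.run(["nvidia-smi", "-L"],
                         capture_output=True, text=True, check=True).stdout
    return re.findall(r"UUID:\s*(MIG-[^)\s]+)", out)

def launch_on_slices(commands):
    # Start one process per MIG slice; each child sees only its own slice.
    uuids = list_mig_uuids()
    procs = []
    for uuid, cmd in zip(uuids, commands):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=uuid)
        procs.append(subprocess.Popen(cmd, env=env))
    return [p.wait() for p in procs]

if __name__ == "__main__":
    # Hypothetical workload mix echoing the paper's small/medium/large split:
    launch_on_slices([
        ["python", "train_job.py", "--model", "resnet18"],            # small slice
        ["python", "train_job.py", "--model", "gpt2", "--finetune"],  # larger slice
    ])

Because MIG slices have dedicated SMs and memory, each child process sees a single isolated CUDA device, which is what allows several small jobs to share one physical GPU without contending for resources.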
Amol Thakkar, Andrea Antonia Byekwaso, et al.
ACS Fall 2022
Dimitrios Christofidellis, Giorgio Giannone, et al.
MRS Spring Meeting 2023
Carla F. Griggio, Mayra D. Barrera Machuca, et al.
CSCW 2024