Jovan Stojkovic, Tianyin Xu, et al.
HPCA 2023
Effective benchmarking is required to optimize GPU resource efficiency and enhance performance for AI workloads. This talk provides a practical guide on setting up, configuring, and running various GPU and AI workload benchmarks in Kubernetes.
The talk covers benchmarks for model serving, model training, and GPU stress testing, using tools such as NVIDIA Triton Inference Server; fmperf, an open-source tool for benchmarking LLM serving performance; MLPerf, an open benchmark suite for comparing the performance of machine learning systems; GPUStressTest; gpu-burn; and cuda benchmark. It also introduces GPU monitoring and load generation tools.
Through step-by-step demonstrations, attendees will gain practical experience using benchmark tools. They will learn how to effectively run benchmarks on GPUs in Kubernetes and leverage existing tools to fine-tune and optimize GPU resource and workload management for improved performance and resource efficiency.
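As a rough illustration of what running a GPU benchmark in Kubernetes can look like, the sketch below builds a `batch/v1` Job manifest that requests one GPU through the `nvidia.com/gpu` resource (the name exposed by the NVIDIA device plugin) and prints it as JSON. The helper `gpu_benchmark_job`, the image reference, and the container command are assumptions made for this example and are not taken from the talk.

```python
import json


def gpu_benchmark_job(name, image, command, gpus=1):
    """Return a Kubernetes batch/v1 Job manifest that runs one benchmark container.

    Hypothetical helper for illustration only.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Do not retry a failed benchmark run automatically.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "command": command,
                            # GPUs are requested under limits; this assumes
                            # the NVIDIA device plugin is installed.
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ],
                }
            },
        },
    }


manifest = gpu_benchmark_job(
    name="gpu-burn",
    # Hypothetical image; substitute one that actually contains the benchmark.
    image="example.com/benchmarks/gpu-burn:latest",
    # Assumed invocation: stress the GPU for 60 seconds.
    command=["./gpu_burn", "60"],
)
print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON manifests, the printed output could be saved to a file and submitted with `kubectl apply -f`, assuming the cluster has GPU nodes available.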