Conference paper

NetZIP: Algorithm/Hardware Co-design of In-network Lossless Compression for Distributed Large Model Training

Abstract

In distributed large model training, the communication time required to exchange large volumes of gradients and activations among GPUs dominates the training time. To reduce communication time, lossy or lossless compression of gradients and/or activations can be employed. However, lossy compression of gradients may demand more training iterations to achieve the same model accuracy, and lossy compression of activations may cause convergence failure. Lossless compression, on the other hand, may not reduce the volumes of gradients and activations enough to offset the significant latency associated with compression and decompression on current platforms. To address these challenges, we propose NetZIP, an algorithm/hardware co-design for in-network lossless compression of both gradients and activations. NetZIP consists of two components. (1) NetZIP-algorithm transforms gradients and activations at the bit and value levels so that lightweight standard lossless compression achieves higher compression of the gradients and activations. (2) NetZIP-accelerator integrates NetZIP-algorithm with a lightweight lossless compression accelerator within a NIC in a bump-in-the-wire fashion to reduce the compression/decompression latency under the NIC's resource constraints. NetZIP-algorithm compresses gradients and activations 40–63 and 43–75 percentage points more, respectively, than heavy standard lossless compression for Llama-3 70B, GPT-3 175B, and Llama-3 405B. NetZIP-accelerator, implemented within FPGA-NICs and connected to commodity servers, provides orders of magnitude lower latency for both compression and decompression than the lowest latency achieved by the standard lossless compression on current platforms. With greater compression of both gradients and activations, and lower latency for compression and decompression than the standard lossless compression, NetZIP provides 35% lower training time.
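
To illustrate why a lossless transformation applied before a lightweight compressor can help, the following minimal Python sketch uses byte-plane grouping, a generic and well-known reordering, and not NetZIP-algorithm itself, whose bit-level and value-level transformations are not specified in this abstract. The function names, the synthetic Gaussian "gradients", the float32 format, and the use of zlib at level 1 as a stand-in for a lightweight compressor are all assumptions made for illustration.

```python
import random
import struct
import zlib

WIDTH = 4  # bytes per float32 value (assumed data format)


def group_byte_planes(raw: bytes) -> bytes:
    """Reorder bytes so byte 0 of every value comes first, then byte 1, etc."""
    if len(raw) % WIDTH:
        raise ValueError("input length must be a multiple of 4")
    return b"".join(raw[i::WIDTH] for i in range(WIDTH))


def ungroup_byte_planes(grouped: bytes) -> bytes:
    """Exact inverse of group_byte_planes, so the transform is lossless."""
    if len(grouped) % WIDTH:
        raise ValueError("input length must be a multiple of 4")
    n = len(grouped) // WIDTH
    out = bytearray(len(grouped))
    for i in range(WIDTH):
        # Scatter plane i back to every WIDTH-th position, starting at i.
        out[i::WIDTH] = grouped[i * n:(i + 1) * n]
    return bytes(out)


def synthetic_gradients(count: int, seed: int = 0) -> bytes:
    """Hypothetical stand-in for gradients: small Gaussian values as float32."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 0.01) for _ in range(count)]
    return struct.pack(f"<{count}f", *values)  # little-endian float32


def compressed_sizes(raw: bytes, level: int = 1) -> tuple[int, int]:
    """Return (size without transform, size with byte-plane grouping)."""
    plain = len(zlib.compress(raw, level))
    grouped = len(zlib.compress(group_byte_planes(raw), level))
    return plain, grouped


if __name__ == "__main__":
    data = synthetic_gradients(20000)
    assert ungroup_byte_planes(group_byte_planes(data)) == data
    plain_size, grouped_size = compressed_sizes(data)
    print(f"raw bytes: {len(data)}")
    print(f"zlib level 1, untransformed: {plain_size}")
    print(f"zlib level 1, byte planes grouped: {grouped_size}")
```

Grouping the sign and exponent bytes of all values together places the most repetitive bytes next to each other, which tends to make them easier for a simple compressor to encode, while the round trip through `ungroup_byte_planes` recovers the original bytes exactly. The sizes obtained on this synthetic data say nothing about the compression ratios reported for NetZIP.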