ARITH 2019
Conference paper
DLFloat: A 16-b Floating Point Format Designed for Deep Learning Training and Inference
Abstract
The resilience of Deep Learning (DL) training and inference workloads to low-precision computations, coupled with the demand for power- and area-efficient hardware accelerators for these workloads, has led to the emergence of 16-bit floating point formats as the precision of choice for DL hardware accelerators. This paper describes our optimized 16-bit format, which has 6 exponent bits and 9 fraction bits, derived from a study of the range of values encountered in DL applications. We demonstrate that our format preserves the accuracy of DL networks, and we compare its ease of use for DL against IEEE-754 half-precision (5 exponent bits and 10 fraction bits) and bfloat16 (8 exponent bits and 7 fraction bits). Further, our format eliminates subnormals and simplifies rounding modes and handling of corner cases. This streamlines floating-point unit logic and enables realization of a compact, power-efficient computation engine.
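To make the bit layout concrete, below is a minimal Python sketch of decoding a DLFloat value (1 sign bit, 6 exponent bits, 9 fraction bits). The exponent bias of 31 and the merged NaN/infinity encoding are assumptions made for illustration; consult the paper for the authoritative definition.

```python
def decode_dlfloat16(bits: int) -> float:
    """Interpret a 16-bit integer as a DLFloat value (illustrative sketch)."""
    sign = (bits >> 15) & 0x1       # 1 sign bit
    exponent = (bits >> 9) & 0x3F   # 6 exponent bits
    fraction = bits & 0x1FF         # 9 fraction bits

    # Assumption: all-ones exponent with all-ones fraction encodes NaN/Inf.
    if exponent == 0x3F and fraction == 0x1FF:
        return float("nan")

    # No subnormals: every encoding carries an implicit leading 1.
    significand = 1.0 + fraction / 2**9
    return (-1.0) ** sign * significand * 2.0 ** (exponent - 31)  # assumed bias 31


if __name__ == "__main__":
    # Sign 0, exponent 0b011111 (= assumed bias), zero fraction -> 1.0
    print(decode_dlfloat16(0b0_011111_000000000))
```

Because the format has no subnormal range, the decoder never needs the denormal branch an IEEE-754 half-precision decoder would require, which is one source of the logic simplification the abstract mentions.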