Publication
IEEE Micro
Paper
DNNDaSher: A Compiler Framework for Dataflow Compatible End-to-End Acceleration on IBM AIU
Abstract
The artificial intelligence unit (AIU) is a specialized accelerator card from IBM offering state-of-the-art compute capabilities (hundreds of tera-operations) through dataflow-driven compute arrays attached to a multilevel hierarchy of distributed memory elements. When mapping entire AI models, functional correctness hinges on maintaining dataflow compatibility between producer-consumer operations, i.e., the element organization with which a tensor is produced in memory must match the organization expected by its consumer(s). This paper presents a key component of the AIU's compiler stack, the DNN Data-Shuffler (DNNDaSher), a systematic framework that analyzes such dataflow incompatibilities and invokes an intermediate operation to shuffle tensor elements within and/or across memory elements to resolve the discrepancy. It also targets opportunities to eliminate shuffles and increase the granularity of memory accesses. Compared to well-optimized baseline implementations of four convolutional neural network and Transformer benchmarks, DNNDaSher achieves a 1.27×–4.12× (average 2.3×) end-to-end latency improvement based on measured execution cycles on the AIU.
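To make the dataflow-compatibility problem concrete, the sketch below illustrates, in plain NumPy, a producer that writes a tensor in one element organization (channels-first) while the consumer expects another (channels-last), with an intermediate shuffle reorganizing the elements in between. The function names and layouts are purely illustrative assumptions for exposition; they are not the actual AIU memory organizations or DNNDaSher interfaces.

```python
# Hypothetical illustration of a producer-consumer layout mismatch and the
# intermediate shuffle that resolves it. Layouts and names are assumptions,
# not the AIU/DNNDaSher APIs.
import numpy as np

def produce_nchw(batch=1, channels=4, height=2, width=2):
    # Producer emits its output tensor in NCHW (channels-first) order.
    return np.arange(batch * channels * height * width).reshape(
        batch, channels, height, width)

def consume_nhwc(tensor_nhwc):
    # Consumer assumes its input arrives in NHWC (channels-last) order; handing
    # it the producer's NCHW buffer directly would be functionally incorrect.
    assert tensor_nhwc.shape[-1] == 4, "expected channels-last layout"
    return tensor_nhwc.sum(axis=-1)

def shuffle_nchw_to_nhwc(tensor_nchw):
    # Intermediate shuffle operation: reorganize elements so the producer's
    # layout matches what the consumer expects. On an accelerator this would
    # correspond to moving elements within and/or across memory elements.
    return np.ascontiguousarray(tensor_nchw.transpose(0, 2, 3, 1))

out = produce_nchw()
fixed = shuffle_nchw_to_nhwc(out)
print(consume_nhwc(fixed).shape)  # (1, 2, 2)
```

In this reading, the framework's optimizations amount to proving when such a shuffle is unnecessary (the layouts already agree or can be folded into an adjacent operation) and, when it is needed, performing it at as coarse a granularity as the memory hierarchy allows.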