Shubham Jain, Hsinyu Tsai, et al.
IEEE Transactions on VLSI Systems
The Artificial Intelligence Unit (AIU) is a specialized accelerator card from IBM that offers state-of-the-art compute capabilities (hundreds of tera-operations) through dataflow-driven compute arrays attached to a multilevel hierarchy of distributed memory elements. When mapping entire AI models, functional correctness hinges on maintaining dataflow compatibility between producer and consumer operations, i.e., the element organization with which a tensor is produced in memory must match the organization expected by its consumer(s). This paper presents a key component of the AIU's compiler stack, the DNN Data-Shuffler (DNNDaSher), a systematic framework that analyzes such dataflow incompatibilities and invokes an intermediate operation to shuffle tensor elements within and/or across memory elements to resolve the discrepancy. It also targets opportunities to eliminate shuffles and to increase the granularity of memory accesses. Compared to well-optimized baseline implementations of four Convolutional Neural Networks and Transformer benchmarks, DNNDaSher achieves a 1.27× to 4.12× (average 2.3×) end-to-end latency improvement based on measured execution cycles on the AIU.
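The producer-consumer compatibility check described in the abstract can be illustrated with a minimal sketch. This is not DNNDaSher's implementation: the `Op` class, the dimension-name layouts, and the `insert_shuffles` helper are hypothetical, and the sketch assumes a simple linear chain of operations in which a layout is just an ordering of named dimensions. It only shows the general idea of inserting a shuffle where the produced layout differs from the expected one, and skipping it where they already match.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """Hypothetical operation with the layouts it consumes and produces."""
    name: str
    in_layout: tuple   # dimension order expected for the input tensor
    out_layout: tuple  # dimension order in which the output is produced


def shuffle_perm(src, dst):
    """Return the axis permutation that reorders layout `src` into `dst`.

    Entry i of the result is the source axis that becomes axis i.
    """
    if sorted(src) != sorted(dst):
        raise ValueError(f"layouts {src} and {dst} name different dimensions")
    return tuple(src.index(dim) for dim in dst)


def insert_shuffles(ops):
    """Walk a linear producer -> consumer chain and plan the shuffles.

    A shuffle step is added only where the producer's output layout
    differs from the consumer's expected input layout.
    """
    plan = []
    for i, op in enumerate(ops):
        if i > 0:
            producer = ops[i - 1]
            if producer.out_layout != op.in_layout:
                perm = shuffle_perm(producer.out_layout, op.in_layout)
                plan.append(("shuffle", perm))
        plan.append(("op", op.name))
    return plan


# Illustrative chain: a channels-first producer feeding channels-last consumers.
chain = [
    Op("conv", ("N", "C", "H", "W"), ("N", "C", "H", "W")),
    Op("attn", ("N", "H", "W", "C"), ("N", "H", "W", "C")),
    Op("mlp", ("N", "H", "W", "C"), ("N", "H", "W", "C")),
]
plan = insert_shuffles(chain)
for step in plan:
    print(step)
```

In this toy chain a single shuffle is planned between `conv` and `attn`, and none between `attn` and `mlp` because their layouts already agree. A real compiler pass would also have to account for how elements are distributed across memory elements and for access granularity, which this sketch does not model.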