Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack

Dionysios Diamantopoulos; Burkhard Ringlein; Mitra Purandare; Gagandeep Singh; Christoph Hagleitner

doi:10.1109/FPL50879.2020.00058

FPL 2020

Conference paper

01 Aug 2020

Abstract

Specialized accelerators for tensor operations, such as blocked-matrix operations and multi-dimensional convolutions, have emerged as powerful architecture choices for high-performance deep-learning computing. The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor accelerators, since adapting to new requirements incurs significant engineering costs. Programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture that is overlaid on the physical FPGA configurable fabric. We propose an overlay (τ-VTA) and an optimization method guided by agile-inspired auto-tuning techniques. We achieve up to 2.5x higher performance and up to 8.1x faster convergence.
