Real-Time Style Transfer with Efficient Vision Transformers
Style Transfer aims to transfer the artistic style of a reference image to a content image. While Deep Learning (DL) has achieved state-of-the-art Style Transfer performance using Convolutional Neural Networks (CNNs), its real-time application still requires powerful hardware such as GPU-accelerated systems. This paper leverages transformer-based models to accelerate real-time Style Transfer on mobile and embedded hardware platforms. We design a Neural Architecture Search (NAS) algorithm dedicated to vision transformers that finds the set of architecture hyperparameters maximizing Style Transfer performance, expressed in frames per second (FPS). Our approach has been evaluated and validated on the Xiaomi Redmi 7 mobile phone and the Raspberry Pi 3 platform. Experimental evaluation shows that our approach achieves speedups of 3.5x and 2.1x compared to CNN-based Style Transfer models and transformer-based models, respectively.
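To make the FPS metric and the quoted speedup ratios concrete, the following is a minimal sketch of how throughput could be measured and compared. It is not the benchmarking code used in this work: the names `measure_fps` and `speedup`, the warm-up count, and the injectable `timer` argument are all assumptions introduced for illustration.

```python
import time
from typing import Callable


def measure_fps(
    process_frame: Callable[[], None],
    n_frames: int = 50,
    warmup: int = 5,
    timer: Callable[[], float] = time.perf_counter,
) -> float:
    """Average frames per second of a per-frame callable (hypothetical helper)."""
    # Warm-up iterations are excluded from timing (caches, lazy initialization).
    for _ in range(warmup):
        process_frame()
    start = timer()
    for _ in range(n_frames):
        process_frame()
    elapsed = timer() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive; increase n_frames")
    return n_frames / elapsed


def speedup(fps_candidate: float, fps_baseline: float) -> float:
    """Ratio of candidate throughput to baseline throughput."""
    if fps_baseline <= 0:
        raise ValueError("baseline FPS must be positive")
    return fps_candidate / fps_baseline
```

Under this definition, a reported 3.5x speedup means the candidate model processes 3.5 times as many frames per second as the baseline on the same device.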