With the recent expansion of Large Language Model (LLM) capabilities, there is new potential for improving the performance of object detection and classification tasks by taking advantage of the Vision Transformer (ViT) architecture. In this paper, we focus specifically on the problem of object detection and classification on the edge, via a heterogeneous System-on-Chip (SoC). Unique constraints arise in an edge setting, most notably the limited amount of available memory, which is difficult to satisfy given the size of modern LLMs. Our exploration begins with a traditional Convolutional Neural Network (CNN) running on a small deep learning accelerator, and the issues we faced with this approach on a heterogeneous edge SoC. We then transition to a transformer-based architecture, using a ViT adapted for simultaneous object detection and classification and running on a Natural Language Processing (NLP) accelerator. In particular, we focus on increasing sparsity in the model to meet the chip's strict memory constraints and on introducing early-exit mechanisms to minimize end-to-end latency.
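For illustration only, the following is a minimal sketch, assuming a PyTorch implementation, of the two techniques named above: unstructured magnitude pruning to increase weight sparsity, and an early-exit classifier head that lets confident inputs bypass the remaining transformer blocks. The EarlyExitViTEncoder class, its dimensions, and the confidence threshold are hypothetical choices for this sketch and are not taken from the paper.

# A minimal, illustrative sketch (assumed PyTorch, not the system described above) of
# weight sparsification and early exit. All module names, sizes, and thresholds are
# assumptions made for this example.
import torch
import torch.nn as nn

class EarlyExitViTEncoder(nn.Module):
    """Tiny ViT-style encoder with an auxiliary classifier after an early block."""
    def __init__(self, dim=128, depth=6, heads=4, num_classes=10, exit_layer=2):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            for _ in range(depth)
        ])
        self.exit_layer = exit_layer
        self.early_head = nn.Linear(dim, num_classes)  # auxiliary early-exit head
        self.final_head = nn.Linear(dim, num_classes)  # full-depth head

    def forward(self, tokens, exit_threshold=0.9):
        x = tokens
        for i, block in enumerate(self.blocks):
            x = block(x)
            if i == self.exit_layer:
                early_logits = self.early_head(x.mean(dim=1))
                confidence = early_logits.softmax(-1).max(-1).values
                # If the whole batch is confident enough, return early to cut latency.
                if bool((confidence > exit_threshold).all()):
                    return early_logits, True
        return self.final_head(x.mean(dim=1)), False

def magnitude_prune_(model, sparsity=0.5):
    """Zero the smallest-magnitude weights of each Linear layer in place."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight.data
            k = int(w.numel() * sparsity)
            if k > 0:
                threshold = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > threshold).float())

if __name__ == "__main__":
    model = EarlyExitViTEncoder()
    magnitude_prune_(model, sparsity=0.5)   # sparser weights compress to a smaller memory footprint
    patches = torch.randn(2, 16, 128)       # placeholder (batch, tokens, dim) patch embeddings
    logits, exited_early = model(patches)
    print(logits.shape, exited_early)

In this sketch, pruning only zeroes weights; realizing the memory savings on an edge accelerator would additionally require a sparse storage format or compiler support, and the batch-level exit test is a simplification of per-sample early exit.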