Publication
ACM TACO
Paper

YaConv: Convolution with Low Cache Footprint

Download paper

Abstract

This article introduces YaConv, a new algorithm to compute convolution using GEMM microkernels from a Basic Linear Algebra Subprograms library that is efficient for multiple CPU architectures. Previous approaches either create a copy of each image element for each filter element or reload these elements into cache for each GEMM call, leading to redundant instances of the image elements in cache. Instead, YaConv loads each image element once into the cache and maximizes the reuse of these elements. The output image is computed by scattering results of the GEMM microkernel calls to the correct locations in the output image. The main advantage of this new algorithm - which leads to better performance in comparison to the existing im2col approach on several architectures - is a more efficient use of the memory hierarchy. The experimental evaluation on convolutional layers from PyTorch, along with a parameterized study, indicates an average 24% speedup over im2col convolution. Increased performance comes as a result of 3× reduction in L3 cache accesses and 2× fewer branch instructions.

Date

Publication

ACM TACO

Resources

Share