Efficient Pipelined Execution of CNNs Based on In-Memory Computing and Graph Homomorphism Verification
In-memory computing is an emerging computing paradigm that enables deep-learning inference with significantly higher energy efficiency and lower latency. The essential idea is to map the synaptic weights of each layer to one or more in-memory computing (IMC) cores. During inference, these cores perform the associated matrix-vector multiplications in place with O(1) time complexity, obviating the need to move the synaptic weights to additional processing units. Moreover, this architecture enables the execution of these networks in a highly pipelined fashion. However, a key challenge is designing an efficient communication fabric for the IMC cores. In this work, we present one such communication fabric based on a graph topology that is well suited to the widely successful convolutional neural networks (CNNs). We show that this communication fabric facilitates the pipelined execution of all state-of-the-art CNNs by proving the existence of a homomorphism between the graph representations of these networks and the graph representation of the proposed communication fabric. We then present a quantitative comparison with established communication topologies and show that our proposed topology achieves the lowest bandwidth requirements per communication channel. Finally, we present a hardware implementation and show a concrete example of mapping ResNet-32 onto an IMC core array interconnected via the proposed communication fabric.
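As an informal illustration of the homomorphism condition mentioned in the abstract, the following minimal Python sketch checks whether a given assignment of CNN layers to cores maps every edge of the network graph onto an existing channel of the fabric graph. The toy network (a single residual block), the fabric (each core linked to its next two neighbors), and all names such as `is_homomorphism`, `cnn_edges`, and `fabric_edges` are assumptions made for this example; they are not the topology or the proof technique of the paper.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def is_homomorphism(
    cnn_edges: Iterable[Edge],
    fabric_edges: Iterable[Edge],
    mapping: Dict[Hashable, Hashable],
) -> bool:
    """Return True if every directed edge (u, v) of the CNN graph is mapped
    to a directed edge (mapping[u], mapping[v]) of the fabric graph."""
    fabric = set(fabric_edges)
    return all((mapping[u], mapping[v]) in fabric for u, v in cnn_edges)


# Hypothetical toy CNN graph: one residual block with a skip connection.
cnn_edges = [
    ("conv1", "conv2"),
    ("conv2", "add"),
    ("conv1", "add"),   # skip connection
    ("add", "conv3"),
]

# Hypothetical fabric with 5 cores: core i sends to cores i+1 and i+2.
fabric_edges = [(i, i + 1) for i in range(4)] + [(i, i + 2) for i in range(3)]

# A plain chain of cores, used for comparison (no i -> i+2 links).
chain_edges = [(i, i + 1) for i in range(4)]

# Assumed layer-to-core assignment, one layer per core.
mapping = {"conv1": 0, "conv2": 1, "add": 2, "conv3": 3}

maps_onto_fabric = is_homomorphism(cnn_edges, fabric_edges, mapping)
maps_onto_chain = is_homomorphism(cnn_edges, chain_edges, mapping)
```

In this toy case the assignment is a homomorphism into the fabric with the extra two-hop links, but not into the plain chain, because the skip connection from `conv1` to `add` has no matching channel there.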