Recent years have seen marked developments in deep neural networks (DNNs), stemming from advances in hardware and increasingly large datasets. DNNs are now routinely used in domains including computer vision and language processing. At their core, DNNs rely heavily on multiply-accumulate (MAC) operations, making them well suited to the highly parallel computational capabilities of GPUs. GPUs, however, are von Neumann in architecture and physically separate memory blocks from computational blocks. This exacts an unavoidable time and energy cost associated with data transport, known as the von Neumann bottleneck. While incremental advances in digital hardware accelerators that mitigate the von Neumann bottleneck will continue, we explore the potentially game-changing advantages of non-von Neumann architectures that perform MAC operations within the memory itself. This is achieved using a crossbar array of analog memory, as shown in Fig. 1, which serves as the basis of our analog DNN hardware accelerators and is amenable to both DNN training and forward inference. Recent work from our group has demonstrated analog DNN hardware accelerators capable of a 280x speedup in per-area throughput while also providing a 100x increase in energy efficiency over state-of-the-art GPUs.
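The in-memory MAC described above can be illustrated with a minimal NumPy sketch of the idealized crossbar physics: weights stored as cell conductances, inputs applied as voltages, and Ohm's law plus Kirchhoff's current law performing the multiply and accumulate. All array sizes and value ranges below are arbitrary choices for illustration, not parameters of any specific device.

```python
import numpy as np

# Idealized crossbar MAC: weights are stored as device conductances G (siemens);
# inputs are applied as row voltages V (volts). By Ohm's law each cell passes
# current I_ij = G[i, j] * V[i], and Kirchhoff's current law sums the currents
# along each column wire, so the column current vector is I = G^T @ V -- a
# matrix-vector multiply computed where the weights are stored.

rng = np.random.default_rng(0)

n_rows, n_cols = 4, 3                          # 4 inputs, 3 outputs (arbitrary)
G = rng.uniform(1e-6, 1e-5, (n_rows, n_cols))  # conductances, in siemens
V = rng.uniform(0.0, 0.2, n_rows)              # read voltages, in volts

# Column currents: one full MAC per column, all columns in parallel.
I = G.T @ V

# Reference check against an explicit multiply-accumulate loop.
I_ref = np.array([sum(G[i, j] * V[i] for i in range(n_rows))
                  for j in range(n_cols)])
assert np.allclose(I, I_ref)
```

This idealization omits the nonidealities (wire resistance, device nonlinearity, noise) that real analog arrays must contend with; it is meant only to show why a crossbar evaluates a matrix-vector product in a single parallel step.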