By performing computation at the location of data, non-Von Neumann (VN) computing should provide power and speed benefits over conventional (e.g., VN-based) approaches to data-centric workloads such as deep learning. For the on-chip training of large-scale deep neural networks using nonvolatile memory (NVM) based synapses, success will require performance levels (e.g., deep neural network classification accuracies) that are competitive with conventional approaches despite the inherent imperfections of such NVM devices, and will also require massively parallel yet low-power read and write access. In this paper, we focus on the latter requirement, and outline the engineering tradeoffs in performing parallel reads and writes to large arrays of NVM devices to implement this acceleration through what is, at least locally, analog computing. We address how the circuit requirements for this new neuromorphic computing approach are somewhat reminiscent of, yet significantly different from, the well-known requirements found in conventional memory applications. We discuss tradeoffs that can influence both the effective acceleration factor ("speed") and power requirements of such on-chip learning accelerators.