Machine Learning (ML) is an attractive application of Non-Volatile Memory (NVM) arrays [1,2]. However, achieving speedup over GPUs will require minimal neuron circuit sharing and thus highly area-efficient peripheral circuitry, so that ML reads and writes are massively parallel and time-multiplexing is minimized . This means that neuron hardware offering full 'software-equivalent' functionality is impractical. We analyze neuron circuit needs for implementing back-propagation in NVM arrays and introduce approximations to reduce design complexity and area. We discuss the interplay between circuits and NVM devices, such as the need for an occasional RESET step, the number of programming pulses to use, and the stochastic nature of NVM conductance change. In all cases we show that by leveraging the resilience of the algorithm to error, we can use practical circuit approaches yet maintain competitive test accuracies on ML benchmarks.