Impact of System Resources on the Performance of Deep Neural Networks
Training deep neural networks (DNNs) requires substantial computation, memory, and storage resources. Improving the performance of training codes is important for enabling rapid development, experimentation, and testing of DNNs. Doing so requires understanding which system resources deep learning codes exercise, how the utilization of each resource changes with the compute intensity of the network or the size of the data it processes, and how different resource bottlenecks depend on one another. To this end, we are performing an extensive empirical evaluation, varying several execution parameters and running hundreds of experiments with different configurations of DNN training jobs. The goal is to gain a robust understanding of how to tailor system resources and training hyperparameters to the needs of a given deep learning job, accounting for both the DNN model and the dataset.