Large Data Flow Graphs in Limited GPU Memory
Abstract
The size of a GPU's memory imposes strict limits both on the complexity of neural networks and on the size of the data samples that can be processed. This paper presents methods that let the TensorFlow machine learning framework use GPU memory efficiently when processing large data flow graphs of neural networks. The proposed techniques swap data stored in GPU memory to and from CPU memory, compress data, and serialize computation. The data flow graph is modified by inserting nodes for data transfer and compression, and by defining control dependencies that serialize the execution of graph nodes. The locations of the additional nodes and control dependencies are determined algorithmically by analyzing the graph's topology and the complexity of the operations implemented by its nodes. Our experiments show that these techniques make it possible to process 3D-Unet [1] on 192³-sized images with batch size 4, and ResNet [2] models with more than 7 times the maximum batch size that fits without them.
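To make the graph-rewriting idea concrete, here is a minimal pure-Python sketch; it is not TensorFlow code and not the paper's algorithm. It assumes a simplified graph given as a topological execution order plus a map of data inputs, and it uses only one heuristic: if a tensor's producer and a consumer are more than `threshold` steps apart, the tensor is swapped out to host memory right after it is produced and swapped back in just before it is needed. The swap-in receives a control dependency on the node `lead` steps before the consumer, so it cannot prefetch too early. The names `Graph`, `insert_swaps`, `threshold`, `lead`, and the `/swap_out` and `/swap_in_for_` node naming are hypothetical. The paper's placement analysis also weighs operation complexity, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    order: list                      # original nodes in topological execution order
    inputs: dict                     # data edges: node -> list of input nodes
    control: dict = field(default_factory=dict)  # control deps: node -> nodes it must wait for


def insert_swaps(g: Graph, threshold: int, lead: int = 1) -> Graph:
    """Hypothetical sketch: insert swap-out/swap-in nodes on long-lived edges."""
    # lead <= threshold guarantees the trigger node runs after the producer.
    assert 1 <= lead <= threshold
    pos = {n: i for i, n in enumerate(g.order)}
    inputs = {n: list(srcs) for n, srcs in g.inputs.items()}   # copy; don't mutate g
    control = {n: list(deps) for n, deps in g.control.items()}
    for dst in g.order:
        for k, src in enumerate(g.inputs.get(dst, [])):
            if pos[dst] - pos[src] <= threshold:
                continue  # short-lived tensor: keep it resident on the GPU
            out_node = f"{src}/swap_out"          # device -> host copy, shared by all consumers
            in_node = f"{src}/swap_in_for_{dst}"  # host -> device copy for this consumer
            inputs.setdefault(out_node, [src])
            inputs[in_node] = [out_node]
            inputs[dst][k] = in_node              # consumer now reads the swapped-in copy
            # Control dependency: start the swap-in only after the node `lead`
            # steps before the consumer has run.
            control[in_node] = [g.order[pos[dst] - lead]]
    # The new swap nodes are not added to `order`; a dependency-driven executor
    # schedules them from the data and control edges alone.
    return Graph(g.order, inputs, control)


g = Graph(order=["a", "b", "c", "d", "e"],
          inputs={"b": ["a"], "c": ["b"], "d": ["c"], "e": ["d", "a"]})
r = insert_swaps(g, threshold=2)
print(r.inputs["e"], r.control)
```

In the example, only the edge from `a` to `e` spans more than two steps, so `e` is rewired to read `a/swap_in_for_e`, whose transfer waits on `d`. Sharing one swap-out node per tensor avoids copying the same data to the host twice when several distant consumers need it.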