Deep learning methods are often associated with huge amounts of training data: the deeper the network, the greater the need for labeled examples. Large amounts of labeled data help the network learn the variations it must handle at prediction time. However, not everyone has access to such data, leaving only a few with the luxury of designing very deep networks. In this paper, we propose to reduce this disparity by using modeling methods to minimize the amount of data needed to train a deep network. Using face recognition as an example, we demonstrate how limited labeled data can be leveraged to obtain near state-of-the-art performance that generalizes across multiple databases. In addition, we show that normalization within the overall network can improve the speed and reduce the resource requirements of the prediction/inference stage.