Matrix approximation (MA) techniques, which can address the data sparsity issue by reducing the dimensionalities of user/item feature vectors, have been extensively adopted in today's recommender systems. However, many existing MA methods, e.g., singular value decomposition (SVD), cannot acquire sparse user or item feature matrices using the popular stochastic gradient descent (SGD)-based training method, which leads to the suboptimal accuracy. In this paper, we propose a matrix approximation method with cumulative penalty, which can improve the classic SGD-based training process by keeping track of the cumulative penalty and applying it to the feature matrix. After applying the penalty to the feature matrices, we can rapidly make many user/item features be zero during the training process and thus obtain a sparse user/item feature matrix to improve the recommendation accuracy. In addition, a parallel training method is adopted in the proposed method, which can ensure faster convergence. Our empirical studies on the MovieLens dataset show that the proposed method can significantly outperform the regularized SVD method in terms of accuracy and scalability.