We consider large-scale minimization problems (not necessarily convex) with a non-smooth, separable, convex penalty. Problems of this form arise widely in modern large-scale machine learning and signal processing applications. In this paper, we present a new perspective on parallel Block Coordinate Descent (BCD) methods. Specifically, we explicitly introduce the concept of a two-layered block-variable updating loop for parallel BCD methods in modern computing environments consisting of multiple distributed computing nodes. The outer loop refers to updating the block variables assigned to the distributed nodes, and the inner loop refers to the updating steps within each node. Each loop may adopt either the Jacobi or the Gauss–Seidel update rule. In particular, we give a detailed theoretical convergence analysis of two practical schemes, Jacobi/Gauss–Seidel and Gauss–Seidel/Jacobi, each embodied by a corresponding algorithm. Our new perspective and the theoretical results behind it help devise parallel BCD algorithms in a principled fashion, which in turn allows flexible implementations of BCD methods suited to parallel computing environments. The effectiveness of the algorithmic framework is verified on the benchmark tasks of large-scale ℓ1-regularized sparse logistic regression and non-negative matrix factorization.
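To make the two-layered loop concrete, here is a minimal sketch under assumptions that are not taken from the paper: the smooth part is a least-squares loss f(x) = ½‖Ax − b‖² (convex, unlike the general setting above), the penalty is λ‖x‖₁, every block update is a proximal-gradient (soft-thresholding) step, the "nodes" are simulated one after another in a single process, and the Jacobi outer combination is damped by 1/P for P nodes. The function names `two_layer_bcd` and `node_update` are hypothetical and only illustrate how an outer rule and an inner rule can be chosen independently.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(A, b, x, lam):
    """Assumed test problem: F(x) = 0.5 * ||Ax - b||^2 + lam * ||x||_1."""
    r = A @ x - b
    return 0.5 * r @ r + lam * np.abs(x).sum()


def node_update(A, b, x, lam, blocks, inner):
    """Inner loop of one (simulated) node over its own blocks; returns a new x.

    inner == "jacobi": all blocks use the gradient at the node's starting point.
    inner == "gs":     each block sees the blocks already updated in this pass.
    """
    cols = np.concatenate(blocks)
    # Step 1/L_node, with L_node the Lipschitz constant of the gradient
    # restricted to this node's coordinates; each update cannot increase F.
    step = 1.0 / np.linalg.norm(A[:, cols], 2) ** 2
    x_new = x.copy()
    grad = A.T @ (A @ x - b)  # gradient at the starting point
    for blk in blocks:
        if inner == "gs":
            # Gauss-Seidel: refresh the gradient with the latest values
            # (recomputed in full for clarity, not efficiency).
            grad = A.T @ (A @ x_new - b)
        x_new[blk] = soft_threshold(x_new[blk] - step * grad[blk], step * lam)
    return x_new


def two_layer_bcd(A, b, lam, node_blocks, outer="jacobi", inner="gs", iters=500):
    """Hypothetical two-layered BCD: outer loop over nodes, inner loop per node.

    Returns the final x and the objective value after every outer iteration.
    """
    x = np.zeros(A.shape[1])
    num_nodes = len(node_blocks)
    history = [objective(A, b, x, lam)]
    for _ in range(iters):
        if outer == "gs":
            # Gauss-Seidel across nodes: each node starts from the latest x.
            for blocks in node_blocks:
                x = node_update(A, b, x, lam, blocks, inner)
        else:
            # Jacobi across nodes: every node reads the same snapshot of x.
            # The changes touch disjoint coordinates and are averaged with
            # weight 1/P (an assumed, conservative damping).
            snapshot = x
            delta = sum(node_update(A, b, snapshot, lam, blocks, inner) - snapshot
                        for blocks in node_blocks)
            x = snapshot + delta / num_nodes
        history.append(objective(A, b, x, lam))
    return x, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((60, 20))
    x_true = np.zeros(20)
    x_true[:3] = [2.0, -1.5, 1.0]
    b = A @ x_true
    # 20 coordinates -> 4 blocks of 5 -> 2 nodes holding 2 blocks each.
    blocks = np.array_split(np.arange(20), 4)
    node_blocks = [blocks[:2], blocks[2:]]
    for outer, inner in [("jacobi", "gs"), ("gs", "jacobi")]:
        x, hist = two_layer_bcd(A, b, 0.1, node_blocks, outer, inner)
        print(f"{outer}/{inner}: final objective {hist[-1]:.6f}")
```

The 1/P averaging is chosen for the sketch because it makes the Jacobi outer loop monotone by a one-line argument: each node's update does not increase F from the shared snapshot, and the averaged point is a convex combination of those updated points, so convexity of F bounds its value. The paper's own step-size and combination rules, and its treatment of the non-convex case, may differ.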