The Tucker decomposition generalizes singular value decomposition (SVD) to high dimensional tensors. It factorizes a given N-dimensional tensor as the product of a small core tensor and a set of N factor matrices. Non-negative Tucker Decomposition (NTD) is a variant that imposes the constraint that the entries of the core and the factor matrices must be non-negative. Generalizing a classical algorithm from the domain of non-negative matrix factorization, Mørup et al.  designed a procedure for NTD via the multiplicative weight update paradigm. Based on the above procedure, we present a distributed implementation of NTD for sparse tensors. We develop three algorithms for efficiently executing the procedure. The first is a baseline algorithm that adapts strategies from prior work on the Tucker decomposition. The other two are improved algorithms that are optimized based on properties unique to the NTD procedure. We present an experimental evaluation on a benchmark of large real-life tensors on a system with 32 to 512 MPI ranks. The study shows that the optimized algorithms outperform the baseline by a factor of up to 6x in execution time. The distributed implementation scales well with speedup up to 12x (as against an ideal factor of 16x).