In many real applications such as virtual metrology in semiconductor manufacturing, face recognition, and gait recognition in computer vision, the input data is naturally expressed as tensors or multi-dimensional arrays. Furthermore, in addition to the known label information, domain knowledge can often be obtained from various sources, e.g., multiple domain experts. To address such problems, in this paper, we propose a general optimization framework for dealing with tensor inputs while taking into consideration domain knowledge. To be specific, our framework is based on a linear model, and we obtain the weight tensor in a hierarchical way—first approximate it by a low-rank tensor, and then estimate the low-rank approximation using the domain knowledge from various sources. This is motivated by wafer quality prediction in semiconductor manufacturing. We also propose an effective algorithm named H-MOTE for solving this framework, which is guaranteed to converge. For each iteration, the time complexity of H-MOTE is linear with respect to the number of examples as well as the size of the weight tensor. Therefore, H-MOTE is scalable to large-scale problems. Experimental results show that H-MOTE outperforms state-of-the-art techniques on both synthetic and real data sets.