Many complex real-world applications involve collecting time series data with multiple modalities and multiple resolutions. For example, in aluminum smelting processes, the recorded process variables typically reflect different aspects of the process, such as pressure and temperature, and they are often sampled at different time resolutions, such as every 5 minutes or once a day. How can we effectively leverage both the multi-modality and the multi-resolution properties of the data to predict key process indicators (e.g., the cell temperature in aluminum smelting) more accurately? Unlike existing techniques, which model only one of the two properties, in this paper we propose, for the first time, to model them jointly so that the prediction results are consistent across multiple modalities and multiple resolutions. To this end, we construct an optimization framework built on a novel regularizer that enforces this consistency. We then design an effective and efficient optimization algorithm based on randomized block coordinate descent. Experiments on both synthetic and real data sets show that the proposed approach outperforms state-of-the-art techniques.
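The abstract does not specify the form of the consistency regularizer, so the following is only a minimal sketch of randomized block coordinate descent under an assumed objective, not the method proposed in this paper. We assume one linear predictor `w_k` per modality (one block per modality), a least-squares fit to the target `y`, a pairwise penalty (weight `lam`) that pulls the predictions of different modalities toward each other, and a small ridge term (weight `mu`). The multi-resolution aspect is omitted, and the names `objective`, `rbcd`, `lam`, and `mu` are hypothetical. At each step one block is drawn uniformly at random and minimized exactly with the other blocks held fixed.

```python
import numpy as np


def objective(Xs, ws, y, lam, mu):
    """Hypothetical objective: per-modality fit + pairwise consistency + ridge."""
    preds = [X @ w for X, w in zip(Xs, ws)]
    fit = 0.5 * sum(np.sum((p - y) ** 2) for p in preds)
    K = len(preds)
    # Assumed consistency regularizer: penalize disagreement between modalities.
    consist = 0.5 * lam * sum(
        np.sum((preds[k] - preds[l]) ** 2)
        for k in range(K)
        for l in range(k + 1, K)
    )
    ridge = 0.5 * mu * sum(np.sum(w ** 2) for w in ws)
    return fit + consist + ridge


def rbcd(Xs, y, lam=1.0, mu=1e-2, n_iters=200, seed=0):
    """Randomized block coordinate descent with exact block minimization.

    Each block k is the weight vector of modality k. Setting the gradient of
    the assumed objective w.r.t. w_k to zero gives the linear system
    ((1 + lam*(K-1)) X_k^T X_k + mu I) w_k = X_k^T (y + lam * sum_{l != k} X_l w_l).
    """
    rng = np.random.default_rng(seed)
    K = len(Xs)
    ws = [np.zeros(X.shape[1]) for X in Xs]
    preds = [np.zeros_like(y) for _ in Xs]  # cached X_k @ w_k per modality
    history = [objective(Xs, ws, y, lam, mu)]
    for _ in range(n_iters):
        k = rng.integers(K)  # choose one block uniformly at random
        X = Xs[k]
        others = sum(preds[l] for l in range(K) if l != k)
        A = (1.0 + lam * (K - 1)) * (X.T @ X) + mu * np.eye(X.shape[1])
        b = X.T @ (y + lam * others)
        ws[k] = np.linalg.solve(A, b)  # exact minimizer over block k
        preds[k] = X @ ws[k]
        history.append(objective(Xs, ws, y, lam, mu))
    return ws, history
```

Because each update exactly minimizes a strictly convex subproblem (for `mu > 0`), the recorded objective is non-increasing; this is the main appeal of the block-wise scheme, since each step only requires solving a small system of the size of one modality's features.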