Financial time series analysis plays a central role in optimizing investment decisions and hedging market risks. This is a challenging task, as such problems are typically accompanied by dual-level (i.e., data-level and task-level) heterogeneity. For instance, in stock price forecasting, a successful portfolio with bounded risk usually consists of a large number of stocks from diverse domains (e.g., utilities, information technology, and healthcare). Forecasting the stocks in each domain can be treated as one task, which gives rise to task-level heterogeneity. Within a portfolio, each stock is characterized by temporal data collected from multiple modalities (e.g., finance, weather, and news), which corresponds to data-level heterogeneity. Furthermore, the finance industry follows highly regulated processes that require prediction models to be interpretable and their outputs to comply with regulations. Therefore, a natural research question is how to build a model that achieves satisfactory performance on such multi-modality multi-task learning problems while providing comprehensive explanations to end users. To answer this question, in this paper we propose a generic time series forecasting framework named Dandelion, which leverages the consistency of multiple modalities and explores the relatedness of multiple tasks using a deep neural network. In addition, to ensure the interpretability of the framework, we integrate a novel trinity attention mechanism, which allows end users to investigate variable importance along three dimensions (i.e., tasks, modalities, and time). Extensive empirical results demonstrate that Dandelion achieves superior performance for financial market prediction across 396 stocks from 4 different domains over the past 15 years. In particular, two case studies show the efficacy of Dandelion in terms of its profitability and the interpretability of its outputs to end users.