Multimodal analysis that takes both time series and textual corpora as input is emerging as a promising approach, especially in the financial industry. However, such analysis has focused mainly on achieving high prediction accuracy rather than on understanding the association between the two data modalities. In this work, we address the important problem of automatically discovering a small set of top news articles associated with a given time series. Toward this goal, we propose a novel multimodal neural model called MSIN that jointly learns from both the numerical time series and the categorical text articles in order to uncover the correlation between them. Through multiple steps of data interrelation between the two modalities, MSIN learns to focus on a small subset of text articles that best align with the current performance of the time series. This succinct set is discovered in a timely manner and presented as recommended documents for the given time series, so that MSIN can serve as an automated information filtering system. We empirically evaluate its performance on discovering the daily top relevant news articles, collected from Thomson Reuters, for two stock time series, AAPL and GOOG, over a period of seven consecutive years. The experimental results demonstrate that MSIN achieves up to 84.9% and 87.2% recall of the ground-truth articles for the two series, respectively, outperforming state-of-the-art algorithms that rely on conventional attention mechanisms in deep learning.