Discovering time-lagged dependencies and inter-dependencies in multivariate time series data is an important task. However, in many real-world big-data applications, such as commercial cloud management or predictive maintenance in manufacturing, these dependencies can be time-variant and non-linear, making them difficult to extract with traditional methods such as Granger causality or statistical models. In this work, we present a novel deep learning model that uses multiple layers of adapted gated recurrent units (GRUs) to discover both time-lagged behaviors and inter-time-series dependencies, representing them as directed weighted graphs. Each individual time series is first analyzed by a pair of encoding-decoding GRUs that discover its time-lagged dependencies and represent its samples as high-dimensional vectors. The vectors collected from all component time series are then analyzed by a decoding network that discovers the inter-dependencies across all series while forecasting their next values. Although the discovery of the two types of dependencies is separated across two levels of our neural network, the levels are tightly connected and jointly trained in an end-to-end manner. With this joint training, improvement in learning one type of dependency immediately benefits the learning of the other, leading to highly accurate overall dependency discovery. We empirically test our model on synthetic time series data in which the exact form of the dependencies is known. We also evaluate its performance on two real-world applications: (i) highly volatile multivariate performance-monitoring data from a commercial cloud provider, and (ii) multivariate time series generated by sensors in a manufacturing plant.
We show that our approach captures these dependency behaviors via intuitive and interpretable dependency graphs and uses them to generate forecasts.
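The two-level design described above can be illustrated with a minimal sketch: a per-series GRU encoder compresses each univariate component into a hidden vector, and an attention-style decoding step over all encodings yields both a row-normalized directed weighted dependency graph and a one-step forecast. This is not the paper's exact architecture; all function names, weight shapes, and the softmax-attention form of the dependency graph are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only (assumed names/shapes, not the paper's model):
# level 1 encodes each series with a GRU; level 2 forms a directed weighted
# dependency graph via attention and forecasts the next values.

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One standard GRU update on input x with previous hidden state h."""
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))    # candidate state
    return (1.0 - z) * h + z * h_cand

def encode_series(series, dim=8):
    """Level 1: encode one univariate series into a high-dimensional vector."""
    Wz, Wr, Wh = (0.1 * rng.standard_normal((dim, 1)) for _ in range(3))
    Uz, Ur, Uh = (0.1 * rng.standard_normal((dim, dim)) for _ in range(3))
    h = np.zeros(dim)
    for x in series:
        h = gru_step(np.array([x]), h, Wz, Uz, Wr, Ur, Wh, Uh)
    return h

def decode(encodings, dim=8):
    """Level 2: attention over all series encodings. Row i of the softmaxed
    score matrix gives the dependency weights of series i on every series,
    i.e. a directed weighted graph; the mixed context drives the forecast."""
    Wq = 0.1 * rng.standard_normal((dim, dim))
    Wk = 0.1 * rng.standard_normal((dim, dim))
    wo = 0.1 * rng.standard_normal(dim)
    scores = (encodings @ Wq.T) @ (encodings @ Wk.T).T / np.sqrt(dim)
    graph = np.exp(scores)
    graph /= graph.sum(axis=1, keepdims=True)  # each row sums to 1
    forecast = (graph @ encodings) @ wo        # next value per series
    return graph, forecast

# Toy multivariate series: 3 component series, 20 time steps each.
data = rng.standard_normal((3, 20))
encs = np.stack([encode_series(s) for s in data])
graph, forecast = decode(encs)
print(graph.shape, forecast.shape)  # graph is (3, 3), forecast is (3,)
```

In the full model both levels would be trained jointly end-to-end, so gradients from the forecasting loss shape the per-series encodings and the dependency-graph weights at the same time; here the weights are fixed random values purely for illustration.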