Bus arrival time prediction aims to improve the level of service provided by transportation agencies. Intuitively, many stochastic factors, e.g., weather and local events, affect the predictability of the arrival time. Moreover, the arrival time at the current station is closely correlated with the arrival times at multiple previously passed stations. Motivated by these observations, this paper proposes to exploit the long-range dependencies among multiple time steps for bus arrival time prediction via a recurrent neural network (RNN). Concretely, an RNN with long short-term memory (LSTM) blocks is used to 'correct' the prediction for a station using the correlated, previously passed stations. When modeling the correlation among multiple stations, one-hot coding is introduced to fuse heterogeneous information into a unified vector space. The proposed framework therefore leverages both dynamic measurements (i.e., historical trajectory data) and static observations (i.e., statistics of the infrastructure) for bus arrival time prediction. To enable a fair comparison with state-of-the-art methods, we have released what is, to the best of our knowledge, the largest data set for this task. The experimental results demonstrate the superior performance of our approach on this data set.
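As a rough illustration of how one-hot coding can place heterogeneous inputs in a single vector space, the sketch below encodes each passed station as one feature vector and stacks the vectors into a sequence of the kind an LSTM could consume. It is a minimal sketch under stated assumptions, not the paper's implementation: the feature set (station index, weather category, travel time, segment length), the vocabulary sizes, and the names `one_hot` and `encode_step` are all hypothetical.

```python
# Hypothetical vocabularies; the abstract does not specify the actual categories.
WEATHER = ["clear", "rain", "snow"]
NUM_STATIONS = 5


def one_hot(index, size):
    """Return a list of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    return [1.0 if i == index else 0.0 for i in range(size)]


def encode_step(station_id, weather, travel_time_s, segment_length_m):
    """Fuse categorical and numeric features for one station into one vector.

    Categorical inputs (station, weather) are one-hot coded; numeric inputs
    (assumed here: a dynamic travel time and a static segment length) are
    appended unchanged.
    """
    return (
        one_hot(station_id, NUM_STATIONS)
        + one_hot(WEATHER.index(weather), len(WEATHER))
        + [float(travel_time_s), float(segment_length_m)]
    )


# Made-up measurements for three passed stations, in travel order.
passed = [
    (0, "rain", 95.0, 420.0),
    (1, "rain", 130.0, 610.0),
    (2, "rain", 88.0, 380.0),
]

# One row per time step; each row has 5 + 3 + 2 = 10 entries.
sequence = [encode_step(*step) for step in passed]
```

In practice the numeric entries would typically be normalized before being fed to a recurrent model; that step is omitted here for brevity.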