Graph neural networks (GNNs) have achieved great success in learning complex systems of relations that arise in a broad range of problem settings, from e-commerce and social networks to data management. Training GNNs over large-scale graphs is challenging under constrained compute resources because of the heavy data dependencies between nodes. Moreover, modern relational data is constantly evolving, which adds a further layer of learning challenges with respect to model scalability and expressivity. This paper introduces a simple and efficient learning algorithm for large discrete-time dynamic graphs (DTDGs), a widely adopted data model in many applications. In particular, we tackle two critical challenges: (1) how the model can be trained efficiently on large-scale DTDGs so as to exploit hardware accelerators with a small memory footprint, and (2) how the model can effectively capture the changing dynamics of the graphs. To the best of our knowledge, existing GNNs do not address both challenges simultaneously. Hence, we propose a scalable evolving inception GNN, called SEIGN. Specifically, SEIGN features two connected evolving components: one adapts the graph model to the arriving snapshot, and the other captures the changing dynamics of the node embeddings. To scale up model training, SEIGN introduces a parameter-free message passing step for DTDGs that removes a substantial part of the data dependencies in training. This step also significantly reduces the training memory footprint and allows us to construct a succinct graph mini-batch without performing neighborhood sampling. We further optimize the proposed evolving strategies by extracting features from neighbors at varying scales, which increases the expressive power of the node representations.
Our experimental evaluation, on both public benchmarks and real industrial datasets, demonstrates that SEIGN achieves a 2%-20% improvement in Area Under Curve (AUC) and Average Precision (AP) on the prediction task over the state-of-the-art baselines. SEIGN also supports efficient graph mini-batch training and gains a 2-16 times speedup in epoch computation time over the entire DTDGs.
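The parameter-free message passing step and the sampling-free mini-batch construction described above can be illustrated with a minimal sketch. It is not SEIGN's implementation: it assumes a precomputation in which each snapshot's normalized adjacency matrix is applied repeatedly to a fixed feature matrix, so that multi-hop (multi-scale) features are available before training starts. The function names (`normalize_adjacency`, `precompute_hops`, `node_minibatches`), the symmetric normalization, the dense adjacency matrices, and the use of one shared feature matrix for all snapshots are all assumptions made for illustration.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (an assumed choice)."""
    a_hat = adj + np.eye(adj.shape[0])   # self-loops keep every degree >= 1
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def precompute_hops(snapshots, features, num_hops):
    """Propagate features over each snapshot without any trainable weights.

    Returns an array of shape (T, N, num_hops + 1, d) holding
    [X, A_t X, A_t^2 X, ...] for every snapshot t.
    """
    per_snapshot = []
    for adj in snapshots:
        a_norm = normalize_adjacency(adj)
        hops = [features]
        for _ in range(num_hops):
            hops.append(a_norm @ hops[-1])   # one more hop of aggregation
        per_snapshot.append(np.stack(hops, axis=1))
    return np.stack(per_snapshot, axis=0)


def node_minibatches(precomputed, batch_size, rng):
    """Yield (node_ids, features) batches by plain row slicing.

    No neighborhood sampling is needed: each node's row already
    contains its aggregated multi-hop features for every snapshot.
    """
    num_nodes = precomputed.shape[1]
    order = rng.permutation(num_nodes)
    for start in range(0, num_nodes, batch_size):
        idx = order[start:start + batch_size]
        yield idx, precomputed[:, idx]
```

Under these assumptions, the propagation is performed once, outside the training loop, so a downstream trainable model only sees independent rows of a dense array. A batch of `B` nodes has shape `(T, B, num_hops + 1, d)`, and its memory cost depends on the batch size rather than on the size of the nodes' neighborhoods.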