Generalization is a fundamental problem in machine learning. For overparameterized deep neural network models, there are many solutions that can fit the training data equally well. The key question is which solution has a better generalization performance measured by test loss (error). Here we report the discovery of exact duality relations between changes in activities and changes in weights in any fully connected layer in feed-forward neural networks. By using the activity–weight duality relation, we decompose the generalization loss into contributions from different directions in weight space. Our analysis reveals that two key factors, sharpness of the loss landscape and size of the solution, act together to determine generalization. In general, flatter and smaller solutions have better generalization. By using the generalization loss decomposition, we show how existing learning algorithms and regularization schemes affect generalization by controlling one or both factors. Furthermore, by applying our analysis framework to evaluate different algorithms for realistic large neural network models in the multi-learner setting, we find that the decentralized algorithms have better generalization performance as they introduce additional landscape-dependent noise that leads to flatter solutions without changing their sizes.