Decentralized optimization with multiple networked clients/learners has advanced machine learning significantly over the past few years. When data distributions at different nodes/locations are heterogeneous, consensus-based decentralized algorithms ignore distinctive features of local data samples. In this paper, we propose a decentralized client adaptation strategy for personalized learning by taking local client data structures into account. It turns out that optimizing the model parameters can be formulated as a decentralized bilevel programming problem. Motivated by this application, we propose a stochastic primal-dual framework for solving decentralized bilevel nonconvex problems and show that the devised algorithm achieves the Karush–Kuhn–Tucker (KKT) points for this class of problems at a rate of O(1/nT−−−√), where n denotes the number of total learners and T the total number of iterations. Multiple experiments show the superiority of our proposed method compared to state-of-the-art methods in terms of both training speed and testing accuracy for decentralized learning problems on real datasets.