Sai Zeng, Angran Xiao, et al.
CAD Computer Aided Design
This paper develops a unified Lyapunov framework for finite-sample analysis of a Markovian stochastic approximation (SA) algorithm under a contraction operator with respect to an arbitrary norm. The main novelty lies in the construction of a valid Lyapunov function called the generalized Moreau envelope. The smoothness and an approximation property of the generalized Moreau envelope enable us to derive a one-step Lyapunov drift inequality, which is the key to establishing the finite-sample bounds. Our SA result has wide applications, especially in the context of reinforcement learning (RL). Specifically, we show that a large class of value-based RL algorithms can be modeled in the exact form of our Markovian SA algorithm. Therefore, our SA results immediately imply finite-sample guarantees for popular RL algorithms such as n-step temporal difference (TD) learning, TD(λ), off-policy V-trace, and Q-learning. As byproducts, by analyzing the convergence bounds of n-step TD and TD(λ), we provide theoretical insight into the problem about the efficiency of bootstrapping. Moreover, our finite-sample bounds of off-policy Vtrace explicitly capture the tradeoff between the variance of the stochastic iterates and the bias in the limit.
Sai Zeng, Angran Xiao, et al.
CAD Computer Aided Design
Rajeev Gupta, Shourya Roy, et al.
ICAC 2006
György E. Révész
Theoretical Computer Science
G. Ramalingam
Theoretical Computer Science