Nonparametric return distribution approximation for reinforcement learning

Tetsuro Morimura; Masashi Sugiyama; Hisashi Kashima; Hirotaka Hachiya; Toshiyuki Tanaka

ICML 2010

Conference paper

17 Sep 2010

Abstract

Standard reinforcement learning (RL) aims to optimize decision-making rules in terms of the expected return. However, especially for risk-management purposes, other criteria such as the expected shortfall are sometimes preferred. Here, we describe a method of approximating the distribution of returns, which allows us to derive various kinds of information about the returns. We first show that the Bellman equation, which is a recursive formula for the expected return, can be extended to the cumulative return distribution. Then we derive a nonparametric return distribution estimator with particle smoothing based on this extended Bellman equation. A key aspect of the proposed algorithm is that it represents the recursion relation in the extended Bellman equation by a simple replacement procedure, in which particles associated with a state are replaced using those of the successor state. We show that our algorithm leads to a risk-sensitive RL paradigm. The usefulness of the proposed approach is demonstrated through numerical experiments. Copyright 2010 by the author(s)/owner(s).
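
The particle replacement idea in the abstract can be illustrated with a rough Python sketch. This is not the authors' algorithm, only a minimal toy under stated assumptions: each state holds a list of return samples (particles), and after an observed transition some of them are overwritten with the reward plus the discounted value of a particle drawn from the successor state. The function names, the use of `None` for a terminal successor, the two-state chain, and all parameter values are hypothetical choices made for illustration.

```python
import random


def replace_particles(particles, state, reward, next_state, gamma, n_replace, rng):
    """Overwrite n_replace particles of `state` with reward + gamma * successor particle.

    Assumed convention: next_state=None marks a terminal transition,
    so the successor return is taken to be 0.
    """
    indices = rng.sample(range(len(particles[state])), n_replace)
    for i in indices:
        successor = 0.0 if next_state is None else rng.choice(particles[next_state])
        particles[state][i] = reward + gamma * successor


def expected_shortfall(samples, alpha):
    """Mean of the worst alpha-fraction of the sampled returns (lower tail)."""
    ordered = sorted(samples)
    k = max(1, int(alpha * len(ordered)))
    return sum(ordered[:k]) / k


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100  # particles per state (illustrative value)
    particles = {0: [0.0] * n, 1: [0.0] * n}
    for _ in range(50):
        # Hypothetical chain: state 1 ends the episode with a risky reward of +2 or -2.
        replace_particles(particles, 1, rng.choice([2.0, -2.0]), None, 0.5, 10, rng)
        # State 0 always yields reward 1 and moves to state 1.
        replace_particles(particles, 0, 1.0, 1, 0.5, 10, rng)
    mean_return = sum(particles[0]) / n
    print("estimated mean return of state 0:", mean_return)
    print("estimated 10% expected shortfall of state 0:", expected_shortfall(particles[0], 0.1))
```

Because each state keeps a whole sample of returns rather than a single mean, risk measures such as the expected shortfall can be read directly from the particles, which is the kind of information the abstract refers to.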
