C.A. Micchelli, W.L. Miranker
Journal of the ACM
We study the use of the reinforcement learning algorithm Q-learning with regression tree function approximation to learn pricing strategies in a competitive marketplace of economic software agents. Q-learning is an algorithm for learning to estimate the long-term expected reward for a given state-action pair. In the case of a stationary environment with a lookup table representing the Q-function, the learning procedure is guaranteed to converge to an optimal policy. However, utilizing Q-learning in multi-agent systems presents special challenges. The simultaneous adaptation of multiple agents creates a non-stationary environment for each agent, hence there are no theoretical guarantees of convergence or optimality. Also, large multi-agent systems may have state spaces too large to represent with lookup tables, necessitating the use of function approximation.
C.A. Micchelli, W.L. Miranker
Journal of the ACM
Saurabh Paul, Christos Boutsidis, et al.
JMLR
Joxan Jaffar
Journal of the ACM
Cristina Cornelio, Judy Goldsmith, et al.
JAIR