Publication
SoCS 2024
Workshop paper

Extreme Value Monte Carlo Tree Search (Extended Abstract)

Abstract

Monte-Carlo Tree Search (MCTS) combined with Multi-Armed Bandits (MABs) had limited success in domain-independent classical planning until recently. Previous work (Wissow and Asai 2023) showed that UCB1, which is designed for bounded rewards, performs poorly when applied to the cost-to-go estimates of classical planning, which are unbounded in ℝ, and improved performance by using a Gaussian reward MAB instead. We further sharpen our understanding of ideal bandits for planning tasks by resolving three issues. First, Gaussian MABs under-specify the support of cost-to-go estimates as (−∞, ∞), even though cost-to-go values are bounded below by zero. Second, Full-Bellman backup, which backpropagates the max/min of samples, lacks theoretical justification. Third, removing dead-ends from Monte-Carlo backups lacks justification. We use Extreme Value Theory Type 2 to resolve all three issues at once, propose two bandits (UCB1-Uniform and UCB1-Power), and apply them to MCTS for classical planning. We formally prove their regret bounds and empirically demonstrate their performance in classical planning.
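For context, below is a minimal Python sketch of the standard UCB1 selection rule (Auer et al. 2002) whose bounded-reward assumption the abstract criticizes; it is not the UCB1-Uniform/Power variants proposed in the paper, which replace this confidence interval with ones derived via Extreme Value Theory Type 2. The function name and the (mean, count) arm representation are illustrative assumptions.

    import math

    def ucb1_select(arms):
        # arms: illustrative list of (mean_reward, visit_count) pairs,
        # one per child node; returns the index of the arm to pull.
        total = sum(count for _, count in arms)

        def score(mean, count):
            if count == 0:
                return float("inf")  # visit every arm at least once
            # The exploration radius sqrt(2 ln N / n) is derived for
            # rewards bounded in [0, 1]; with unbounded cost-to-go
            # estimates it is mis-scaled, which is the mismatch the
            # abstract describes.
            return mean + math.sqrt(2.0 * math.log(total) / count)

        return max(range(len(arms)), key=lambda i: score(*arms[i]))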
