Learning from Failure: Introducing Failure Ratio in RL
Abstract
Deep reinforcement learning combined with Monte-Carlo tree search (MCTS) has demonstrated high performance and thus has been attracting much attention. However, learning is slow to converge. In comparison, humans learn board games more efficiently because they acquire skills and strategies from failure patterns. We hypothesize that failure patterns contain meaningful information that can expedite training, serving as prior knowledge for reinforcement learning. To exploit this prior knowledge, we propose an efficient tree search method that introduces a failure ratio, a quantity that takes high values for failure patterns. We tested our hypothesis by applying the method to the board game Othello. The results show that our method achieves a higher winning ratio than a state-of-the-art method, especially in the early stages of learning.
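The abstract does not give the exact formulation of the failure ratio, only that it takes high values for failure patterns and acts as prior knowledge during tree search. As a minimal sketch under stated assumptions, the Python fragment below treats the failure ratio of a node as the fraction of simulations through it that ended in a loss, and folds it into a PUCT-style selection score as a bias term. All names here (`Node`, `select_child`, `failure_ratio`, the weight `beta`) and the choice to use the ratio as a penalty are illustrative assumptions, not the paper's method.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree-search node (illustrative structure, not the paper's)."""
    prior: float = 0.0          # policy prior P(s, a)
    visits: int = 0             # visit count N(s, a)
    value_sum: float = 0.0      # cumulative simulation value W(s, a)
    failures: int = 0           # simulations through this node ending in a loss
    children: dict = field(default_factory=dict)

    @property
    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0

    @property
    def failure_ratio(self) -> float:
        # High for patterns that frequently led to failure (assumed definition).
        return self.failures / self.visits if self.visits else 0.0


def select_child(parent: Node, c_puct: float = 1.5, beta: float = 0.5) -> tuple:
    """PUCT-style child selection biased by the failure ratio.

    beta weights the failure-ratio term; using it as a penalty is one
    plausible reading of the abstract, not the confirmed formulation.
    """
    def score(child: Node) -> float:
        exploration = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
        return child.q + exploration - beta * child.failure_ratio

    # Returns the (action, child) pair with the highest biased score.
    return max(parent.children.items(), key=lambda kv: score(kv[1]))
```

In this reading, states that repeatedly led to losses are visited less often during selection, so the search budget shifts toward more promising lines early in training, which is consistent with the reported early-stage gains.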