Efficient Exploration with Failure Ratio for Deep Reinforcement Learning
The combination of Monte Carlo tree search (MCTS) and deep reinforcement learning has demonstrated remarkably high performance and has attracted much attention recently. However, learning takes a very long time to converge. Meanwhile, when we want to acquire skills efficiently, it is important to learn from failure by locating its cause and modifying the strategy accordingly. By analogy, we propose an efficient tree search method that introduces a failure ratio, which takes high values in important phases. We applied our method to the board game Othello. Our experiments show that the method achieves a higher winning ratio than the state-of-the-art method, especially in the early stage of learning.