Deep Policy Iteration with Integer Programming for Inventory Management
Reinforcement learning (RL) has led to considerable breakthroughs in diverse areas such as robotics and games, but its application to complex real-world decision-making problems remains limited. Many problems in operations management (OM) are characterized by large action spaces and stochastic system dynamics, which pose a challenge for existing RL methods that rely on enumeration techniques to solve per-step action problems. To resolve these issues, we develop Programmable Actor Reinforcement Learning (PARL), a policy iteration method that uses techniques from integer programming and sample average approximation. We demonstrate its effectiveness on a variety of multi-echelon inventory management settings.
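To make the per-step action problem concrete, the following is a minimal sketch of a sample average approximation for a single-node inventory system. It is illustrative only and is not the PARL implementation: the function name `saa_best_action`, the price, cost and holding parameters, the lost-sales dynamics, and the placeholder value function are all assumptions introduced here. For clarity the sketch enumerates candidate order quantities, which is exactly the step that becomes impractical in large action spaces and that the abstract proposes to replace with an integer program.

```python
import random


def saa_best_action(inventory, actions, demand_samples, value_fn,
                    price=5.0, cost=2.0, holding=0.5, gamma=0.95):
    """Return the order quantity with the highest sample-average return.

    All parameters and the lost-sales dynamics are hypothetical choices
    made for illustration; they do not come from the source.
    """
    def avg_return(order):
        total = 0.0
        for demand in demand_samples:
            available = inventory + order
            sales = min(available, demand)      # unmet demand is lost
            next_inventory = available - sales
            reward = price * sales - cost * order - holding * next_inventory
            # One-step reward plus discounted value of the next state.
            total += reward + gamma * value_fn(next_inventory)
        return total / len(demand_samples)

    # Brute-force enumeration over the candidate actions (assumed small here).
    return max(actions, key=avg_return)


# Usage: sampled demands stand in for the stochastic system dynamics.
rng = random.Random(0)
samples = [rng.randint(0, 10) for _ in range(200)]
zero_value = lambda inv: 0.0  # placeholder for a learned value estimate
best = saa_best_action(inventory=2, actions=range(0, 11),
                       demand_samples=samples, value_fn=zero_value)
print("chosen order quantity:", best)
```

In a policy iteration loop, a learned value estimate would take the place of `zero_value`, and the enumeration in the last line of `saa_best_action` would give way to an optimization solver once the action space is too large to list.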