Deep Policy Iteration with Integer Programming for Inventory Management

Brian Quanz; Pavithra Harsha; Ashish Jagmohan; Jayant Kalagnanam; Divya Singhvi

INFORMS 2022

Invited talk

16 Oct 2022


Abstract

Reinforcement learning (RL) has led to considerable breakthroughs in diverse areas such as robotics and games, but its application to complex real-world decision-making problems remains limited. Many problems in operations management (OM) are characterized by large action spaces and stochastic system dynamics. These features challenge existing RL methods, which rely on enumeration to solve the per-step action problem. To resolve these issues, we develop Programmable Actor Reinforcement Learning (PARL), a policy iteration method that uses techniques from integer programming and sample average approximation. We demonstrate its effectiveness on a variety of multi-echelon inventory management settings.
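One way to write the per-step action problem mentioned above is as a sample average approximation (SAA) of the one-step Bellman backup. The notation is ours and is an assumption, not the paper's: $s$ is the current inventory state, $\mathcal{A}(s)$ the set of feasible integer order quantities, $\xi_1, \dots, \xi_N$ sampled demand scenarios, $f$ the transition function, $r$ the one-period reward, $\gamma$ the discount factor, and $\hat{V}$ a learned value estimate. Treat it as a sketch under those assumptions, not the exact PARL formulation.

```latex
\pi(s) \in \arg\max_{a \in \mathcal{A}(s)} \; \frac{1}{N} \sum_{i=1}^{N} \Big[ r(s, a, \xi_i) + \gamma \, \hat{V}\big(f(s, a, \xi_i)\big) \Big]
```

Replacing the expectation over demand with an average over $N$ sampled scenarios turns the action choice into a deterministic optimization problem. If, for example, $\hat{V}$ is a ReLU network and $\mathcal{A}(s)$ is described by linear constraints on integer variables, this problem can be written as a mixed-integer program and passed to a solver instead of enumerating every candidate action.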

Conference paper