Math Programming Based Reinforcement Learning For Multi-echelon Supply Chain Management

Pavithra Harsha; Ashish Jagmohan; Jayant Kalagnanam; Brian Quanz; Divya Singhvi

INFORMS 2021

Talk

24 Oct 2021

Abstract

Reinforcement learning (RL) has led to considerable breakthroughs in diverse areas such as robotics and games, but its application to complex decision-making problems remains limited. Many problems in operations management are characterized by large action spaces and stochastic system dynamics. These characteristics make such problems considerably harder to solve for existing RL methods, which rely on enumeration techniques. To address these issues, we develop Programmable Actor Reinforcement Learning (PARL), a value iteration method that uses techniques from integer programming (IP), sample average approximation (SAA), and optimal discretization of continuous random variables. We then apply our algorithm to real-world inventory management problems with complex supply chain structures and show that PARL outperforms state-of-the-art RL and inventory optimization methods in these settings.
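The ingredients named in the abstract (value iteration, sample-based expectations, discretized random demand, and an optimization-based action step) can be illustrated with a minimal sketch. It assumes a single-node, lost-sales inventory problem rather than the multi-echelon setting of the talk. Continuous demand samples are collapsed into equally weighted quantile buckets, which is a simple stand-in and not the optimal discretization the authors use. Each action is chosen by maximizing the sample-average reward plus discounted value. Here that step enumerates order quantities; per the abstract, PARL instead draws on integer programming so that large action spaces need not be enumerated. All function names and parameter values (price, unit cost, holding cost, capacity, demand distribution) are hypothetical.

```python
import math
import random
import statistics


def discretize_demand(samples, k):
    """Collapse demand samples into k equally weighted support points.

    Each point is the mean of one quantile bucket. This is a simple
    stand-in (assumption), not the paper's optimal discretization.
    """
    s = sorted(samples)
    n = len(s)
    points = []
    for j in range(k):
        bucket = s[j * n // k:(j + 1) * n // k]
        points.append((statistics.fmean(bucket), len(bucket) / n))
    return points


def actor_step(x, V, demand_pts, cap, price, cost, hold, gamma):
    """Pick the order quantity maximizing expected reward + discounted value.

    The expectation is a weighted average over the discretized demand
    points. The max is taken by enumeration here; PARL uses integer
    programming for this step instead.
    """
    best_q, best_val = 0, -math.inf
    for q in range(cap - x + 1):  # keep x + q within storage capacity
        total = 0.0
        for d, w in demand_pts:
            sales = min(x + q, d)      # lost sales: unmet demand vanishes
            left = x + q - sales       # leftover inventory (>= 0)
            nxt = min(cap, round(left))  # snap to the integer state grid
            reward = price * sales - cost * q - hold * left
            total += w * (reward + gamma * V[nxt])
        if total > best_val:
            best_q, best_val = q, total
    return best_q, best_val


def value_iteration(demand_pts, cap=20, price=5.0, cost=2.0, hold=0.5,
                    gamma=0.9, iters=500, tol=1e-6):
    """Tabular value iteration over integer inventory levels 0..cap."""
    V = [0.0] * (cap + 1)
    policy = [0] * (cap + 1)
    for _ in range(iters):
        new_V = [0.0] * (cap + 1)
        for x in range(cap + 1):
            policy[x], new_V[x] = actor_step(
                x, V, demand_pts, cap, price, cost, hold, gamma)
        diff = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if diff < tol:  # stop once the Bellman update has converged
            break
    return V, policy


# Hypothetical usage: gamma-distributed demand with mean 4.0 * 2.0 = 8.
rng = random.Random(0)
samples = [rng.gammavariate(4.0, 2.0) for _ in range(5000)]
pts = discretize_demand(samples, k=10)
V, policy = value_iteration(pts)
print("order quantity by inventory level:", policy)
```

Enumeration is affordable here only because a single node has about 20 possible order quantities. With several echelons, the joint action space grows combinatorially, which is the motivation the abstract gives for replacing this step with a math-programming formulation.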

Short paper