Publication
MSOM 2022
Talk
Deep policy iteration with integer programming for inventory management
Abstract
In this work, we discuss Programmable Actor Reinforcement Learning (PARL), a policy iteration method that uses techniques from integer programming and sample average approximation. We numerically benchmark the algorithm in complex supply chain settings where the optimal solution is intractable and show that it performs comparably to, and sometimes better than, state-of-the-art RL and commonly used inventory management benchmarks.
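To make the idea of sample average approximation over integer actions concrete, the following is a minimal sketch and not the PARL algorithm itself. It assumes a single-product setting with lost sales, and every name and parameter (`saa_best_order`, `value_fn`, the price and cost values) is hypothetical. A brute-force search over integer order quantities stands in for the integer program, and `value_fn` stands in for a learned value estimate.

```python
import random


def saa_best_order(inventory, demand_samples, value_fn, max_order=20,
                   price=5.0, order_cost=2.0, holding_cost=0.5, discount=0.95):
    """Return the integer order quantity with the highest sample-average
    one-step return, together with that average.

    All cost parameters are illustrative assumptions, not values from the talk.
    """
    best_q, best_val = 0, float("-inf")
    # Enumerating q is a brute-force stand-in for solving an integer program.
    for q in range(max_order + 1):
        total = 0.0
        for d in demand_samples:
            on_hand = inventory + q
            sales = min(on_hand, d)          # unmet demand is lost
            leftover = on_hand - sales
            reward = price * sales - order_cost * q - holding_cost * leftover
            # value_fn is a hypothetical estimate of future value at the next state.
            total += reward + discount * value_fn(leftover)
        avg = total / len(demand_samples)    # sample average over demand draws
        if avg > best_val:
            best_q, best_val = q, avg
    return best_q, best_val


# Usage: draw demand samples and pick an order quantity for 3 units on hand.
rng = random.Random(0)
samples = [rng.randint(0, 10) for _ in range(200)]
q, v = saa_best_order(inventory=3, demand_samples=samples, value_fn=lambda s: 0.0)
```

In a policy iteration loop, a step like this would play the role of policy improvement, with `value_fn` refit between iterations. Enumeration is only practical for small action spaces, which is presumably where an integer programming solver becomes useful.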