Reinforcement Learning (RL) has led to considerable breakthroughs in diverse areas such as robotics and games, among others. However, the application of RL to complex decision-making problems remains limited. Many problems in Operations Management are characterized by large action spaces and stochastic system dynamics. These characteristics make such problems considerably harder to solve for existing RL methods that rely on enumeration techniques. To resolve these issues, we develop Programmable Actor Reinforcement Learning (PARL), a policy iteration method that uses techniques from integer programming (IP), sample average approximation (SAA), and optimal discretization of continuous random variables. We then apply our algorithm to real-world inventory management problems with complex supply chain structures and show that PARL outperforms state-of-the-art RL and inventory optimization methods in these settings.