MDP Graph-based Intermediate Model for DRL Training
We consider enterprise optimization problems that can be modeled and orchestrated by Deep Reinforcement Learning (DRL) algorithms. To make training more efficient, we suggest decomposing the original problem and modeling each of the resulting subproblems as a Markov Decision Process (MDP). For each MDP, the transition probability matrix and costs are estimated from historical data and domain specifics. The graphs representing these MDPs then serve as a training environment (an intermediate model) for the original DRL problem, enabling rapid training of the DRL algorithms.
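The pipeline above can be sketched in two steps: estimate a tabular MDP (transition matrix and mean costs) from historical trajectories, then wrap it as a simulation environment a DRL agent can train against. This is a minimal sketch under the assumption of discrete states and actions; the names `estimate_mdp` and `MDPGraphEnv` are illustrative, not from the source.

```python
import numpy as np

def estimate_mdp(transitions, n_states, n_actions):
    """Estimate transition probabilities P[s, a, s'] and mean costs C[s, a]
    from historical (state, action, next_state, cost) tuples."""
    counts = np.zeros((n_states, n_actions, n_states))
    cost_sum = np.zeros((n_states, n_actions))
    for s, a, s2, c in transitions:
        counts[s, a, s2] += 1
        cost_sum[s, a] += c
    visits = counts.sum(axis=2)                      # visits[s, a]
    # Unvisited (s, a) pairs fall back to a uniform transition distribution.
    P = np.divide(counts, visits[:, :, None],
                  out=np.full_like(counts, 1.0 / n_states),
                  where=visits[:, :, None] > 0)
    C = np.divide(cost_sum, visits,
                  out=np.zeros_like(cost_sum), where=visits > 0)
    return P, C

class MDPGraphEnv:
    """Gym-style environment backed by the estimated MDP graph, usable
    as a fast intermediate training model for a DRL agent."""
    def __init__(self, P, C, start_state=0, seed=0):
        self.P, self.C = P, C
        self.start_state = start_state
        self.rng = np.random.default_rng(seed)
        self.state = start_state

    def reset(self):
        self.state = self.start_state
        return self.state

    def step(self, action):
        # Sample the next state from the estimated transition distribution;
        # reward is the negative of the estimated cost.
        next_state = self.rng.choice(self.P.shape[2],
                                     p=self.P[self.state, action])
        reward = -self.C[self.state, action]
        self.state = next_state
        return next_state, reward, False, {}
```

Because stepping this environment is a single table lookup and a categorical draw, rollouts are orders of magnitude cheaper than interacting with the real enterprise system, which is what makes the rapid DRL training feasible.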