MDP Graph-based Intermediate Model for DRL Training

Alexander Zadorojniy; Segev Wasserkrug

INFORMS 2020

Talk

07 Sep 2020

Abstract

We consider enterprise optimization problems that can be modeled or orchestrated by Deep Reinforcement Learning (DRL) algorithms. For more efficient training, we propose decomposing the original problem and modeling each of the resulting subproblems as a Markov Decision Process (MDP). For each MDP, the transition probability matrix and the costs are estimated from historical data and domain specifics. The graphs representing these MDPs are then used as a training environment (that is, an intermediate model) for the original DRL problem, enabling rapid training of the DRL algorithms.
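The abstract does not say how the estimation is carried out, so the following is only a minimal sketch under stated assumptions: states and actions are small integer indices, the historical data is a list of `(state, action, cost, next_state)` tuples, transition probabilities are empirical frequencies, and costs are per-pair averages. The names `estimate_mdp` and `GraphEnv` are hypothetical and do not come from the talk.

```python
import random


def estimate_mdp(history, n_states, n_actions):
    """Estimate P[a][s][s'] and mean cost C[s][a] from logged transitions.

    history: iterable of (state, action, cost, next_state) tuples.
    Assumption: unvisited (state, action) pairs get a uniform transition
    row and zero cost; a real system would use domain knowledge instead.
    """
    counts = [[[0] * n_states for _ in range(n_states)] for _ in range(n_actions)]
    cost_sum = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]

    for s, a, cost, s_next in history:
        counts[a][s][s_next] += 1
        cost_sum[s][a] += cost
        visits[s][a] += 1

    P = [[None] * n_states for _ in range(n_actions)]
    C = [[0.0] * n_actions for _ in range(n_states)]
    for a in range(n_actions):
        for s in range(n_states):
            n = visits[s][a]
            if n > 0:
                P[a][s] = [c / n for c in counts[a][s]]
                C[s][a] = cost_sum[s][a] / n
            else:
                P[a][s] = [1.0 / n_states] * n_states
    return P, C


class GraphEnv:
    """Toy training environment that samples from the estimated MDP graph."""

    def __init__(self, P, C, start_state=0, seed=None):
        self.P, self.C = P, C
        self.start_state = start_state
        self.rng = random.Random(seed)
        self.state = start_state

    def reset(self):
        self.state = self.start_state
        return self.state

    def step(self, action):
        # Sample the next state from the estimated transition row.
        row = self.P[action][self.state]
        next_state = self.rng.choices(range(len(row)), weights=row)[0]
        reward = -self.C[self.state][action]  # reward is the negated cost
        self.state = next_state
        return next_state, reward
```

A DRL agent could then interact with `GraphEnv` through `reset` and `step` instead of with the real system or a detailed simulator, which is the sense in which the MDP graph acts as an intermediate model.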
