Learning Task Decomposition with Order-Memory Policy Network
Many complex real-world tasks are composed of several levels of sub-tasks. Humans leverage these hierarchical structures to accelerate the learning process and achieve better generalization. To simulate this process, we introduce Ordered Memory Policy Network (OMPN) to discover task decomposition by imitation learning from demonstration. OMPN has an explicit inductive bias to model a hierarchy of sub-tasks. Experiments on Craft world and Dial demonstrate that our model can more accurately recover the task boundaries with behavior cloning under both unsupervised and weakly supervised setting than previous methods. OMPN can also be directly applied to partially observable environments and still achieve high performance. Our visualization further confirms the intuition that OMPN can learn to expand the memory at higher levels when one subtask is close to completion.