BILINEAR VALUE NETWORKS
Abstract
The dominant framework for off-policy multi-goal reinforcement learning involves estimating a goal-conditioned Q-value function. When learning to achieve multiple goals, data efficiency is intimately connected with the generalization of the Q-function to new goals. The de facto paradigm is to approximate Q(s, a, g) with a monolithic neural network. To improve generalization of the Q-function, we propose a bilinear decomposition that represents the Q-value via a low-rank approximation in the form of a dot product between two vector fields. The first vector field, f(s, a), captures the environment's local dynamics at the state s; the second vector field, φ(s, g), captures the global relationship between the current state and the goal. We show that our bilinear decomposition scheme substantially improves data efficiency and transfers better to out-of-distribution goals than prior methods. Empirical evidence is provided on the simulated Fetch robot task suite and on dexterous manipulation with a Shadow hand.
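To make the decomposition Q(s, a, g) ≈ f(s, a) · φ(s, g) concrete, the following is a minimal PyTorch sketch of such a bilinear Q-network. The class and parameter names (BilinearQNetwork, embed_dim, hidden), layer widths, and activation choices are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class BilinearQNetwork(nn.Module):
    """Q(s, a, g) ~ f(s, a) . phi(s, g): a low-rank bilinear decomposition.

    Sketch under assumed hyperparameters; the paper's exact architecture
    may differ.
    """

    def __init__(self, state_dim, action_dim, goal_dim, embed_dim=64, hidden=256):
        super().__init__()
        # f(s, a): embedding that captures local dynamics at state s.
        self.f = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )
        # phi(s, g): embedding that captures the state-goal relationship.
        self.phi = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, state, action, goal):
        f_sa = self.f(torch.cat([state, action], dim=-1))
        phi_sg = self.phi(torch.cat([state, goal], dim=-1))
        # Dot product over the embedding dimension yields the scalar Q-value,
        # so Q is rank-limited by embed_dim rather than fully monolithic.
        return (f_sa * phi_sg).sum(dim=-1)
```

Because the goal enters only through φ(s, g), generalization to a new goal g' reduces to producing a sensible embedding φ(s, g'), while the dynamics component f(s, a) is reused unchanged.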