Wrapping Up Offline RL as part of AutoMLPipeline Workflow

Paulito Palmes

JuliaCon 2023

Talk

25 Jul 2023



Abstract

Unlike Online RL, where agents must interact with a real environment, Offline RL follows a workflow similar to typical machine learning. Given a dataset, Offline RL extracts the state, action, reward, and terminal columns and uses them to optimize the policy Q. Wrapping Offline RL into the AutoMLPipeline workflow makes it straightforward to search, through symbolic workflow manipulation, for the preprocessing elements and combinations that improve the learned policy.
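As a concrete illustration of learning Q purely from logged columns, here is a minimal Python sketch using tabular batch Q-learning. It is a simple stand-in chosen for illustration: the abstract does not say which Offline RL algorithm the talk uses, and this is not AutoMLPipeline's (Julia) API. The sketch assumes time-ordered `(state, action, reward, terminal)` rows with discrete states and actions, and it assumes the next state of a non-terminal row is the state of the following row. The names `offline_q_learning` and `greedy_policy` and the values of `gamma`, `alpha`, and `sweeps` are hypothetical.

```python
from collections import defaultdict


def offline_q_learning(rows, n_actions, gamma=0.9, alpha=0.5, sweeps=50):
    """Tabular batch Q-learning over a fixed, time-ordered dataset.

    rows: list of (state, action, reward, terminal) tuples.
    Assumption: the next state of a non-terminal row is the state of the
    following row, so no separate next-state column is needed.
    """
    q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(sweeps):  # replay the same logged data; no environment calls
        for i, (s, a, r, terminal) in enumerate(rows):
            if terminal:
                target = r  # episode ends here: no bootstrapping
            elif i + 1 < len(rows):
                next_s = rows[i + 1][0]
                target = r + gamma * max(q[next_s])
            else:
                continue  # trailing non-terminal row has no successor; skip it
            q[s][a] += alpha * (target - q[s][a])
    return q


def greedy_policy(q):
    """Read the policy off Q: the highest-valued action in each state."""
    return {s: max(range(len(v)), key=v.__getitem__) for s, v in q.items()}
```

For example, with `rows = [(0, 1, 0.0, False), (1, 1, 1.0, True), (0, 0, 0.0, True)]`, `greedy_policy(offline_q_learning(rows, n_actions=2))` returns `{0: 1, 1: 1}`: Q(1, 1) approaches 1.0 and Q(0, 1) approaches 0.9, so taking action 1 in state 0 is preferred over ending the episode with zero reward.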

Within the AutoMLPipeline workflow, finding the preprocessing elements and combinations that yield the best policy Q relies on cross-validation: the dataset is repeatedly split into training and testing sets, and the accumulated discounted rewards (return) of the learned policy Q are averaged across the splits. This talk demonstrates how to set up the Offline RL pipeline to preprocess the dataset, learn the optimal policy Q, and apply a parallel search strategy to find the optimal workflow.
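The cross-validated search in the paragraph above can be sketched in Python as follows. This is a hypothetical illustration, not AutoMLPipeline's API: `fit` and `estimate_return` are placeholder callbacks you would supply. `fit` would train an Offline RL policy on the preprocessed training episodes, and `estimate_return` would estimate that policy's average discounted return on the held-out episodes; how the talk estimates a policy's return on held-out data is not specified in the abstract. The sketch also assumes stateless preprocessing steps applied in the order listed, and it assigns whole episodes to folds by index stride, which is one simple choice among many.

```python
from itertools import combinations
from statistics import mean


def discounted_return(rewards, gamma=0.99):
    """Accumulated discounted reward: r0 + gamma*r1 + gamma**2*r2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def kfold(episodes, k):
    """Yield k (train, test) splits; whole episodes go to fold (index % k)."""
    for i in range(k):
        test = episodes[i::k]
        train = [ep for j, ep in enumerate(episodes) if j % k != i]
        yield train, test


def chain(funcs):
    """Compose preprocessing steps left to right into a single function."""
    def run(data):
        for f in funcs:
            data = f(data)
        return data
    return run


def cv_score(episodes, preprocess, fit, estimate_return, k=3):
    """Mean held-out return estimate over k folds for one preprocessing chain."""
    scores = []
    for train, test in kfold(episodes, k):
        policy = fit(preprocess(train))  # learn on the training fold only
        scores.append(estimate_return(policy, preprocess(test)))
    return mean(scores)


def search(episodes, steps, fit, estimate_return, k=3, max_len=2):
    """Score every combination of up to max_len named steps; return the best."""
    results = {}
    for n in range(max_len + 1):  # n == 0 scores the raw, unpreprocessed data
        for combo in combinations(list(steps), n):
            pipeline = chain([steps[name] for name in combo])
            results[combo] = cv_score(episodes, pipeline, fit, estimate_return, k)
    best = max(results, key=results.get)
    return best, results
```

Each combination is scored independently, so the outer loop in `search` is where a parallel search strategy such as the one mentioned in the abstract would plug in. Steps that learn parameters from data, such as normalization, should be fitted on the training fold only; this sketch omits that for brevity.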
