
Temporally Abstract Model-Based Reinforcement Learning with Variable-Length Skills

Master's Thesis, Tech. Report CMU-RI-TR-23-80, December 2023

Abstract

Building agents that can reason and plan at various levels of temporal abstraction has long been a goal of artificial intelligence research. In the context of reinforcement learning (RL), performing temporal abstraction typically implies the use of behavioral primitives that are executed over extended periods of time. These behavioral primitives are sometimes known as options, but following more recent literature, we refer to them as skills. Many existing model-free methods, and all existing model-based methods, for temporal abstraction in RL use skills that execute for a fixed length of time, constraining an agent to a single level of temporal abstraction throughout the entirety of a task. Variable-length skills allow an agent to flexibly change its level of abstraction, which is desirable because many interesting tasks are composed of sub-tasks that take vastly different amounts of time. Furthermore, state-dependent termination conditions allow a skill to end exactly once its sub-task has been completed, rather than simply after a predetermined amount of time.

This thesis presents a novel approach to temporal abstraction in model-based RL that allows agents to learn a set of variable-length skills, state-dependent termination conditions, and dynamics on the temporally coarsened timescale induced by those skills. We demonstrate in long-horizon tasks from Datasets for Deep Data-Driven Reinforcement Learning (D4RL) that variable-length skills induce more repeatable state transitions, enabling higher downstream task performance compared to fixed-length skills. Furthermore, our approach is competitive with existing offline RL algorithms while naturally enabling knowledge transfer to novel goals.
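To make the core idea concrete, the following is a minimal illustrative sketch (not the thesis's actual implementation) of executing a variable-length skill with a state-dependent termination condition. All names here (`Skill`, `execute_skill`, the toy environment) are hypothetical and chosen only for illustration.

```python
class Skill:
    """A behavioral primitive: a policy paired with a state-dependent
    termination condition, plus a safety cap on execution length."""
    def __init__(self, policy, termination, max_len=50):
        self.policy = policy            # maps state -> action
        self.termination = termination  # maps state -> True when sub-task is done
        self.max_len = max_len          # upper bound on primitive steps

def execute_skill(env_step, state, skill):
    """Run the skill until its termination condition fires (or max_len steps).

    Returns the final state and the number of primitive steps taken.
    Because the step count depends on the state, not a fixed horizon,
    this is the variable-length property."""
    for t in range(skill.max_len):
        if skill.termination(state):
            return state, t
        action = skill.policy(state)
        state = env_step(state, action)
    return state, skill.max_len

# Toy 1-D example: a "move right until x >= goal" skill. The skill's
# duration depends on where it starts, and it terminates exactly when
# its sub-task (reaching the goal) is complete.
goal = 5.0
move_right = Skill(policy=lambda s: 1.0,
                   termination=lambda s: s >= goal)
step = lambda s, a: s + a

final_a, len_a = execute_skill(step, 0.0, move_right)  # takes 5 steps
final_b, len_b = execute_skill(step, 3.0, move_right)  # takes 2 steps
```

A fixed-length skill would instead always run for the same number of primitive steps, overshooting or undershooting the sub-task boundary; the state-dependent condition is what makes the induced coarse-timescale transitions repeatable.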

BibTeX

@mastersthesis{Wei-2023-139175,
author = {Thomas Wei},
title = {Temporally Abstract Model-Based Reinforcement Learning with Variable-Length Skills},
year = {2023},
month = {December},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-23-80},
keywords = {Temporal Abstraction, Reinforcement Learning, World Modeling, Representation Learning, Unsupervised Learning, Deep Learning, Attention, Transformers},
}