Structured Representations for Behaviors of Autonomous Robots

Aaron Roth
Master's Thesis, Tech. Report, CMU-RI-TR-19-50, July, 2019


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract

As an increasing number of robot platforms and robot tasks become available and feasible, we aim to improve methods for modular robot task definition and execution. Regardless of the source (human or learned), autonomous robot behavior can be captured in many ways: as code, as modules of code, in an unstructured form such as a neural net, or in one of several more structured formats such as a graph, table, or tree.

This thesis explores structured representations that are simultaneously understandable by humans and executable by robots. Enforcing a certain structure on policies can streamline the development of code, enable task transfer, facilitate task instruction through other modalities such as interactive dialogue, and make autonomously learned policies interpretable. This effort is further motivated by the growing desire in the field of reinforcement learning (and machine learning in general) to move from black-box models toward “interpretable AI.”

We present Transferable Augmented Instruction Graphs (TAIGs), a platform-independent task representation and execution framework based on the functional composition of robot behavioral and perceptual primitives. We provide an overview of the previously introduced Instruction Graphs and contribute Augmented Instruction Graphs, which add the ability to use memory and to represent negated conditions, halt conditions, and nested graphs in order to capture complex task policies. We further define representation and execution management to reference a library of primitives, allowing policies to be transferred between different robot platforms. We demonstrate the use of TAIGs by applying them to a concrete matching-game task using two autonomous robots: Pepper and Baxter. We further demonstrate TAIGs in the context of performing the RoboCup@Home General Purpose Service Robot challenge task.
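As a rough illustration of these ingredients (a primitive library, shared memory, and negated conditions; halt conditions and nested graphs are omitted for brevity), a minimal TAIG-style interpreter might look like the sketch below. All names and the node layout here are hypothetical, not the released library's API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# A primitive library maps platform-independent primitive names to
# platform-specific implementations; swapping in a different library
# transfers the same graph to a different robot.

@dataclass
class Node:
    primitive: str                 # name resolved against the primitive library
    args: tuple = ()
    negated: bool = False          # support for negated conditions
    next_id: Optional[int] = None  # successor node; None ends the graph

@dataclass
class TAIG:
    nodes: Dict[int, Node]
    memory: Dict[str, object] = field(default_factory=dict)  # graph memory

    def execute(self, library: Dict[str, Callable]) -> None:
        """Walk the graph, dispatching each node's primitive via the library."""
        node_id: Optional[int] = 0
        while node_id is not None:
            node = self.nodes[node_id]
            result = library[node.primitive](*node.args, memory=self.memory)
            if node.negated:           # invert a perceptual condition's result
                result = not result
            node_id = node.next_id

# Toy primitive library: "say" records an utterance in graph memory.
library = {"say": lambda text, memory: memory.setdefault("log", []).append(text)}
graph = TAIG(nodes={0: Node("say", ("hello",), next_id=1),
                    1: Node("say", ("world",))})
graph.execute(library)
```

Because execution only ever touches primitives through the library, the graph itself stays platform independent, which is the transfer mechanism the framework relies on.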

Recognizing the value of being able to construct a graph without programming, we introduce Interactive-TAIG, a framework for constructing TAIGs through interactive dialogue. We demonstrate this construction technique with a simple search-and-deliver task.

We discuss two types of structured representations for policies learned autonomously via reinforcement learning. The first is a decision tree structure: we extend the partial-Conservative Q-Improvement (pCQI) method into two successive methods, Conservative Q-Improvement (CQI) and Conservative Q-Improvement 2. The decision tree policies we discuss in this thesis are decision trees over the state space, which require fewer parameters to express than traditional policy representations. In contrast to many existing methods for creating decision tree policies via reinforcement learning, which focus on accurately representing an action-value function during training, our extension of the pCQI algorithm increases tree size only when the estimated discounted future reward of the overall policy would increase by a sufficient amount. Through evaluation in simulated environments, we show that its performance is comparable or superior to that of non-CQI-based methods. Additionally, we discuss tuning parameters to control the tradeoff between smaller tree size and overall reward.
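The conservative split rule can be sketched as follows. This is a minimal illustration of the idea, not the thesis implementation; how candidate splits and their return estimates are produced is assumed here:

```python
def choose_split(current_return, candidate_splits, threshold):
    """Conservative split rule (CQI-style sketch).

    current_return:   estimated discounted return of the current tree policy
    candidate_splits: list of (split_description, estimated_return_if_applied)
    threshold:        minimum required improvement; raising it favors smaller
                      trees, lowering it favors reward (the tradeoff above)

    Returns the best split's description, or None to keep the tree unchanged.
    """
    best = max(candidate_splits, key=lambda s: s[1], default=None)
    if best is not None and best[1] - current_return >= threshold:
        return best[0]
    return None
```

The contrast with fit-the-Q-function methods is that the tree here only grows when the *policy's* estimated return improves enough, so an accurate but behaviorally irrelevant distinction in the value function never costs a node.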

Second, we introduce a method for learning a TAIG using reinforcement learning. The resulting TAIG includes WHILE loops in its structure, corresponding to subtasks of the task. This method thus allows a robot to autonomously learn a policy that has all the benefits of a TAIG.

This thesis includes the release of several open-source projects and pieces of code. We release an open-source Python library implementing the TAIG and Interactive-TAIG contributions, along with tutorials and examples for using TAIGs on an arbitrary robotic system. Finally, we release an open-source library with a new OpenAI Gym-compatible environment in which the agent controls the traffic lights at a four-way intersection.
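To show the shape of the Gym-style interface such an environment exposes, here is a toy sketch; the class name, dynamics, and reward are invented for illustration and are not the released library's:

```python
import random

class FourWayIntersectionEnv:
    """Toy Gym-style environment: each step, the agent chooses which axis
    (north-south or east-west) gets the green light. Cars arrive randomly
    on both axes; the green axis discharges part of its queue. The reward
    penalizes the total number of waiting cars."""
    ACTIONS = (0, 1)  # 0 = green for north-south, 1 = green for east-west

    def reset(self):
        self.queues = [0, 0]  # cars waiting on each axis
        return tuple(self.queues)

    def step(self, action):
        for axis in (0, 1):                       # random arrivals
            self.queues[axis] += random.randint(0, 2)
        self.queues[action] = max(0, self.queues[action] - 3)  # discharge
        reward = -sum(self.queues)
        return tuple(self.queues), reward, False, {}
```

A Gym-compatible `reset`/`step` interface like this is what lets standard reinforcement learning agents (including the decision-tree methods above) train against the environment without custom glue code.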


@mastersthesis{Roth-2019-117131,
author = {Aaron Roth},
title = {Structured Representations for Behaviors of Autonomous Robots},
year = {2019},
month = {July},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-19-50},
keywords = {Artificial Intelligence, Explainable Artificial Intelligence, Human Robot Interaction, Reinforcement Learning, Decision Trees, Robotics Frameworks, Instruction Graph, Learning by Instruction},
}