MSR Speaking Qualifier

Aaron Roth – MSR Thesis Talk
Robotics Institute, Carnegie Mellon University
Monday, July 15, 10:30 am to 12:00 pm
NSH 4305

Title: Structured Representations for Behaviors of Autonomous Robots

Abstract:

Autonomous robot behavior can be captured in many ways: as code, as modules of code, in an unstructured form such as a neural net, or in one of several more structured formats such as a graph, table, or tree. This talk explores structured representations that are simultaneously understandable by humans and executable by robots. Enforcing a certain structure on policies can streamline code development, enable task transfer, facilitate task instruction through other modalities such as interactive dialogue, and make autonomously learned policies interpretable.

We present Transferable Augmented Instruction Graphs (TAIGs), a platform-independent task representation and execution framework based on the functional composition of robot behavioral and perceptual primitives. We give an overview of the previously introduced Instruction Graphs and contribute Augmented Instruction Graphs, which add the ability to use memory and to represent negated conditions, halt conditions, and nested graphs in order to capture complex task policies. We further define the representation and its execution management to reference a library of primitives, so that policies can be transferred between different robot platforms. Recognizing the value of being able to construct a graph by means other than programming, we introduce Interactive-TAIG, a framework for constructing TAIGs through interactive dialogue.
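
As a rough illustration of the idea in the preceding paragraph, composing named behavioral and perceptual primitives into an executable graph that stays platform-independent, the sketch below shows one possible encoding in Python. The class names, fields, and execution loop are simplified assumptions for exposition, not the actual TAIG framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """One task step: run a behavioral primitive until a perceptual condition holds."""
    action: str            # name of a behavioral primitive in the library
    condition: str         # name of a perceptual primitive in the library
    negate: bool = False   # negated condition, as in Augmented Instruction Graphs
    children: List["Node"] = field(default_factory=list)


@dataclass
class PrimitiveLibrary:
    """Platform-specific primitives; swapping the library retargets the same graph."""
    actions: Dict[str, Callable[[], None]]
    conditions: Dict[str, Callable[[], bool]]


def execute(node: Optional[Node], lib: PrimitiveLibrary) -> None:
    """Walk the graph, looking up primitives by name at execution time."""
    while node is not None:
        def condition_met() -> bool:
            result = lib.conditions[node.condition]()
            return (not result) if node.negate else result

        while not condition_met():
            lib.actions[node.action]()
        node = node.children[0] if node.children else None
```

Because the graph refers to primitives only by name, any platform whose library supplies those names could, under this simplification, execute the same graph, which is the intuition behind transferring policies between robots.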

We discuss two types of structured representations for policies learned autonomously via reinforcement learning. The first is a decision tree structure, where we extend the partial-Conservative Q-Improvement (pCQI) method into two successive methods, Conservative Q-Improvement (CQI) and Conservative Q-Improvement 2. In contrast to many existing methods for creating decision tree policies via reinforcement learning, which focus on accurately representing an action-value function during training, our extension of the pCQI algorithm only increases tree size when the estimated discounted future reward of the overall policy would increase by a sufficient amount. Through evaluation in simulated environments, we show that its performance is comparable or superior to that of non-CQI-based methods. Additionally, we discuss tuning parameters to control the tradeoff between optimizing for smaller tree size and optimizing for overall reward. Second, we introduce a method for learning a TAIG using reinforcement learning.
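
The conservative growth criterion described above can be illustrated with a small, self-contained sketch: a leaf of the policy tree is split only when a toy estimate of the policy's expected discounted return improves by at least a threshold. The function names and the gain estimate are simplified assumptions for illustration, not the CQI algorithm itself.

```python
import numpy as np


def estimated_policy_gain(parent_q, left_q, right_q, p_left):
    """Toy estimate of the change in expected discounted return from a split:
    each child leaf greedily follows its own best action instead of the parent's."""
    parent_value = np.max(parent_q)
    split_value = p_left * np.max(left_q) + (1.0 - p_left) * np.max(right_q)
    return split_value - parent_value


def maybe_split(parent_q, left_q, right_q, p_left, threshold):
    """Conservative growth rule: enlarge the tree only when the estimated gain in
    discounted future reward is at least `threshold`."""
    return estimated_policy_gain(parent_q, left_q, right_q, p_left) >= threshold


# Splitting pays off here because the two child leaves prefer different actions.
parent_q = np.array([0.5, 0.4])
left_q, right_q = np.array([0.9, 0.1]), np.array([0.1, 0.8])
print(maybe_split(parent_q, left_q, right_q, p_left=0.5, threshold=0.2))  # True
```

In this toy form, the threshold plays the role of the tuning parameter mentioned above: raising it favors smaller, more interpretable trees at some cost in reward.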

Committee:

Manuela Veloso, Advisor

Aaron Steinfeld

Nicholay Topin