Inverse Optimal Heuristic Control for Imitation Learning

Nathan Ratliff, Brian D. Ziebart, Kevin Peterson, J. Andrew (Drew) Bagnell, Martial Hebert, Anind Dey and Siddhartha Srinivasa
Conference Paper, Carnegie Mellon University, Twelfth International Conference on Artificial Intelligence and Statistics (AIStats), April, 2009


One common approach to imitation learning is behavioral cloning (BC), which employs straightforward supervised learning (i.e., classification) to directly map observations to controls. A second approach is inverse optimal control (IOC), which formalizes the problem of learning sequential decision-making behavior over long horizons as a problem of recovering a utility function that explains observed behavior. This paper presents inverse optimal heuristic control (IOHC), a novel approach to imitation learning that capitalizes on the strengths of both paradigms. It employs long-horizon IOC-style modeling in a low-dimensional space where inference remains tractable, while incorporating an additional descriptive set of BC-style features to guide a higher-dimensional overall action selection. We provide experimental results demonstrating the capabilities of our model on a simple illustrative problem as well as on two real-world problems: turn prediction for taxi drivers and pedestrian prediction within an office environment.
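
The combination described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes, for illustration only, that actions are moves along edges of a small graph, that the long-horizon (IOC-style) part is a shortest-path cost-to-go on that graph, that the BC-style part is a linear score over per-action features, and that the two are summed into an energy that defines a softmax distribution over actions. The function names, the toy graph, the features, and the weights are all hypothetical.

```python
import heapq
import math


def cost_to_go(edges, goal):
    """Shortest cost from every node to `goal` (Dijkstra on the reversed graph).

    `edges[u][v]` is the nonnegative cost of moving from u to v.
    This plays the role of the low-dimensional, long-horizon planning component.
    """
    reverse = {}
    for u, successors in edges.items():
        reverse.setdefault(u, {})
        for v, cost in successors.items():
            reverse.setdefault(v, {})[u] = cost
    dist = {node: math.inf for node in reverse}
    dist[goal] = 0.0
    heap = [(0.0, goal)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for pred, cost in reverse[node].items():
            if d + cost < dist[pred]:
                dist[pred] = d + cost
                heapq.heappush(heap, (d + cost, pred))
    return dist


def action_distribution(state, edges, values, features, weights):
    """Softmax over actions with energy = feature score + edge cost + cost-to-go.

    The linear feature score is the assumed BC-style term; the edge cost plus
    `values[next_state]` is the assumed IOC-style term.
    """
    energies = {}
    for nxt, cost in edges[state].items():
        score = sum(w * f for w, f in zip(weights, features[(state, nxt)]))
        energies[nxt] = score + cost + values[nxt]
    lowest = min(energies.values())  # subtract the minimum for numerical stability
    unnormalized = {a: math.exp(-(e - lowest)) for a, e in energies.items()}
    total = sum(unnormalized.values())
    return {a: u / total for a, u in unnormalized.items()}


# Hypothetical toy problem: two routes from A to the goal G.
edges = {"A": {"B": 1.0, "C": 1.0}, "B": {"G": 1.0}, "C": {"G": 3.0}, "G": {}}
features = {("A", "B"): [1.0], ("A", "C"): [0.0]}  # one made-up feature per action
weights = [0.5]                                    # made-up feature weight

values = cost_to_go(edges, "G")
probs = action_distribution("A", edges, values, features, weights)
```

In this toy setup the move toward B has a small feature penalty but a much lower cost-to-go, so it receives most of the probability mass. In a learned model, the weights and edge costs would be fit to demonstrations rather than set by hand.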

BibTeX Reference
title = {Inverse Optimal Heuristic Control for Imitation Learning},
author = {Nathan Ratliff and Brian D. Ziebart and Kevin Peterson and J. Andrew (Drew) Bagnell and Martial Hebert and Anind Dey and Siddhartha Srinivasa},
booktitle = {Twelfth International Conference on Artificial Intelligence and Statistics (AIStats)},
keyword = {imitation learning, apprenticeship learning, inverse optimal control, behavioral cloning, planning, stochastic policies, people prediction, taxi route prediction},
school = {Robotics Institute, Carnegie Mellon University},
month = {April},
year = {2009},
address = {Pittsburgh, PA},