Inverse Optimal Heuristic Control for Imitation Learning

Nathan Ratliff, Brian D. Ziebart, Kevin Peterson, J. Andrew (Drew) Bagnell, Martial Hebert, Anind Dey, and Siddhartha Srinivasa
Twelfth International Conference on Artificial Intelligence and Statistics (AIStats), April, 2009.


Download
  • Adobe portable document format (pdf) (2MB)

Abstract
One common approach to imitation learning is behavioral cloning (BC), which employs straightforward supervised learning (i.e., classification) to directly map observations to controls. A second approach is inverse optimal control (IOC), which formalizes the problem of learning sequential decision-making behavior over long horizons as a problem of recovering a utility function that explains observed behavior. This paper presents inverse optimal heuristic control (IOHC), a novel approach to imitation learning that capitalizes on the strengths of both paradigms. It employs long-horizon IOC-style modeling in a low-dimensional space where inference remains tractable, while incorporating an additional descriptive set of BC-style features to guide higher-dimensional overall action selection. We provide experimental results demonstrating the capabilities of our model on a simple illustrative problem as well as on two real-world problems: turn-prediction for taxi drivers, and pedestrian prediction within an office environment.
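
The snippet below is a minimal illustrative sketch of the idea summarized in the abstract, not the paper's exact formulation: it assumes a softmax (Boltzmann) policy whose per-action cost combines a linear BC-style term over descriptive action features with a weighted IOC-style cost-to-go heuristic evaluated in a low-dimensional state space. The function name, feature layout, and the weights w_bc and w_heur are hypothetical placeholders introduced only for illustration.

    import numpy as np

    def iohc_policy_sketch(action_features, cost_to_go, w_bc, w_heur):
        """Illustrative sketch (not the authors' exact model): score each
        candidate action by a linear BC-style cost over its descriptive
        features plus a weighted IOC-style cost-to-go estimate for the
        low-dimensional state it leads to, then normalize with a softmax
        so that lower total cost yields higher probability."""
        action_features = np.asarray(action_features)   # shape (A, d)
        cost_to_go = np.asarray(cost_to_go)              # shape (A,)
        total_cost = action_features @ w_bc + w_heur * cost_to_go
        logits = -(total_cost - total_cost.min())        # shift for numerical stability
        probs = np.exp(logits)
        return probs / probs.sum()

    # Toy usage: three candidate moves, each described by two BC-style
    # features, with a planner-style cost-to-go for the cell each reaches.
    features = [[1.0, 0.2], [0.3, 0.9], [0.5, 0.5]]
    ctg = [4.0, 2.5, 3.0]
    print(iohc_policy_sketch(features, ctg, w_bc=np.array([0.8, 0.4]), w_heur=1.0))

In this sketch the cost-to-go array stands in for whatever low-dimensional planning or value computation supplies the long-horizon signal, while the feature weights play the role of the learned BC-style parameters.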

Keywords
imitation learning, apprenticeship learning, inverse optimal control, behavioral cloning, planning, stochastic policies, people prediction, taxi route prediction

Notes
Associated Project(s): PeepPredict

Text Reference
Nathan Ratliff, Brian D. Ziebart, Kevin Peterson, J. Andrew (Drew) Bagnell, Martial Hebert, Anind Dey, and Siddhartha Srinivasa, "Inverse Optimal Heuristic Control for Imitation Learning," Twelfth International Conference on Artificial Intelligence and Statistics (AIStats), April, 2009.

BibTeX Reference
@inproceedings{Ratliff_2009_6284,
   author = "Nathan Ratliff and Brian D. Ziebart and Kevin Peterson and J. Andrew (Drew) Bagnell and Martial Hebert and Anind Dey and Siddhartha Srinivasa",
   title = "Inverse Optimal Heuristic Control for Imitation Learning",
   booktitle = "Twelfth International Conference on Artificial Intelligence and Statistics (AIStats)",
   month = "April",
   year = "2009",
}