Inverse Optimal Heuristic Control for Imitation Learning

Nathan Ratliff, Brian D. Ziebart, Kevin Peterson, J. Andrew (Drew) Bagnell, Martial Hebert, Anind Dey and Siddhartha Srinivasa
Conference Paper, Carnegie Mellon University, Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), April 2009

One common approach to imitation learning is behavioral cloning (BC), which employs straightforward supervised learning (i.e., classification) to map observations directly to controls. A second approach is inverse optimal control (IOC), which formalizes the problem of learning sequential decision-making behavior over long horizons as a problem of recovering a utility function that explains observed behavior. This paper presents inverse optimal heuristic control (IOHC), a novel approach to imitation learning that capitalizes on the strengths of both paradigms. It employs long-horizon IOC-style modeling in a low-dimensional space where inference remains tractable, while incorporating an additional descriptive set of BC-style features to guide action selection in the higher-dimensional overall space. We provide experimental results demonstrating the capabilities of our model on a simple illustrative problem as well as on two real-world problems: turn prediction for taxi drivers and pedestrian prediction within an office environment.
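As a rough illustration of how the two paradigms might be combined, the sketch below scores each action by the sum of a BC-style energy and an IOC-style cost-to-go computed in a small planning space, then turns the scores into a probability distribution over actions. This is a toy under stated assumptions and not the paper's implementation: the 1-D chain world, the additive form of the energy, and the names `cost_to_go`, `action_distribution`, and `bc_energy` are all hypothetical choices made for illustration.

```python
import math

def cost_to_go(cost, goal):
    """IOC-style term (assumed form): cheapest cumulative cost from each
    cell of a 1-D chain to `goal`, where entering cell i costs cost[i] >= 0."""
    n = len(cost)
    V = [math.inf] * n
    V[goal] = 0.0
    for _ in range(n):  # n sweeps are enough for values to settle on a chain
        for s in range(n):
            if s == goal:
                continue
            for nxt in (s - 1, s + 1):
                if 0 <= nxt < n:
                    V[s] = min(V[s], cost[nxt] + V[nxt])
    return V

def action_distribution(state, V, cost, bc_energy):
    """Softmax over actions (-1 = left, +1 = right) of the negative combined
    energy: hypothetical BC-style term plus long-horizon cost-to-go."""
    actions = [a for a in (-1, 1) if 0 <= state + a < len(V)]
    energies = [
        bc_energy(state, a) + cost[state + a] + V[state + a] for a in actions
    ]
    lowest = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(e - lowest)) for e in energies]
    total = sum(weights)
    return {a: w / total for a, w in zip(actions, weights)}

# Toy usage: five cells with unit cost and the goal at the right end.
cost = [1.0] * 5
V = cost_to_go(cost, goal=4)
p_plan_only = action_distribution(2, V, cost, lambda s, a: 0.0)
p_with_bc = action_distribution(2, V, cost, lambda s, a: 2.0 if a == 1 else 0.0)
```

With a zero BC term the distribution follows the planner and favors moving toward the goal. A BC term that penalizes the same move shifts probability mass away from it, which is the sense in which local features can guide action selection on top of the long-horizon model.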

