Carnegie Mellon University
Learning to search: Functional gradient techniques for imitation learning

Nathan Ratliff, David Silver, and J. Andrew (Drew) Bagnell
Autonomous Robots, Vol. 27, No. 1, pp. 25-53, July, 2009.

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies the same behavior is difficult, time-consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems. Unfortunately, many “behavioral cloning” (Bain and Sammut in Machine intelligence agents. London: Oxford University Press, 1995; Pomerleau in Advances in neural information processing systems 1, 1989; LeCun et al. in Advances in neural information processing systems 18, 2006) approaches that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in favor of a supervised learning approach often leads to myopic and poor-quality robot performance. While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al. in Proceedings of the IEEE-RAS international conference on humanoid robots, 2003) to outdoor unstructured navigation (Kelly et al. in Proceedings of the international symposium on experimental robotics (ISER), 2004; Stentz et al. in AUVSI’s unmanned systems, 2007), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recently, a set of techniques has been developed that explores learning these functions from expert human demonstration. These algorithms apply an inverse optimal control approach to find a cost function for which planned behavior mimics an expert’s demonstration.
The work we present extends the Maximum Margin Planning (MMP) (Ratliff et al. in Twenty second international conference on machine learning (ICML06), 2006a) framework to admit learning of more powerful, non-linear cost functions. These algorithms, known collectively as LEARCH (LEArning to seaRCH), are simpler to implement than most existing methods, more efficient than previous attempts at non-linearization (Ratliff et al. in NIPS, 2006b), more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the function’s form. We derive and discuss the framework both mathematically and intuitively, and demonstrate practical real-world performance with three applied case studies: legged locomotion, grasp planning, and autonomous outdoor unstructured navigation. The latter study includes hundreds of kilometers of autonomous traversal through complex natural environments. These case studies address key challenges in applying the algorithm in practical settings that utilize state-of-the-art planners and that may be constrained by efficiency requirements and imperfect expert demonstration.
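
The inverse optimal control idea described above, adjusting a cost function until the planner's output matches a demonstration, can be illustrated with a toy sketch. This is not the LEARCH algorithm from the paper: it omits the margin and loss augmentation of MMP, uses a single hand-picked feature, and stores the cost in log space only so that it stays positive. The grid, the expert path, the step size, and all function names are assumptions made for illustration.

```python
import heapq
import math

# Hypothetical 2x4 grid: 1 marks "rough" terrain, 0 marks open ground.
ROUGH = [
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
START, GOAL = (0, 0), (0, 3)
# Assumed expert demonstration: detour through the open bottom row.
EXPERT = [(0, 0), (1, 0), (1, 1), (1, 2), (1, 3), (0, 3)]


def cell_cost(w, cell):
    """Log-linear cost exp(w * feature); positive for any w."""
    r, c = cell
    return math.exp(w * ROUGH[r][c])


def plan(w):
    """Dijkstra on the 4-connected grid; a cell's cost is paid on entry."""
    dist = {START: 0.0}
    parent = {}
    frontier = [(0.0, START)]
    while frontier:
        d, cell = heapq.heappop(frontier)
        if cell == GOAL:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < len(ROUGH) and 0 <= nxt[1] < len(ROUGH[0]):
                nd = d + cell_cost(w, nxt)
                if nd < dist.get(nxt, math.inf):
                    dist[nxt] = nd
                    parent[nxt] = cell
                    heapq.heappush(frontier, (nd, nxt))
    # Walk parent pointers back from the goal to recover the path.
    path = [GOAL]
    while path[-1] != START:
        path.append(parent[path[-1]])
    return path[::-1]


def rough_count(path):
    """Feature count of a path: number of rough cells it visits."""
    return sum(ROUGH[r][c] for r, c in path)


def learn(step_size=0.5, iterations=20):
    """Raise the log-cost of the feature the planner uses more than the expert."""
    w = 0.0
    for _ in range(iterations):
        planned = plan(w)
        w += step_size * (rough_count(planned) - rough_count(EXPERT))
    return w


if __name__ == "__main__":
    w = learn()
    print("learned weight:", w)
    print("planned path:", plan(w))
```

With the initial weight of zero every cell costs the same and the planner cuts straight through the rough cells. Each update raises the cost of rough terrain until the planner prefers the expert's detour, at which point the feature counts agree and the weight stops changing.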

Keywords: Imitation learning, Structured prediction, Subgradient methods, Nonparametric optimization, Functional gradient techniques, Robotics, Planning, Autonomous navigation, Quadrupedal locomotion, Grasping

Associated Center(s) / Consortia: Quality of Life Technology Center, National Robotics Engineering Center, and Center for the Foundations of Robotics
Associated Lab(s) / Group(s): Planning and Autonomy Lab and Personal Robotics
Associated Project(s): Learning Locomotion and UGCV PerceptOR Integrated

Text Reference
Nathan Ratliff, David Silver, and J. Andrew (Drew) Bagnell, "Learning to search: Functional gradient techniques for imitation learning," Autonomous Robots, Vol. 27, No. 1, pp. 25-53, July, 2009.

BibTeX Reference
   author = "Nathan Ratliff and David Silver and J. Andrew (Drew) Bagnell",
   editor = "Jan Peters and Andrew Y. Ng",
   title = "Learning to search: Functional gradient techniques for imitation learning",
   journal = "Autonomous Robots",
   pages = "25-53",
   publisher = "Springer",
   month = "July",
   year = "2009",
   volume = "27",
   number = "1",