Watch, Predict, Act: Robot Learning meets Web Videos - Robotics Institute Carnegie Mellon University

PhD Thesis Defense

Homanga Bharadhwaj
PhD Student, Robotics Institute, Carnegie Mellon University
Thursday, May 8
4:00 pm to 5:30 pm
NSH 3305
Watch, Predict, Act: Robot Learning meets Web Videos

Abstract:
To enable robots to assist in everyday tasks in diverse natural environments such as homes, offices, and kitchens, it is critical to develop policies that generalize to novel tasks in unseen scenarios. Practical utility demands that these policies require no task-specific adaptation at test time and can instead execute directly from a natural task specification, such as a language instruction. Moreover, such policies should handle a broad spectrum of tasks, such as manipulating articulated objects, pouring, reorienting objects, and wiping tables, without explicit robot data collection for every possible task, as the predominant paradigm of end-to-end imitation learning requires. The difficulty of collecting large-scale, diverse robot interaction datasets in natural scenarios makes this requirement impractical.

While typical approaches rely on large amounts of demonstration data for such generalization, this thesis presents approaches for effectively leveraging web video data to scalably augment robot interaction datasets. I will demonstrate the paradigm of conditioning robotic policies explicitly on motion cues from predictive models trained on large-scale video datasets, enabling the policy to perform new tasks with novel objects and novel motions unseen in the robot-specific data. A key insight of the talk is factorizing a robotic policy into two parts: an embodiment-agnostic interaction plan, which can be learned from general internet data, and embodiment-specific action execution conditioned on that plan, which is a substantially easier problem. I will show how this overall paradigm of predictive planning from web videos enables training common goal- and language-conditioned policies that can perform multiple tasks without relying on task-specific or scene-specific heuristics.

Thesis Committee Members:
Abhinav Gupta (Co-Chair)
Shubham Tulsiani (Co-Chair)
Oliver Kroemer
Sergey Levine (UC Berkeley)
