
PhD Thesis Defense

Mohit Sharma, PhD Student
Robotics Institute, Carnegie Mellon University
Thursday, January 18
12:00 pm to 2:00 pm
GHC 6501
Continual Learning of Compositional Skills for Robust Robot Manipulation

Abstract:
Real-world robots need to continuously learn new manipulation tasks in a lifelong manner. These new tasks often share many substructures with previously learned tasks, e.g., subtasks, controllers, and preconditions. To exploit these shared substructures, we explore a compositional, object-centric approach to learning manipulation tasks.

The first part of this thesis focuses on compositional skill learning, specifically compositional learning of skill preconditions and skill policies. For skill preconditions, we show how complex multi-object manipulation tasks can be simplified by focusing on pairwise object relations. To learn compositional skill policies, we propose object-centric task-axes controllers. Our task-axes controllers learn the skill structure and are composed into specialized policy representations for individual tasks. These representations are robust to environment variations and are learned from limited data.

The second part of this thesis focuses on lifelong learning. Among the many formulations of lifelong learning, we focus on efficiently reusing previous knowledge (e.g., representations and skills) to learn new tasks. To achieve this, we first propose skill effect models, which predict the effects of stereotypical skill executions. We combine skill effect models with search-based planning to effectively plan for new tasks and learn new skills over time. In the latter part, we focus on large pretrained visual representations for robot manipulation. First, we propose RoboAdapters, which use neural adapters as an alternative to frozen or fully fine-tuned visual representations for robot manipulation. RoboAdapters bridge the performance gap between frozen representations and full fine-tuning while preserving the original capabilities of the pretrained model. Finally, we explore using large pretrained vision-language representations for real-time control of precise and dynamic manipulation tasks. We use multiple sensing modalities at different hierarchical levels to enable real-time control while maintaining the generalization and robustness of pretrained representations.

Thesis Committee Members:
Oliver Kroemer, Chair
Abhinav Gupta
David Held
Dieter Fox, University of Washington
