Representation Reuse for Learning Robust Robot Manipulations

PhD Thesis, Tech. Report, CMU-RI-TR-24-03, February 2024

Abstract

Real-world robots need to continuously learn new manipulation tasks. These tasks often share sub-structures with previously learned tasks, e.g., sub-tasks, controllers, and preconditions. In this thesis, we aim to exploit these shared sub-structures to learn new manipulation tasks efficiently. To this end, we explore reusing skill representations, which are either provided manually as structured policy representations or learned in a data-driven manner.

The first part of this thesis focuses on policy representations. To learn compositional skill policies, we propose object-centric task-axes controllers. Our task-axes controllers learn the skill structure and are composed into specialized policy representations for individual tasks. These representations utilize the compositional, object-centric, and geometric structure underlying many manipulation tasks. As we show through extensive experiments, these representations are robust to environment variations and are learned from limited data. We also show how parameterized policy representations help learn new tasks efficiently in a lifelong-learning setting. To achieve this, we propose skill effect models, which predict the effects of stereotypical skill executions. We combine skill effect models with search-based planning to plan effectively for new tasks and learn new skills over time.

The second part of this thesis focuses on visual representations. These visual representations, learned either from simulation or from offline web data, are used for efficient learning of skill preconditions and policies, respectively. Specifically, for skill preconditions we focus on compositional learning and show how complex manipulation tasks with multiple objects can be simplified by focusing on pairwise object relations. These relational representations are learned offline using large-scale simulation data. In the latter part, we focus on skill policies that utilize large pretrained visual representations for robot manipulation. First, we propose RoboAdapters, which use neural adapters as an alternative to frozen or fully fine-tuned visual representations for robot manipulation. RoboAdapters bridge the performance gap between frozen representations and full fine-tuning while preserving the original capabilities of the pretrained model. Finally, we explore using large pretrained vision-language representations for real-time control of precise and dynamic manipulation tasks. We use multiple sensing modalities at different hierarchical levels to enable real-time control while maintaining the generalization and robustness of pretrained representations.

BibTeX

@phdthesis{Sharma-2024-139646,
author = {Mohit Sharma},
title = {Representation Reuse for Learning Robust Robot Manipulations},
year = {2024},
month = {February},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-24-03},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.