Policy Transfer via Modularity and Reward Guiding

Ignasi Clavera, David Held and Pieter Abbeel
Conference Paper, International Conference on Intelligent Robots and Systems (IROS), September, 2017

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


Non-prehensile manipulation, such as pushing, is an important function for robots to move objects and is sometimes preferred as an alternative to grasping. However, due to unknown frictional forces, pushing has proven to be a difficult task for robots. We explore the use of reinforcement learning to train a robot to robustly push an object. In order to deal with the sample complexity of training such a method, we train the pushing policy in simulation and then transfer this policy to the real world. In order to ease the transfer from simulation, we propose to use modularity to separate the learned policy from the raw inputs and outputs; rather than training “end-to-end,” we decompose our system into modules and train only a subset of these modules in simulation. We further demonstrate that we can incorporate prior knowledge about the task into the state space and the reward function to speed up convergence. Finally, we introduce “reward guiding” to modify the reward function and further reduce the training time. We demonstrate, in both simulation and real-world experiments, that such an approach can be used to reliably push an object from many initial positions and orientations.

Associated Lab - Robots Perceiving and Doing

author = {Ignasi Clavera and David Held and Pieter Abbeel},
title = {Policy Transfer via Modularity and Reward Guiding},
booktitle = {International Conference on Intelligent Robots and Systems (IROS)},
year = {2017},
month = {September},
keywords = {reinforcement learning, object manipulation, transfer},
}