Reverse Curriculum Generation for Reinforcement Learning

Carlos Florensa, David Held, Markus Wulfmeier, and Pieter Abbeel

Conference Paper, Proceedings of the Conference on Robot Learning (CoRL), pp. 482-495, November 2017

Abstract

Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These goal-oriented tasks present a considerable challenge for reinforcement learning, since their natural reward function is sparse and prohibitive amounts of exploration are required to reach the goal and receive some learning signal. Past approaches tackle these problems by exploiting expert demonstrations or by manually designing a task-specific reward shaping function to guide the learning agent. Instead, we propose a method to learn these tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved. The robot is trained in reverse, gradually learning to reach the goal from a set of start states increasingly far from the goal. Our method automatically generates a curriculum of start states that adapts to the agent's performance, leading to efficient training on goal-oriented tasks. We demonstrate our approach on difficult simulated navigation and fine-grained manipulation problems, not solvable by state-of-the-art reinforcement learning methods.
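
The idea of training in reverse from a single goal state can be illustrated with a small toy sketch. It is not the authors' implementation. The chain environment, the tabular Q-learning learner, the random-jump proposal of new starts, the success-rate thresholds `r_min` and `r_max`, and every function name below are assumptions made for illustration. The sketch only shows the general pattern: begin at the goal, propose nearby start states, and keep those the current agent solves some of the time but not always.

```python
import random

N_STATES = 20   # toy chain 0..19 (hypothetical environment, not from the paper)
GOAL = 0        # the single known state in which the task is achieved
HORIZON = 40    # maximum number of steps per episode


def run_episode(q, start, rng, epsilon=0.2, alpha=0.5, gamma=0.95):
    """One tabular Q-learning episode with a sparse reward; returns 1 on success."""
    s = start
    for _ in range(HORIZON):
        if s == GOAL:
            return 1
        if rng.random() < epsilon or q[s][0] == q[s][1]:
            a = rng.randrange(2)              # explore, or break ties randomly
        else:
            a = 0 if q[s][0] > q[s][1] else 1
        # Action 0 moves toward the goal, action 1 away; positions are clipped.
        s_next = min(max(s + (-1 if a == 0 else 1), 0), N_STATES - 1)
        # Reward is 1 only when the goal is reached, 0 everywhere else.
        target = 1.0 if s_next == GOAL else gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])
        s = s_next
    return int(s == GOAL)


def propose_starts(seeds, rng, n_new=10, max_jump=2):
    """Perturb seed states with short random jumps to get nearby candidates."""
    candidates = set()
    for _ in range(n_new):
        s = rng.choice(seeds) + rng.randint(-max_jump, max_jump)
        candidates.add(min(max(s, 0), N_STATES - 1))
    return sorted(candidates)


def select_starts(success_rates, r_min=0.1, r_max=0.9):
    """Keep starts of intermediate difficulty: neither mastered nor hopeless."""
    return sorted(s for s, r in success_rates.items() if r_min <= r <= r_max)


def reverse_curriculum(iterations=30, episodes_per_start=20, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    seeds = [GOAL]                            # begin at the goal, grow outward
    for _ in range(iterations):
        starts = propose_starts(seeds, rng)
        rates = {}
        for s in starts:
            wins = sum(run_episode(q, s, rng) for _ in range(episodes_per_start))
            rates[s] = wins / episodes_per_start
        good = select_starts(rates)
        # Fallback (an assumption of this sketch): if no start is of
        # intermediate difficulty, expand from any start that succeeded at
        # least once, otherwise keep the previous seeds.
        solved = [s for s in starts if rates[s] > 0]
        seeds = good or solved or seeds
    return q, seeds


if __name__ == "__main__":
    q_table, final_seeds = reverse_curriculum()
    print("seed states after training:", final_seeds)
```

Measuring success rates on the training episodes themselves keeps the sketch short. A more careful version would likely evaluate candidate starts separately and retain a buffer of earlier starts so that previously learned behavior is not forgotten.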

Notes
Associated Lab: Robots Perceiving and Doing

BibTeX

@conference{Held-2017-102821,
author = {Carlos Florensa and David Held and Markus Wulfmeier and Pieter Abbeel},
title = {Reverse Curriculum Generation for Reinforcement Learning},
booktitle = {Proceedings of (CoRL) Conference on Robot Learning},
year = {2017},
month = {November},
editor = {Sergey Levine and Vincent Vanhoucke and Ken Goldberg},
pages = {482--495},
publisher = {Proceedings of Machine Learning Research (PMLR)},
keywords = {reinforcement learning, object manipulation, curriculum},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.