Exploration with Expert Policy Advice

Ashwin Khadke, Arpit Agarwal, Anahita Mohseni Kabir and Devin Schwab
Tech. Report, August 2018



Exploration is a challenging problem in reinforcement learning. Random exploration is often highly inefficient and, in sparse-reward environments, may fail completely. In this work, we develop a novel method that incorporates expert advice for exploration in sparse-reward environments. In our formulation, the agent has access to a set of expert policies and learns to bias its exploration toward the experts' suggested actions. By incorporating expert suggestions, the agent quickly learns a policy that reaches rewarding states. Our method can mix and match advice from different experts within a single episode to reach goal states. Moreover, our formulation does not restrict the agent's policy to the given expert set, which allows it to aim for a globally optimal solution. In our experiments, we show that using expert advice indeed leads to faster exploration in challenging grid-world environments.
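The abstract's core idea, biasing exploration toward actions suggested by a set of expert policies and learning over time which experts to trust, can be illustrated with a toy sketch. The names below (`AdvisedExplorer`, `eta`, the multiplicative-weights update) are illustrative assumptions for this sketch, not the report's actual algorithm:

```python
import random

class AdvisedExplorer:
    """Toy sketch: sample an expert in proportion to a learned weight,
    follow its suggested action, and boost the weight of experts whose
    advice leads to reward. This is a generic multiplicative-weights
    scheme, not the method from the report."""

    def __init__(self, experts, eta=0.5):
        self.experts = experts            # list of policies: state -> action
        self.weights = [1.0] * len(experts)
        self.eta = eta                    # step size for weight updates
        self.last_expert = 0

    def choose(self, state):
        # Pick an expert with probability proportional to its weight.
        self.last_expert = random.choices(
            range(len(self.experts)), weights=self.weights)[0]
        return self.experts[self.last_expert](state)

    def update(self, reward):
        # Reward the expert whose advice was followed on the last step.
        self.weights[self.last_expert] *= (1.0 + self.eta * reward)


# Usage: a 5-state chain with reward at the rightmost state.
# Expert 0 always moves right (helpful); expert 1 acts randomly.
random.seed(0)
experts = [lambda s: 1, lambda s: random.choice([-1, 1])]
agent = AdvisedExplorer(experts)

state = 0
for _ in range(300):
    action = agent.choose(state)
    state = max(0, min(4, state + action))
    reward = 1.0 if state == 4 else 0.0
    agent.update(reward)
    if state == 4:          # episode ends at the goal; reset
        state = 0
```

After a few hundred steps, the weight of the goal-directed expert dominates, so exploration is increasingly biased toward its advice while the random expert is gradually ignored.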

@techreport{khadke2018exploration,
  author      = {Ashwin Khadke and Arpit Agarwal and Anahita Mohseni Kabir and Devin Schwab},
  title       = {Exploration with Expert Policy Advice},
  year        = {2018},
  month       = {August},
  institution = {Carnegie Mellon University},
  address     = {Pittsburgh, PA},
  keywords    = {Reinforcement Learning, Exploration, Learning from Expert Advice},
}