Contrasting Exploration in Parameter and Action Space: A Zeroth Order Optimization Perspective

Anirudh Vemula, Wen Sun and J. Andrew Bagnell
Conference Paper, Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), March 2019

Abstract

Black-box optimizers that explore in parameter space have often been shown to outperform more sophisticated action-space exploration methods developed specifically for the reinforcement learning problem. We examine these black-box methods closely to identify the situations in which they are worse than action-space exploration methods and those in which they are superior. Through simple theoretical analyses, we prove that the complexity of exploration in parameter space depends on the dimensionality of the parameter space, while the complexity of exploration in action space depends on both the dimensionality of the action space and the horizon length. We also demonstrate this empirically by comparing simple exploration methods on several model problems, including contextual bandits, linear regression, and reinforcement learning in continuous control.
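
To make the parameter-space side of the contrast concrete, the Python sketch below shows a standard two-point zeroth-order gradient estimator that explores by perturbing the full parameter vector and querying only black-box reward values. This is an illustrative sketch of the general technique, not the authors' implementation; the function names, step size, sample count, and quadratic reward are assumptions chosen for the example. Note how the estimator touches every one of the d parameter coordinates, consistent with the paper's claim that the complexity of parameter-space exploration scales with the parameter dimension.

import numpy as np

def zeroth_order_gradient(f, theta, sigma=0.1, num_samples=20, rng=None):
    """Two-point zeroth-order gradient estimate of f at theta.

    Perturbs the entire parameter vector with isotropic Gaussian noise;
    the variance of this estimator grows with the parameter dimension d.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = theta.shape[0]
    grad = np.zeros(d)
    for _ in range(num_samples):
        u = rng.standard_normal(d)
        # Antithetic (two-point) evaluation reduces estimator variance.
        grad += (f(theta + sigma * u) - f(theta - sigma * u)) / (2 * sigma) * u
    return grad / num_samples

if __name__ == "__main__":
    # Hypothetical black-box reward: a quadratic peaked at theta_star.
    theta_star = np.ones(5)
    reward = lambda th: -np.sum((th - theta_star) ** 2)
    theta = np.zeros(5)
    for _ in range(200):
        theta += 0.05 * zeroth_order_gradient(reward, theta)  # gradient ascent
    print("recovered parameters:", np.round(theta, 2))

Because the estimator only ever observes scalar rewards, its sample complexity depends on d but not on any horizon; an action-space method applied to a sequential problem would instead pay a cost in both the action dimension and the horizon length, which is the trade-off the paper analyzes.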

BibTeX Reference
@conference{Vemula-2019-112058,
  author    = {Anirudh Vemula and Wen Sun and J. Andrew Bagnell},
  title     = {Contrasting Exploration in Parameter and Action Space: A Zeroth Order Optimization Perspective},
  booktitle = {Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS)},
  year      = {2019},
  month     = {March},
  keywords  = {Reinforcement Learning; Exploration; Optimization},
}