Risk-Variant Policy Switching to Exceed Reward Thresholds - Robotics Institute Carnegie Mellon University

Conference Paper, Proceedings of the 22nd International Conference on Automated Planning and Scheduling (ICAPS '12), pp. 83–91, June 2012

Abstract

This paper presents a decision-theoretic planning approach for probabilistic environments in which the agent's goal is to win, which we model as maximizing the probability of ending above a given reward threshold. In competitive domains, finishing second is as good as finishing last, so it is often desirable to take risks when in danger of losing, even if the risk rarely pays off. Our algorithm maximizes the probability of exceeding a given reward threshold by dynamically switching among a suite of policies, each of which encodes a different level of risk. The method does not explicitly encode time or reward into the state space, and it decides at each execution step whether to switch policies. We compare a fixed risk-neutral policy against switching among risk-sensitive policies, and show that our approach improves the agent's probability of winning.
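The intuition behind risk-variant switching can be illustrated with a toy simulation. The numbers below (threshold, horizon, and the two single-step reward distributions) are hypothetical and not from the paper; the "safe" and "risky" policies have the same mean reward, but only switching between them, gambling while behind and banking the result once the threshold is locked in, does well at the win/lose objective:

```python
import random

# Hypothetical toy setting (not the paper's domain): the agent
# accumulates reward over HORIZON steps and "wins" iff the total
# meets or exceeds THRESHOLD.
THRESHOLD = 12
HORIZON = 10

def safe_step(rng):
    """Low-risk policy: deterministic reward of 1 per step."""
    return 1

def risky_step(rng):
    """High-risk policy: same mean (1) but much higher variance."""
    return rng.choice([-1, 3])

def run_episode(rng, choose):
    """Run one episode; `choose` picks a policy at every step."""
    total = 0
    for t in range(HORIZON):
        remaining = HORIZON - t
        policy = choose(total, remaining)
        total += policy(rng)
    return total >= THRESHOLD

def win_rate(choose, trials=20000, seed=0):
    """Monte Carlo estimate of the probability of winning."""
    rng = random.Random(seed)
    wins = sum(run_episode(rng, choose) for _ in range(trials))
    return wins / trials

# Fixed (non-switching) baselines.
always_safe = lambda total, remaining: safe_step
always_risky = lambda total, remaining: risky_step

def switching(total, remaining):
    """Gamble until the safe policy alone guarantees reaching the
    threshold, then switch to the safe policy to bank the win."""
    return safe_step if total + remaining >= THRESHOLD else risky_step

print("safe:     ", win_rate(always_safe))
print("risky:    ", win_rate(always_risky))
print("switching:", win_rate(switching))
```

In this setup the safe policy can never win (its best total, 10, is below the threshold), the risky policy wins sometimes, and switching wins more often than either, mirroring the paper's claim that a winner-take-all objective rewards taking risk only when behind. The paper's actual method operates over MDP policies of varying risk sensitivity, not hand-coded reward distributions like these.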

BibTeX

@conference{Styler-2012-7522,
author = {Breelyn Melissa Kane Styler and Reid Simmons},
title = {Risk-Variant Policy Switching to Exceed Reward Thresholds},
booktitle = {Proceedings of the 22nd International Conference on Automated Planning and Scheduling (ICAPS '12)},
year = {2012},
month = {June},
pages = {83--91},
}