Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs

Andrew Moore, Leemon Baird, and Leslie Pack Kaelbling
Conference Paper, Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI '99), Vol. 2, pp. 1316-1321, July 1999

Abstract

If you have planned to achieve one particular goal in a stochastic delayed-reward problem, and then someone asks about a different goal, what should you do? What if you need to be ready to quickly supply an answer for any possible goal? This paper shows that, by using a new kind of automatically generated abstract action hierarchy, preparing for all N possible goals in an N-state problem can be much cheaper than N times the work of preparing for one goal. In goal-based Markov Decision Problems, it is usual to generate a policy π(x) mapping states to actions, and a value function J(x) mapping states to an estimate of the minimum expected cost-to-goal starting at x. In this paper we use the terminology that a multipolicy π*(x, y) (defined for all state-pairs (x, y)) maps a state x to the first action it should take in order to reach y with minimum expected cost, and a multi-value-function J*(x, y) gives this minimum cost. Building these objects quickly and with little memory is the main purpose of this paper, but a secondary result is a natural, automatic way to create a set of parsimonious yet powerful abstract actions for MDPs. The paper concludes with a set of empirical results on increasingly large MDPs.
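The paper itself gives no code, but the two central objects are easy to pin down with a toy computation. The sketch below builds a multi-value-function J*(x, y) and the corresponding multipolicy π*(x, y) for a small deterministic shortest-path problem using Floyd-Warshall relaxation. All names here (build_multi_value_function, the toy transition table) are illustrative, and this brute-force all-pairs construction is the expensive baseline that the paper's automatic action hierarchies are designed to beat (in the stochastic case, the naive analogue is one value-iteration sweep per goal); it is not the authors' algorithm.

# Illustrative multi-value-function for a *deterministic* MDP, where
# J*(x, y) reduces to all-pairs shortest-path cost. The paper's
# contribution is computing the stochastic analogue far more cheaply
# than solving N separate single-goal problems.

import math
from itertools import product

def build_multi_value_function(n_states, transitions):
    """transitions: dict mapping (state, action) -> (next_state, cost)."""
    # J[x][y] = best known cost from x to y; start with one-step costs.
    J = [[0.0 if x == y else math.inf for y in range(n_states)]
         for x in range(n_states)]
    first_action = [[None] * n_states for _ in range(n_states)]
    for (x, a), (nxt, cost) in transitions.items():
        if cost < J[x][nxt]:
            J[x][nxt] = cost
            first_action[x][nxt] = a

    # Floyd-Warshall relaxation: allow routing through intermediate state k.
    for k, x, y in product(range(n_states), repeat=3):
        if J[x][k] + J[k][y] < J[x][y]:
            J[x][y] = J[x][k] + J[k][y]
            first_action[x][y] = first_action[x][k]  # multipolicy pi*(x, y)
    return J, first_action

# Toy 4-state chain with a shortcut action.
transitions = {
    (0, "step"): (1, 1.0),
    (1, "step"): (2, 1.0),
    (2, "step"): (3, 1.0),
    (0, "jump"): (3, 2.5),  # cheaper than three steps
}
J, pi = build_multi_value_function(4, transitions)
print(J[0][3], pi[0][3])  # -> 2.5 jump

Note the O(N^3) cost of this naive construction: it is N shortest-path problems solved at once, which is exactly the kind of N-fold blow-up the paper's hierarchy avoids.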

BibTeX

@conference{Moore-1999-16665,
author = {Andrew Moore and Leemon Baird and Leslie Pack Kaelbling},
title = {Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs},
booktitle = {Proceedings of 16th International Joint Conference on Artificial Intelligence (IJCAI '99)},
year = {1999},
month = {July},
volume = {2},
pages = {1316--1321},
}