Robust Value Function Approximation by Working Backwards

Justin Boyan and Andrew Moore
Proceedings of the Workshop on Value Function Approximation, Machine Learning Conference, July, 1995.



Abstract
In this paper, we examine the intuition that TD(λ) is meant to operate by approximating asynchronous value iteration. We note that on the important class of discrete acyclic stochastic tasks, value iteration is inefficient compared with the DAG-SP algorithm, which essentially performs only one sweep instead of many by working backwards from the goal. The question we address in this paper is whether there is an analogous algorithm that can be used in large stochastic state spaces requiring function approximation. We present such an algorithm, analyze it, and give comparative results against TD(λ) on several domains.
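The single-sweep idea behind DAG-SP can be illustrated with a short sketch. The code below is a hypothetical illustration (the function name `dag_sp_values` and the data layout are my own, not from the paper): states of a discrete acyclic stochastic task are processed in reverse topological order, so every successor's value is already final when a state is backed up, and one sweep suffices.

```python
# Hypothetical sketch of a DAG-SP-style backward value computation on a
# discrete acyclic stochastic task. States are visited in reverse
# topological order (goal last in the forward order), so each Bellman
# backup uses only finalized successor values -- a single sweep, in
# contrast to the many sweeps of value iteration.
def dag_sp_values(states, transitions, goal):
    """states: list of states in topological order, goal last.
    transitions: dict mapping state -> list of actions, where each
    action is a list of (probability, reward, next_state) triples.
    Returns a dict of optimal expected values V(s)."""
    V = {goal: 0.0}
    for s in reversed(states):
        if s == goal:
            continue
        # One backup per state: maximize expected reward-to-go over actions.
        V[s] = max(
            sum(p * (r + V[ns]) for p, r, ns in action)
            for action in transitions[s]
        )
    return V
```

For example, a two-step chain s0 → s1 → goal with deterministic rewards 1 and 2 yields V(s1) = 2 and V(s0) = 3 after the single backward sweep.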

Text Reference
Justin Boyan and Andrew Moore, "Robust Value Function Approximation by Working Backwards," Proceedings of the Workshop on Value Function Approximation, Machine Learning Conference, July, 1995.

BibTeX Reference
@inproceedings{Boyan_1995_2821,
   author = "Justin Boyan and Andrew Moore",
   title = "Robust Value Function Approximation by Working Backwards",
   booktitle = "Proceedings of the Workshop on Value Function Approximation, Machine Learning Conference",
   month = "July",
   year = "1995",
}