Multi-task Value of Information Planning for Sequential Multi-task Bandits

Rika Antonova
Tech. Report CMU-RI-TR-16-41, Robotics Institute, Carnegie Mellon University, August 2016

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract

In sequential decision making under uncertainty, algorithms and agents that learn across related tasks can have significantly better performance than approaches that neglect to leverage related experience. In this work we consider online learning across a sequence of tasks, where each task is drawn from a finite set of multi-armed bandits (MABs). We introduce the Multi-task Value of Information (MT-VOI) planner, which balances exploration and exploitation at a task level by evaluating the benefits of additional exploration in the current task in order to improve reward across tasks. Our approach demonstrates a substantial improvement over single-task algorithms and a recent multi-task algorithm designed specifically for acting across a sequence of MABs.
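To make the problem setting concrete, the following is a minimal sketch of a sequence of tasks drawn from a finite set of Bernoulli bandits, with a Bayesian belief over which bandit generated the current task. It illustrates only the setting described in the abstract; it is not the MT-VOI planner. All names (`sample_task`, `pull`, `update_belief`, `run_task`), the Bernoulli reward assumption, and the greedy arm choice are assumptions made for illustration.

```python
import random

def sample_task(prior, rng):
    """Draw the index of the hidden bandit model for a new task."""
    return rng.choices(range(len(prior)), weights=prior, k=1)[0]

def pull(models, task, arm, rng):
    """Bernoulli reward from one arm of the (hidden) active model."""
    return 1 if rng.random() < models[task][arm] else 0

def update_belief(belief, models, arm, reward):
    """Bayes update of the belief over which model produced the current task."""
    likelihoods = [m[arm] if reward == 1 else 1.0 - m[arm] for m in models]
    unnorm = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnorm)
    if total == 0:
        return list(belief)  # observation impossible under every model
    return [u / total for u in unnorm]

def run_task(models, prior, horizon, rng):
    """Play one task, picking the arm with the highest belief-averaged mean.

    Greedy selection is a stand-in here; it does not weigh the value of
    further exploration, which is what a VOI-style planner would add.
    """
    task = sample_task(prior, rng)
    belief = list(prior)
    total_reward = 0
    for _ in range(horizon):
        n_arms = len(models[0])
        expected = [sum(b * m[a] for b, m in zip(belief, models))
                    for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: expected[a])
        reward = pull(models, task, arm, rng)
        belief = update_belief(belief, models, arm, reward)
        total_reward += reward
    return task, belief, total_reward

if __name__ == "__main__":
    rng = random.Random(0)
    models = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical two-model, two-arm set
    prior = [0.5, 0.5]
    for _ in range(3):  # a short sequence of tasks
        print(run_task(models, prior, horizon=10, rng=rng))
```

In this sketch the prior over models is fixed; a multi-task learner would additionally have to estimate the set of models and their frequencies from earlier tasks.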

BibTeX Reference
@techreport{Antonova-2016-5578,
author = {Rika Antonova},
title = {Multi-task Value of Information Planning for Sequential Multi-task Bandits},
year = {2016},
month = {August},
institution = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-16-41},
}