Timing and partial observability in the dopamine system
N. Daw, A. Courville, and D.S. Touretzky
Advances in Neural Information Processing Systems 15, S. Becker, S. Thrun, and K. Obermayer, eds., MIT Press, Cambridge, MA, 2003.

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract

According to a series of influential models, dopamine (DA) neurons signal reward prediction error using a temporal-difference (TD) algorithm. We address a problem not convincingly solved in these accounts: how to maintain a representation of cues that predict delayed consequences. Our new model uses a TD rule grounded in partially observable semi-Markov processes, a formalism that captures two largely neglected features of DA experiments: hidden state and temporal variability. Previous models predicted rewards using a tapped delay line representation of sensory inputs; we replace this with a more active process of inference about the underlying state of the world. The DA system can then learn to map these inferred states to reward predictions using TD. The new model can explain previously vexing data on the responses of DA neurons in the face of temporal variability. By combining statistical model-based learning with a physiologically grounded TD theory, it also brings into contact with physiology some insights about behavior that had previously been confined to more abstract psychological models.
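
As an illustration of the reward prediction error the abstract refers to, here is a minimal sketch of textbook TD(0) learning over a tapped delay line, the style of representation the abstract attributes to previous models. It is not the partially observable semi-Markov model proposed in the paper. The function name, the trial layout (one state per time step after cue onset, a single reward at a fixed step), and all parameter values are assumptions chosen for illustration.

```python
def td_delay_line(n_trials=2000, n_steps=10, reward_step=8, alpha=0.1, gamma=0.98):
    """Textbook TD(0) with one value entry per time step since cue onset.

    Hypothetical setup: each trial visits steps 0..n_steps-1 in order, and a
    reward of 1.0 is delivered on the transition out of `reward_step`.
    Returns the learned values and the per-trial prediction errors.
    """
    # One value per delay-line tap; the extra final entry is the terminal
    # state, which is never updated and so stays at 0.
    values = [0.0] * (n_steps + 1)
    errors = []
    for _ in range(n_trials):
        trial_errors = []
        for t in range(n_steps):
            reward = 1.0 if t == reward_step else 0.0
            # TD error: reward plus discounted next prediction, minus the
            # current prediction.
            delta = reward + gamma * values[t + 1] - values[t]
            values[t] += alpha * delta
            trial_errors.append(delta)
        errors.append(trial_errors)
    return values, errors


if __name__ == "__main__":
    values, errors = td_delay_line()
    print("error at reward time, first trial:", errors[0][8])
    print("error at reward time, last trial:", errors[-1][8])
    print("learned value at cue onset:", values[0])
```

In this sketch the error at the time of reward starts large and shrinks toward zero as the earlier steps come to predict the reward. Because the reward always arrives at the same step, the sketch contains neither the temporal variability nor the hidden state that the abstract identifies as neglected by such models.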

Notes

Note: in press

BibTeX Reference

@incollection{Daw_2003_4309,
   author = "Nathaniel Daw and Aaron Courville and David S Touretzky",
   editor = "S. Becker and S. Thrun and K. Obermayer",
   title = "Timing and partial observability in the dopamine system",
   booktitle = "Advances in Neural Information Processing Systems 15",
   publisher = "MIT Press",
   address = "Cambridge, MA",
   year = "2003",
   note = "in press"
}

