Dopamine and inference about timing - Robotics Institute Carnegie Mellon University

Nathaniel Daw, Aaron Courville, and David S. Touretzky
Conference Paper, Proceedings of 2nd International Conference on Development and Learning (ICDL '02), pp. 271-276, June 2002

Abstract

Temporal-difference learning (TD) models explain most responses of primate dopamine neurons in appetitive conditioning. But because existing models are based in the simple formal setting of Markov processes, they do not provide a realistic account of the partial observability of the state of the world, nor of variation in event timing. For instance, the TD model of Montague et al. (1996) mispredicts the dopamine response when an expected reward is delivered early. We explain such experimental results using a version of TD learning grounded in the richer formalism of partially observable semi-Markov processes. We propose that the brain infers the likely state of the world from limited observations, using a statistical model of how the world's state evolves. Inference is necessary for such judgements as whether an expected reward is merely late, versus having been omitted altogether. The dopamine signal is modeled as a TD error signal for learning to predict future rewards from this inferred state representation.
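The idea in the abstract, computing a TD error over an inferred state rather than an observed one, can be illustrated with a minimal sketch. This is not the authors' model: it swaps the partially observable semi-Markov formalism for a much simpler discrete-time hidden Markov filter, and the two hidden states ("reward pending" and "reward omitted"), the probabilities, the value weights, and the function names `belief_update` and `td_error` are all hypothetical choices made for illustration.

```python
import numpy as np

def belief_update(belief, transition, obs_likelihood):
    """One step of Bayesian filtering over hidden states.

    belief[i]         : current probability of hidden state i
    transition[i, j]  : assumed probability of moving from state i to state j
    obs_likelihood[j] : assumed probability of the current observation in state j
    """
    predicted = belief @ transition          # predict with the world model
    posterior = predicted * obs_likelihood   # reweight by the observation
    return posterior / posterior.sum()       # renormalize to a distribution

def td_error(reward, belief, next_belief, weights, gamma):
    """TD error with a value function that is linear in the belief state."""
    return reward + gamma * (weights @ next_belief) - (weights @ belief)

# Hypothetical setup: state 0 = reward still pending, state 1 = reward omitted.
transition = np.eye(2)                       # hidden state persists between steps
no_reward_likelihood = np.array([0.7, 1.0])  # seeing "no reward" is likelier if omitted
weights = np.array([1.0, 0.0])               # assumed values: pending = 1, omitted = 0
gamma = 0.95

belief = np.array([0.9, 0.1])                # belief just after the cue
beliefs, deltas = [belief], []
for _ in range(5):                           # five time steps with no reward observed
    next_belief = belief_update(belief, transition, no_reward_likelihood)
    deltas.append(td_error(0.0, belief, next_belief, weights, gamma))
    beliefs.append(next_belief)
    belief = next_belief

for step, (b, d) in enumerate(zip(beliefs[1:], deltas), start=1):
    print(f"step {step}: P(pending) = {b[0]:.3f}, TD error = {d:.3f}")
```

Under these assumed numbers, each step without reward shifts belief from "pending" toward "omitted", and the TD error is negative at every step. This shows only the late-versus-omitted inference in its simplest form; it does not model the dwell-time distributions that a semi-Markov account would use.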

BibTeX

@conference{Daw-2002-16848,
author = {Nathaniel Daw and Aaron Courville and David S. Touretzky},
title = {Dopamine and inference about timing},
booktitle = {Proceedings of 2nd International Conference on Development and Learning (ICDL '02)},
year = {2002},
month = {June},
pages = {271--276},
}