An Intrinsic Reward Mechanism for Efficient Exploration
Computer Science Department Faculty Publication Series
  • Özgür Şimşek, University of Massachusetts - Amherst
  • Andrew G. Barto, University of Massachusetts - Amherst
Publication Date
2006
Abstract
How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exploit later? We formulate this problem as a Markov Decision Process by explicitly modeling the internal state of the agent and propose a principled heuristic for its solution. We present experimental results in a number of domains, also exploring the algorithm’s use for learning a policy for a skill given its reward function—an important but neglected component of skill discovery.
Comments
This paper was harvested from CiteSeer.
Citation Information
Özgür Şimşek and Andrew G. Barto. "An Intrinsic Reward Mechanism for Efficient Exploration" (2006)
Available at: http://works.bepress.com/andrew_barto/13/