"An Intrinsic Reward Mechanism for Efficient Exploration" by Özgür Şimşek

Selected Works of Andrew G. Barto

Other

An Intrinsic Reward Mechanism for Efficient Exploration

Computer Science Department Faculty Publication Series

Özgür Şimşek, University of Massachusetts - Amherst
Andrew G. Barto, University of Massachusetts - Amherst

Publication Date

2006

Abstract

How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore now so that it can exploit later? We formulate this problem as a Markov Decision Process by explicitly modeling the internal state of the agent, and we propose a principled heuristic for its solution. We present experimental results in a number of domains and also explore the algorithm's use for learning a policy for a skill given its reward function, an important but neglected component of skill discovery.

Disciplines

Computer Sciences

Comments

This paper was harvested from CiteSeer

Citation Information

Özgür Şimşek and Andrew G. Barto. "An Intrinsic Reward Mechanism for Efficient Exploration" (2006)
Available at: http://works.bepress.com/andrew_barto/13/