On Maximizing the Average Time at a Goal
Article comments
Copyright © 1984 Elsevier. The definitive version is available at http://dx.doi.org/10.1016/0304-4149(84)90010-3 .
NOTE: At the time of publication, the author Theodore P. Hill was not yet affiliated with Cal Poly.
Abstract
In a decision process (gambling or dynamic programming problem) with finite state space and arbitrary decision sets (gambles or actions), there is always available a Markov strategy which uniformly (nearly) maximizes the average time spent at a goal. If the decision sets are closed, there is even a stationary strategy with the same property.Examples are given to show that approximations by discounted or finite horizon payoffs are not useful for the general average reward problem.
Suggested Citation
S. Demko and Theodore P. Hill. "On Maximizing the Average Time at a Goal" Stochastic Processes and Their Applications 17.2 (1984): 349-357.
Available at: http://works.bepress.com/tphill/12