On Maximizing the Average Time at a Goal
NOTE: At the time of publication, the author Theodore P. Hill was not yet affiliated with Cal Poly.
In a decision process (gambling or dynamic programming problem) with finite state space and arbitrary decision sets (gambles or actions), there is always available a Markov strategy which uniformly (nearly) maximizes the average time spent at a goal. If the decision sets are closed, there is even a stationary strategy with the same property. Examples are given to show that approximations by discounted or finite-horizon payoffs are not useful for the general average reward problem.
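The quantity being optimized can be illustrated on a toy instance. The sketch below is not the paper's construction; it is a minimal simulation, under assumed transition data, of the long-run average fraction of time a stationary strategy spends at a designated goal state in a finite-state process with finitely many gambles per state.

```python
import random

# Hypothetical toy decision process (illustration only): states 0, 1, 2,
# with state 2 the "goal".  At each state the controller chooses among a
# finite set of gambles, each a transition distribution over the states.
GAMBLES = {
    0: [[0.5, 0.5, 0.0], [0.2, 0.0, 0.8]],
    1: [[1.0, 0.0, 0.0], [0.0, 0.3, 0.7]],
    2: [[0.1, 0.0, 0.9], [0.4, 0.6, 0.0]],
}

def average_time_at_goal(policy, goal=2, steps=200_000, seed=0):
    """Estimate the long-run average fraction of time spent at `goal`
    when the stationary strategy `policy` (state -> gamble index) is used."""
    rng = random.Random(seed)
    state, hits = 0, 0
    for _ in range(steps):
        probs = GAMBLES[state][policy[state]]
        state = rng.choices([0, 1, 2], weights=probs)[0]
        hits += (state == goal)
    return hits / steps

# One stationary strategy: at every state, pick the gamble that moves
# probability mass toward the goal state.
greedy = {0: 1, 1: 1, 2: 0}
print(average_time_at_goal(greedy))
```

Under the `greedy` strategy the induced chain has stationary distribution (1/9, 0, 8/9), so the simulated average should be near 8/9; the paper's result says a single stationary strategy can (nearly) achieve the optimum uniformly over starting states when decision sets are closed.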
S. Demko and Theodore P. Hill. "On Maximizing the Average Time at a Goal" Stochastic Processes and Their Applications 17.2 (1984): 349-357.
Available at: http://works.bepress.com/tphill/12