The Actor-Critic Algorithm for Infinite Horizon Discounted Cost Revisited
Proceedings of the 2020 Winter Simulation Conference (2020, Virtual)
  • Abhijit Gosavi, Missouri University of Science and Technology

Reinforcement Learning (RL) is a methodology used to solve Markov decision processes (MDPs) within simulators. In the classical Actor-Critic (AC), a popular RL algorithm, the values of the so-called actor become unbounded. A recently introduced variant of the AC keeps the actor's values naturally bounded. However, the algorithm's convergence properties have not been established mathematically in the literature. Numerically, the bounded AC was studied under the Boltzmann action-selection strategy, but not under the more popular ϵ-greedy strategy in which the probability of selecting any non-greedy action converges to zero in the limit. The paper revisits the AC framework. A short review of the existing literature in the growing field of ACs is first presented. Thereafter, the algorithm is investigated for its convergence properties, under ϵ-greedy action selection, numerically on a small-scale MDP, as well as mathematically via the ordinary differential equation framework.
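The ϵ-greedy strategy mentioned in the abstract, where the chance of picking a non-greedy action shrinks to zero, could be sketched roughly as follows. This is only an illustration, not the paper's implementation: the decay schedule `eps0 / k`, the function names, and the convention that the greedy action is the one with the largest actor value are all assumptions made here.

```python
import random


def epsilon_at(k, eps0=1.0):
    """Hypothetical decay schedule: exploration probability eps0 / k at step k >= 1.

    Any schedule that tends to zero as k grows would fit the description;
    this particular form is an assumption, not taken from the paper.
    """
    return eps0 / k


def epsilon_greedy(actor_values, k, rng=random):
    """Choose an action index for one state at step k (1-based).

    actor_values: list of actor values, one per action (assumed: larger is preferred).
    With probability epsilon_at(k), a non-greedy action is drawn uniformly;
    otherwise the greedy action is returned.
    """
    n = len(actor_values)
    greedy = max(range(n), key=actor_values.__getitem__)
    if n > 1 and rng.random() < epsilon_at(k):
        non_greedy = [a for a in range(n) if a != greedy]
        return rng.choice(non_greedy)
    return greedy
```

Under this sketch, each non-greedy action is selected with probability `epsilon_at(k) / (n - 1)`, which goes to zero as `k` grows, so the selection rule becomes purely greedy in the limit.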

Meeting Name
Winter Simulation Conference, WSC (2020: Dec. 14-18, Virtual)
Engineering Management and Systems Engineering
Keywords and Phrases
  • Reinforcement learning
  • Ordinary differential equations
  • Markov processes
  • Infinite horizon
  • Convergence
  • Testing
Document Type
Article - Conference proceedings
© 2020 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.
Publication Date
18 Dec 2020
Citation Information
Abhijit Gosavi. "The Actor-Critic Algorithm for Infinite Horizon Discounted Cost Revisited" Proceedings of the 2020 Winter Simulation Conference (2020, Virtual) (2020) p. 2867 - 2878 ISSN: 0891-7736