Skip to main content
Contribution to Book
Solving Markov Decision Processes via Simulation
International Series in Operations Research and Management Science: Handbook of Simulation Optimization
  • Abhijit Gosavi, Missouri University of Science and Technology
Abstract

This chapter presents an overview of simulation-based techniques useful for solving Markov decision processes (MDPs). MDPs model problems of sequential decision-making under uncertainty, in which decisions made in each state collectively affect the trajectory of the states visited by the system over a time horizon of interest. Traditionally, MDPs have been solved via dynamic programming (DP), which requires the transition probability model that is difficult to derive in many realistic settings. The use of simulation for solving MDPs allows us to bypass the transition probability model and solve large-scale MDPs considered intractable to solve by traditional DP. The simulation-based methodology for solving MDPs, which like DP is also rooted in the Bellman equations, goes by names such as reinforcement learning, neuro-DP, and approximate or adaptive DP.We begin with a description of algorithms for infinite-horizon discounted reward MDPs, followed by the same for infinite-horizon average reward MDPs. Then we present a discussion on finite-horizon MDPs. For each problem considered, we present a step-by-step description of a selected group of algorithms. In making this selection, we have attempted to blend the old and the classical with more recent developments. Finally, after touching upon extensions and convergence theory, we conclude with a brief summary of some applications and directions for future research.

Department(s)
Engineering Management and Systems Engineering
Document Type
Book - Chapter
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2015 Springer Verlag, All rights reserved.
Publication Date
1-1-2015
Publication Date
01 Jan 2015
Citation Information
Abhijit Gosavi. "Solving Markov Decision Processes via Simulation" International Series in Operations Research and Management Science: Handbook of Simulation Optimization Vol. 216 (2015) p. 341 - 379 ISSN: 0884-8289; 2214-7934
Available at: http://works.bepress.com/abhijit-gosavi/32/