Article
Beyond Exponential Utility Functions: A Variance-Adjusted Approach for Risk-Averse Reinforcement Learning
Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (2014, Orlando, FL)
  • Abhijit Gosavi, Missouri University of Science and Technology
  • Sajal K. Das, Missouri University of Science and Technology
  • Susan L. Murray, Missouri University of Science and Technology
Abstract

Utility theory has served as a bedrock for modeling risk in economics. When Markov decision processes (MDPs) that involve risk are solved via utility theory, the exponential utility (EU) function has been used in the literature as an objective function for capturing risk-averse behavior. The EU framework uses a so-called risk-averseness coefficient (RAC) that seeks to quantify the risk appetite of the decision-maker. Unfortunately, as we show in this paper, the EU framework suffers from computational deficiencies that prevent it from being useful in practice for solution methods based on reinforcement learning (RL). In particular, the value function becomes very large, and the computation typically overflows. We provide a simple example to demonstrate this. Further, we show empirically how a variance-adjusted (VA) approach, which approximates the EU objective for reasonable values of the RAC, can be used in an RL algorithm. The VA framework has, in a sense, two objectives: maximizing expected returns and minimizing variance. We conduct empirical studies of a VA-based RL algorithm on the semi-MDP (SMDP), which is a more general version of the MDP. We conclude with a mathematical proof of the boundedness of the iterates in our algorithm.
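
The two claims in the abstract (overflow under the EU objective, and a variance-adjusted objective that approximates it for small RAC) can be illustrated with a minimal Python sketch. This is not the paper's algorithm or example, which this page does not reproduce. It assumes the standard second-order expansion of the exponential-utility certainty equivalent, mean minus (RAC / 2) times variance; the function names, the sample returns, and the RAC values are all hypothetical.

```python
import math
import statistics


def exp_utility_certainty_equivalent(returns, rac):
    """Naive certainty equivalent under exponential utility.

    Computes -(1 / rac) * log(mean(exp(-rac * x))) directly, with no
    numerical safeguards, so large costs (very negative returns) can
    overflow inside math.exp. Assumes rac > 0 (risk-averse).
    """
    mean_exp = sum(math.exp(-rac * x) for x in returns) / len(returns)
    return -math.log(mean_exp) / rac


def variance_adjusted_score(returns, rac):
    """Assumed variance-adjusted surrogate: mean - (rac / 2) * variance.

    This is the second-order approximation of the certainty equivalent
    above; the paper's exact VA objective may be scaled differently.
    """
    mean = statistics.fmean(returns)
    variance = statistics.pvariance(returns)
    return mean - 0.5 * rac * variance


if __name__ == "__main__":
    # Hypothetical returns and a small RAC: the two scores nearly agree.
    sample = [8.0, 10.0, 12.0]
    print(exp_utility_certainty_equivalent(sample, 0.01))
    print(variance_adjusted_score(sample, 0.01))

    # Hypothetical large costs: the naive EU computation overflows,
    # while the variance-adjusted score stays finite.
    large_costs = [-1000.0, -900.0]
    try:
        exp_utility_certainty_equivalent(large_costs, 1.0)
    except OverflowError as err:
        print("EU computation overflowed:", err)
    print(variance_adjusted_score(large_costs, 1.0))
```

The variance-adjusted score only ever handles quantities on the scale of the returns and their squares, which is one plausible reading of why such an objective is easier to use inside an iterative RL update than the exponential form.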

Meeting Name
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (2014: Dec. 9-12; Orlando, FL)
Department(s)
Engineering Management and Systems Engineering
Second Department
Computer Science
Third Department
Psychological Science
Keywords and Phrases
  • Algorithms
  • Behavioral research
  • Computation theory
  • Decision making
  • Decision theory
  • Dynamic programming
  • Markov processes
  • Risk analysis
  • Risks
  • Computational deficiency
  • Empirical studies
  • Exponential utility
  • Exponential utility function
  • Markov Decision Processes
  • Mathematical proof
  • Objective functions
  • Reasonable value
  • Reinforcement learning
International Standard Book Number (ISBN)
9781479945535
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2014 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.
Publication Date
1-1-2014
Citation Information
Abhijit Gosavi, Sajal K. Das and Susan L. Murray. "Beyond Exponential Utility Functions: A Variance-Adjusted Approach for Risk-Averse Reinforcement Learning" Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (2014, Orlando, FL) (2014) ISSN: 2325-1824
Available at: http://works.bepress.com/sajal-das/1/