Low-Discrepancy Action Selection in Markov Decision Processes
MAA Southeast Section
  • Stephen W. Carden, Georgia Southern University
Document Type
Presentation
Presentation Date
3-1-2018
Abstract or Description

Presentation given at the MAA Southeast Section.

Abstract

In a Markov Decision Process, an agent must learn to choose actions in order to optimally navigate a Markovian environment. When the system dynamics are unknown and the agent's behavior is learned from data, the problem is known as Reinforcement Learning. In theory, for the learned behavior to converge to the optimal behavior, data must be collected from every state-action combination infinitely often. Therefore, in practice, the methodology the agent uses to explore the environment is critical to learning approximately optimal behavior from a reasonable amount of data. This paper discusses the benefits of augmenting existing exploration strategies by choosing among actions in a low-discrepancy manner. When the state and action spaces are discrete, actions are selected uniformly from those that have been tried the fewest times. When the state and action spaces are continuous, quasi-random sequences are used to select actions. The superiority of this strategy over purely random action selection is demonstrated by proof for a simple discrete MDP, and empirically for more complex processes.
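
A minimal sketch of the two selection rules described in the abstract is given below. The function names, the visit-count bookkeeping, and the choice of a van der Corput sequence are illustrative assumptions for exposition, not the author's implementation.

```python
import random


def least_tried_action(counts):
    """Discrete case: choose uniformly among the actions tried fewest times.

    `counts` is a hypothetical dict mapping each action available in the
    current state to the number of times the agent has selected it so far.
    """
    fewest = min(counts.values())
    candidates = [a for a, c in counts.items() if c == fewest]
    return random.choice(candidates)


def van_der_corput(n, base=2):
    """n-th term of the base-`base` van der Corput sequence in [0, 1)."""
    q, scale = 0.0, 1.0 / base
    while n > 0:
        q += (n % base) * scale
        n //= base
        scale /= base
    return q


def quasi_random_action(n, low, high):
    """Continuous case: map a low-discrepancy point onto the action interval."""
    return low + (high - low) * van_der_corput(n)


if __name__ == "__main__":
    # Discrete: action "b" has the fewest trials, so it is selected.
    print(least_tried_action({"a": 3, "b": 1, "c": 2}))
    # Continuous: the first few quasi-random actions on [-1, 1] cover the
    # interval more evenly than i.i.d. uniform draws typically would.
    print([round(quasi_random_action(n, -1.0, 1.0), 3) for n in range(1, 9)])
```
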

Location
Clemson, SC
Citation Information
Stephen W. Carden. "Low-Discrepancy Action Selection in Markov Decision Processes" MAA Southeast Section (2018)
Available at: http://works.bepress.com/stephen_carden/30/