Low-Discrepancy Action Selection in Markov Decision Processes
MAA Southeast Section
  • Stephen W. Carden, Georgia Southern University
Document Type
Presentation
Presentation Date
3-1-2018
Abstract or Description

Presentation given at the MAA Southeast Section.

Abstract

In a Markov Decision Process, an agent must learn to choose actions in order to optimally navigate a Markovian environment. When the system dynamics are unknown and the agent's behavior is learned from data, the problem is known as Reinforcement Learning. In theory, for the learned behavior to converge to the optimal behavior, data must be collected from every state-action combination infinitely often. Therefore, in practice, the methodology the agent uses to explore the environment is critical to learning approximately optimal behavior from a reasonable amount of data. This paper discusses the benefits of augmenting existing exploration strategies by choosing among actions in a low-discrepancy manner. When the state and action spaces are discrete, actions are selected uniformly from those that have been tried the fewest times. When the state and action spaces are continuous, quasi-random sequences are used to select actions. The superiority of this strategy over purely random action selection is demonstrated by proof for a simple discrete MDP, and empirically for more complex processes.
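
A minimal sketch of the two selection rules described in the abstract is given below. The function names, the visit-count bookkeeping, and the choice of a van der Corput sequence are illustrative assumptions for exposition, not the author's implementation.

```python
import random


def least_tried_action(counts):
    """Discrete case: choose uniformly among the actions tried fewest times.

    `counts` is a hypothetical dict mapping each action available in the
    current state to the number of times the agent has selected it so far.
    """
    fewest = min(counts.values())
    candidates = [a for a, c in counts.items() if c == fewest]
    return random.choice(candidates)


def van_der_corput(n, base=2):
    """n-th term of the base-`base` van der Corput sequence in [0, 1)."""
    q, scale = 0.0, 1.0 / base
    while n > 0:
        q += (n % base) * scale
        n //= base
        scale /= base
    return q


def quasi_random_action(n, low, high):
    """Continuous case: map a low-discrepancy point onto the action interval."""
    return low + (high - low) * van_der_corput(n)


if __name__ == "__main__":
    # Discrete: action "b" has the fewest trials, so it is selected.
    print(least_tried_action({"a": 3, "b": 1, "c": 2}))
    # Continuous: the first few quasi-random actions on [-1, 1] cover the
    # interval more evenly than i.i.d. uniform draws typically would.
    print([round(quasi_random_action(n, -1.0, 1.0), 3) for n in range(1, 9)])
```
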

Location
Clemson, SC
Citation Information
Stephen W. Carden. "Low-Discrepancy Action Selection in Markov Decision Processes" MAA Southeast Section (2018)
Available at: http://works.bepress.com/stephen_carden/30/