Exploration Using Without-Replacement Sampling of Actions is Sometimes Inferior
100th Meeting of the Southeastern Section of the Mathematical Association of America
  • Stephen W. Carden, Georgia Southern University
Document Type
Presentation
Presentation Date
3-1-2021
Abstract or Description

Presentation given at the 100th Meeting of the Southeastern Section of the Mathematical Association of America.

Abstract

In many statistical and machine learning applications, without-replacement sampling is considered superior to with-replacement sampling. In some cases this has been proven, and in others the heuristic is so intuitively attractive that it is taken for granted. In reinforcement learning, many count-based exploration strategies are justified by reliance on this heuristic. This paper details the non-intuitive discovery that, when the goodness of an exploration strategy is measured by the stochastic shortest path to a goal state, there is a class of processes for which an action-selection strategy based on without-replacement sampling of actions can be worse than with-replacement sampling. Specifically, the expected time until a specified goal state is first reached can be provably larger under without-replacement sampling. Numerical experiments describe the frequency and severity of this inferiority.
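
To make the two strategies being compared concrete, the following is a minimal Monte Carlo sketch in Python. The toy chain, its transition probabilities, and the helper names (step, hitting_time) are hypothetical illustrations, not taken from the paper, and this particular toy process need not exhibit the inferiority the paper proves for its class of processes; the sketch only shows how expected first-hitting times under with-replacement and without-replacement action selection can be estimated and compared.

import random

GOAL = 3          # hypothetical goal state
ACTIONS = [0, 1]  # two actions per state

def step(state, action, rng):
    # Hypothetical dynamics: action 0 advances toward the goal with
    # probability 0.5; action 1 never advances.
    if action == 0 and rng.random() < 0.5:
        return state + 1
    return state

def hitting_time(strategy, rng, max_steps=100_000):
    state, t = 0, 0
    # Per-state action pools used only by the without-replacement strategy.
    pools = {s: [] for s in range(GOAL)}
    while state != GOAL and t < max_steps:
        if strategy == "with":
            a = rng.choice(ACTIONS)        # with-replacement: fresh uniform draw each step
        else:
            if not pools[state]:           # refill and reshuffle once the pool is exhausted
                pools[state] = ACTIONS[:]
                rng.shuffle(pools[state])
            a = pools[state].pop()         # without-replacement within each refill cycle
        state = step(state, a, rng)
        t += 1
    return t

if __name__ == "__main__":
    for strategy in ("with", "without"):
        times = [hitting_time(strategy, random.Random(i)) for i in range(5_000)]
        print(strategy, sum(times) / len(times))

Averaging the simulated hitting times over many independent runs, as above, is one standard way to estimate the expected time until the goal state is first reached under each action-selection strategy.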

Location
Virtual
Source
https://maasoutheastern.org/2021-conference/
Citation Information
Stephen W. Carden. "Exploration Using Without-Replacement Sampling of Actions is Sometimes Inferior" 100th Meeting of the Southeastern Section of the Mathematical Association of America (2021)
Available at: http://works.bepress.com/stephen_carden/29/