Exploration Using Without-Replacement Sampling of Actions is Sometimes Inferior
100th Meeting of the Southeastern Section of the Mathematical Association of America
  • Stephen W. Carden, Georgia Southern University
Document Type
Presentation
Presentation Date
3-1-2021
Abstract or Description

Presentation given at the 100th Meeting of the Southeastern Section of the Mathematical Association of America.

Abstract

In many statistical and machine learning applications, without-replacement sampling is considered superior to with-replacement sampling. In some cases this has been proven, and in others the heuristic is so intuitively attractive that it is taken for granted. In reinforcement learning, many count-based exploration strategies are justified by reliance on this heuristic. This paper details the non-intuitive discovery that, when the goodness of an exploration strategy is measured by the stochastic shortest path to a goal state, there is a class of processes for which an action-selection strategy based on without-replacement sampling of actions can be worse than with-replacement sampling. Specifically, the expected time until a specified goal state is first reached can be provably larger under without-replacement sampling. Numerical experiments describe the frequency and severity of this inferiority.
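
To make the two strategies being compared concrete, the following is a minimal Monte Carlo sketch in Python. The toy chain, its transition probabilities, and the helper names (step, hitting_time) are hypothetical illustrations, not taken from the paper, and this particular toy process need not exhibit the inferiority the paper proves for its class of processes; the sketch only shows how expected first-hitting times under with-replacement and without-replacement action selection can be estimated and compared.

import random

GOAL = 3          # hypothetical goal state
ACTIONS = [0, 1]  # two actions per state

def step(state, action, rng):
    # Hypothetical dynamics: action 0 advances toward the goal with
    # probability 0.5; action 1 never advances.
    if action == 0 and rng.random() < 0.5:
        return state + 1
    return state

def hitting_time(strategy, rng, max_steps=100_000):
    state, t = 0, 0
    # Per-state action pools used only by the without-replacement strategy.
    pools = {s: [] for s in range(GOAL)}
    while state != GOAL and t < max_steps:
        if strategy == "with":
            a = rng.choice(ACTIONS)        # with-replacement: fresh uniform draw each step
        else:
            if not pools[state]:           # refill and reshuffle once the pool is exhausted
                pools[state] = ACTIONS[:]
                rng.shuffle(pools[state])
            a = pools[state].pop()         # without-replacement within each refill cycle
        state = step(state, a, rng)
        t += 1
    return t

if __name__ == "__main__":
    for strategy in ("with", "without"):
        times = [hitting_time(strategy, random.Random(i)) for i in range(5_000)]
        print(strategy, sum(times) / len(times))

Averaging the simulated hitting times over many independent runs, as above, is one standard way to estimate the expected time until the goal state is first reached under each action-selection strategy.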

Location
Virtual
Source
https://maasoutheastern.org/2021-conference/
Citation Information
Stephen W. Carden. "Exploration Using Without-Replacement Sampling of Actions is Sometimes Inferior" 100th Meeting of the Southeastern Section of the Mathematical Association of America (2021)
Available at: http://works.bepress.com/stephen_carden/29/