Convergence-Based Exploration Algorithm for Reinforcement Learning
Electrical and Computer Engineering Technical Reports and White Papers
  • Ala'Eddin Masadeh, Iowa State University
  • Zhengdao Wang, Iowa State University
  • Ahmed E. Kamal, Iowa State University
Document Type
Report
Publication Date
1-1-2018
Abstract

Reinforcement learning (RL) is a technique for learning in an unknown environment. During learning, actions are selected in two main modes: exploration, which investigates actions that have not yet been tried, and exploitation, which uses the current best actions. Balancing exploration and exploitation is a central challenge in RL. In this work, we design an exploration algorithm for RL that introduces two parameters to achieve this balance: the action-value function convergence error and the exploration time threshold. The first parameter evaluates actions and selects the best ones based on the convergence of their action-value functions. The second forces the agent to exploit the current best policy when it has been unable to explore the available actions within a given time. We show that this algorithm outperforms the well-known epsilon-greedy algorithm, and we study the effects of the two parameters on performance.
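
The abstract describes selecting actions for exploration until their action-value estimates converge, subject to a time limit after which the agent exploits. The following is a minimal sketch of that idea on a toy bandit problem, not the authors' implementation: the function name convergence_based_action, the parameters delta (convergence error) and t_max (exploration time threshold), and the update rule are illustrative assumptions.

import numpy as np

def convergence_based_action(q, prev_q, t_explored, delta, t_max, rng):
    """Hypothetical sketch of convergence-based exploration:
    explore an action whose value estimate has not yet converged,
    otherwise exploit the current best action.

    q, prev_q  : current and previous action-value estimates
    t_explored : time steps already spent exploring
    delta      : convergence error threshold on |q - prev_q|
    t_max      : exploration time threshold
    """
    unconverged = np.where(np.abs(q - prev_q) > delta)[0]
    if len(unconverged) > 0 and t_explored < t_max:
        return int(rng.choice(unconverged)), True   # explore
    return int(np.argmax(q)), False                 # exploit

# Toy demonstration on a 3-armed bandit (illustrative only).
rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])
q = np.zeros(3)
prev_q = np.full(3, np.inf)   # force initial exploration of every action
counts = np.zeros(3)
t_explored = 0

for step in range(500):
    a, explored = convergence_based_action(q, prev_q, t_explored,
                                           delta=1e-3, t_max=200, rng=rng)
    reward = rng.normal(true_means[a], 0.1)
    counts[a] += 1
    prev_q[a] = q[a]
    q[a] += (reward - q[a]) / counts[a]   # incremental sample-average update
    t_explored += explored

print("estimated action values:", np.round(q, 3))
print("greedy action:", int(np.argmax(q)))

In this sketch, once every action's estimate changes by less than delta between updates, or once t_max exploration steps have elapsed, the agent settles on the greedy action, which mirrors the role the abstract assigns to the two parameters.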

Language
en
File Format
application/pdf
Citation Information
Ala'Eddin Masadeh, Zhengdao Wang and Ahmed E. Kamal. "Convergence-Based Exploration Algorithm for Reinforcement Learning" (2018)
Available at: http://works.bepress.com/zhengdao_wang/11/