This paper presents a direct heuristic dynamic programming (HDP) method based on Dyna planning (Dyna-HDP) for online model learning in a Markov decision process. The technique combines HDP policy learning with a Dyna agent to shorten learning time. We evaluate Dyna-HDP on a differential-drive wheeled mobile robot navigating a 2D maze, and in simulation we compare it with traditional reinforcement learning algorithms, namely one-step Q-learning, Sarsa(λ), and Dyna-Q, under the same benchmark conditions. We demonstrate that Dyna-HDP converges to a near-optimal path faster than the other algorithms while remaining highly stable. We also confirm that Dyna-HDP can be applied to a multi-robot path planning problem: a virtual common environment model is learned from the robots' shared experiences, which significantly reduces learning time.
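To make the Dyna idea referenced above concrete, the sketch below shows a minimal tabular Dyna-Q baseline (one of the comparison algorithms, not the paper's Dyna-HDP): each real environment step is followed by several simulated updates drawn from a learned model, which is what speeds up learning. The toy corridor environment, state/action sizes, and hyperparameters are illustrative assumptions, not the paper's maze benchmark.

```python
import random

# Toy 1-D corridor: states 0..5, goal at state 5, reward 1 only at the goal.
# (Assumed environment for illustration; not the paper's 2D maze.)
N_STATES = 6
ACTIONS = (-1, +1)            # move left / move right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
PLANNING_STEPS = 10           # Dyna: simulated model updates per real step

def step(s, a):
    """Deterministic corridor dynamics."""
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def dyna_q(episodes=50, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}                # learned model: (s, a) -> (s', r)
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy action selection
            if rng.random() < EPS:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s2, r = step(s, a)
            # direct RL update (one-step Q-learning on real experience)
            Q[(s, a)] += ALPHA * (
                r + GAMMA * max(Q[(s2, x)] for x in ACTIONS) - Q[(s, a)])
            model[(s, a)] = (s2, r)
            # planning: replay simulated experience from the learned model
            for _ in range(PLANNING_STEPS):
                (ps, pa), (ps2, pr) = rng.choice(list(model.items()))
                Q[(ps, pa)] += ALPHA * (
                    pr + GAMMA * max(Q[(ps2, x)] for x in ACTIONS) - Q[(ps, pa)])
            s = s2
    return Q

Q = dyna_q()
# Greedy policy from the learned Q-values: move right toward the goal.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Dyna-HDP follows the same planning loop, but replaces the tabular Q-learning update with HDP's actor-critic (policy and value) networks.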
- Heuristic Programming,
- Learning Algorithms,
- Markov Processes,
- Mobile Robots,
- Motion Planning,
- Reinforcement Learning,
- Robot Programming,
- Robots,
- Direct Heuristic Dynamic Programming,
- Dyna,
- Heuristic Dynamic Programming,
- Mobile Robotics,
- Multi-Robot Path Planning,
- Q-Learning,
- Sarsa,
- Traditional Reinforcement Learning,
- Dynamic Programming
Available at: http://works.bepress.com/donald-wunsch/25/