Multigrid Reinforcement Learning with Reward Shaping

Marek Grzes, Daniel Kudenko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Potential-based reward shaping has been shown to be a powerful method to improve the convergence rate of reinforcement learning. It is a flexible technique for incorporating background knowledge into temporal-difference learning in a principled way. However, the question remains how to compute the potential which is used to shape the reward that is given to the learning agent. In this paper we propose a way to solve this problem in reinforcement learning with state space discretisation. In particular, we show that the potential function can be learned online in parallel with the actual reinforcement learning process. If the Q-function is learned for states determined by a given grid, a V-function for states at a lower resolution can be learned in parallel and used to approximate the potential for the ground-level learning. The novel algorithm is presented and experimentally evaluated.
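The mechanism the abstract describes (a coarse-grid V-function learned online and used as the shaping potential for fine-grid Q-learning) can be sketched roughly as follows. This is not the paper's pseudocode; the state aggregation, learning rates, and helper names such as coarse() and phi() are illustrative assumptions.

```python
from collections import defaultdict

GAMMA = 0.99   # discount factor (assumed value)
ALPHA = 0.1    # learning rate (assumed value)

def coarse(state, factor=4):
    """Map a fine-grid state (x, y) to its coarse-grid cell; the aggregation factor is an assumption."""
    x, y = state
    return (x // factor, y // factor)

Q = defaultdict(float)  # fine-grid action values, keyed by (state, action)
V = defaultdict(float)  # coarse-grid state values, i.e. the learned potential

def phi(state):
    """Potential of a fine-grid state = value of its coarse-grid cell."""
    return V[coarse(state)]

def update(s, a, r, s_next, actions):
    """One learning step: shaped Q-learning on the fine grid plus a TD(0) update of the coarse V-function."""
    # Potential-based shaping term: F(s, s') = gamma * phi(s') - phi(s)
    shaping = GAMMA * phi(s_next) - phi(s)
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (r + shaping + GAMMA * best_next - Q[(s, a)])
    # In parallel, learn the coarse-grid V-function from the unshaped reward
    V[coarse(s)] += ALPHA * (r + GAMMA * V[coarse(s_next)] - V[coarse(s)])
```

Because the shaping term has the potential-based form F(s, s') = γΦ(s') − Φ(s), it does not change the optimal policy; here the potential itself is refined online as the coarse-grid V-function improves.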

Original language: English
Title of host publication: Artificial Neural Networks - ICANN 2008, Part I
Editors: V. Kurkova, R. Neruda, J. Koutnik
Place of publication: Berlin
Publisher: Springer
Pages: 357-366
Number of pages: 10
Volume: 5163 LNCS
Edition: Part 1
ISBN (Print): 978-3-540-87535-2
Publication status: Published - 2008
Event: 18th International Conference on Artificial Neural Networks (ICANN 2008) - Prague
Duration: 3 Sept 2008 - 6 Sept 2008

Conference

Conference: 18th International Conference on Artificial Neural Networks (ICANN 2008)
City: Prague
Period: 3/09/08 - 6/09/08
