Plan-based Reward Shaping for Reinforcement Learning

Marek Grzes, Daniel Kudenko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Reinforcement learning, while a highly popular learning technique for agents and multi-agent systems, has so far struggled to scale up to more complex domains. This paper focuses on the use of domain knowledge to improve the convergence speed and optimality of various RL techniques. Specifically, we propose the use of high-level STRIPS operator knowledge in reward shaping to focus the search for the optimal policy. Empirical results show that the plan-based reward shaping approach outperforms other RL techniques, including alternative manual and MDP-based reward shaping when the latter is used in its basic form. We show that MDP-based reward shaping may fail, and our successful experiments with STRIPS-based shaping suggest modifications that can overcome the encountered problems. The proposed STRIPS-based method allows the same domain knowledge to be expressed in a different way, so the domain expert can choose whether to define an MDP or a STRIPS planning task. We also evaluate the robustness of the proposed STRIPS-based technique to errors in the plan knowledge.
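The abstract describes reward shaping guided by a high-level STRIPS plan. The sketch below is a minimal illustration of the standard potential-based shaping form F(s, s') = gamma * Phi(s') - Phi(s), with the potential derived from progress through an abstract plan; it assumes a hypothetical helper plan_step_of that reports how many plan steps a state satisfies, and none of these names or values are taken from the paper itself.

    # Minimal sketch of plan-progress potential-based reward shaping.
    # F(s, s') = gamma * Phi(s') - Phi(s) is the potential-based form, which
    # preserves the optimal policy of the underlying MDP. All names here
    # (plan_step_of, OMEGA) are hypothetical illustrations, not the authors' code.

    GAMMA = 0.99   # discount factor of the underlying MDP
    OMEGA = 10.0   # assumed scaling factor for the plan-progress potential

    def potential(state, plan_step_of):
        """Phi(s): progress of state s through the abstract STRIPS plan.

        plan_step_of maps a low-level state to the index of the latest plan
        step whose effects it satisfies (0 if none), so states further along
        the plan receive a higher potential.
        """
        return OMEGA * plan_step_of(state)

    def shaped_reward(env_reward, state, next_state, plan_step_of):
        """Environment reward augmented with the plan-based shaping term."""
        return (env_reward
                + GAMMA * potential(next_state, plan_step_of)
                - potential(state, plan_step_of))

In a Q-learning agent, shaped_reward would simply replace the raw environment reward in the temporal-difference update, leaving the rest of the algorithm unchanged.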

Original language: English
Title of host publication: 2008 4th International IEEE Conference Intelligent Systems, Vols 1 and 2
Place of publication: New York
Publisher: IEEE
Pages: 416-423
Number of pages: 8
Volume: 3
ISBN (Print): 978-1-4244-1739-1
Publication status: Published - 2008
Event: 4th International IEEE Conference Intelligent Systems - Varna
Duration: 6 Sep 2008 - 8 Sep 2008

Conference

Conference: 4th International IEEE Conference Intelligent Systems
City: Varna
Period: 6/09/08 - 8/09/08

Keywords

  • Reinforcement learning
  • reward shaping
  • symbolic planning
  • STRIPS
