Combining reinforcement learning with symbolic planning

Matthew Grounds, Daniel Kudenko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

One of the major difficulties in applying Q-learning to real-world domains is the sharp increase in the number of learning steps required to converge towards an optimal policy as the state space grows. In this paper we propose a method, PLANQ-learning, that couples a Q-learner with a STRIPS planner. The planner shapes the reward function and thus guides the Q-learner quickly to the optimal policy. We demonstrate empirically that this combination of high-level reasoning and low-level learning scales significantly better as the state space grows than both standard Q-learning and hierarchical Q-learning methods.
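The abstract does not say how the planner shapes the reward, so the following is only a minimal sketch of the general idea and not the PLANQ-learning algorithm itself. It swaps in generic potential-based reward shaping, where the potential counts how many subgoals of a plan have been reached. The 1-D corridor domain, the `PLAN` tuple, the constants, and all function names are assumptions made for illustration.

```python
GAMMA = 0.9   # discount factor (assumed value)
ALPHA = 0.5   # learning rate (assumed value)

# Hypothetical plan: ordered subgoal cells in a 1-D corridor, standing in
# for the output of a high-level symbolic planner.
PLAN = (4, 8, 11)

def potential(state):
    """Potential of a state: number of plan subgoals already reached."""
    return float(sum(1 for subgoal in PLAN if state >= subgoal))

def shaped_reward(reward, state, next_state):
    """Environment reward plus the potential-based shaping term."""
    return reward + GAMMA * potential(next_state) - potential(state)

def q_update(q, state, action, reward, next_state, actions):
    """One tabular Q-learning update that learns from the shaped reward."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    target = shaped_reward(reward, state, next_state) + GAMMA * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (target - old)
    return q[(state, action)]
```

In this sketch, a step that reaches the next subgoal earns a positive bonus and a step that falls back behind a subgoal is penalised, so the learner receives guidance long before it ever sees the sparse goal reward.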

Original language: English
Title of host publication: Adaptive Agents and Multi-Agent Systems
Editors: K Tuyls, A Nowe, Z Guessoum, D Kudenko
Place of publication: Berlin
Publisher: Springer
Pages: 75-86
Number of pages: 12
Volume: 4865 LNAI
ISBN (Print): 978-3-540-77947-6
Publication status: Published - 2008
