Potential-based reward shaping for POMDPs

Adam Eck, Leen-Kiat Soh, Sam Devlin, Daniel Kudenko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We address the problem of suboptimal behavior caused by short horizons during online POMDP planning. Our solution extends potential-based reward shaping from the related field of reinforcement learning to online POMDP planning in order to improve planning without increasing the planning horizon. In our extension, information about the quality of belief states is added to the function optimized by the agent during planning. This information provides hints about where the agent might find high future rewards, and thus achieve greater cumulative rewards.
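
To make the idea concrete, here is a minimal sketch of potential-based shaping inside a short-horizon rollout, assuming a potential function Phi over beliefs (here, negative belief entropy as a stand-in for belief quality) and an assumed generative simulator; the names and interfaces below are illustrative, not the authors' implementation. The rollout return is augmented with the standard shaping term F = γΦ(b') − Φ(b).

```python
import math

def phi(belief):
    # Assumed potential: negative entropy of the belief, so that
    # more certain beliefs receive higher potential. `belief` is a
    # dict mapping states to probabilities (illustrative interface).
    return sum(p * math.log(p) for p in belief.values() if p > 0)

def shaped_rollout(belief, simulator, policy, horizon, gamma=0.95):
    """Estimate the shaped return of a short-horizon rollout.

    `simulator(belief, action)` is an assumed generative model that
    returns (next_belief, reward); `policy(belief)` picks an action.
    Both are hypothetical stand-ins for the planner's components.
    """
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action = policy(belief)
        next_belief, reward = simulator(belief, action)
        # Potential-based shaping term: F = gamma * Phi(b') - Phi(b).
        shaping = gamma * phi(next_belief) - phi(belief)
        total += discount * (reward + shaping)
        discount *= gamma
        belief = next_belief
    return total
```

In the MDP setting, shaping rewards of this form are known to preserve optimal policies (Ng, Harada, and Russell, 1999); the paper's contribution lies in carrying this construction over to online POMDP planning, where the potential is defined over belief states.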

Original language: English
Title of host publication: 12th International Conference on Autonomous Agents and Multiagent Systems 2013, AAMAS 2013
Publisher: International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Pages: 1123-1124
Number of pages: 2
Volume: 2
Publication status: Published - 2013
Event: 12th International Conference on Autonomous Agents and Multiagent Systems 2013, AAMAS 2013 - Saint Paul, MN, United States
Duration: 6 May 2013 - 10 May 2013

Conference

Conference: 12th International Conference on Autonomous Agents and Multiagent Systems 2013, AAMAS 2013
Country/Territory: United States
City: Saint Paul, MN
Period: 6/05/13 - 10/05/13

Keywords

  • Online Planning
  • POMDP
  • Potential-Based Reward Shaping
