Abstract
We address the problem of suboptimal behavior caused by short horizons during online POMDP planning. Our solution extends potential-based reward shaping from the related field of reinforcement learning to online POMDP planning, improving plans without increasing the planning horizon. In our extension, information about the quality of belief states is added to the function the agent optimizes during planning. This information hints at where the agent might find high future rewards, helping it achieve greater cumulative reward.
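One way to picture the idea is the standard potential-based shaping term of Ng et al. (1999), F(s, a, s') = γΦ(s') − Φ(s), applied over beliefs rather than states. The minimal Python sketch below assumes that lifting; the two-state belief, the heuristic potential values, and all names in it are illustrative assumptions, not the paper's actual construction.

```python
# Sketch: potential-based reward shaping lifted to belief states, assuming
# the standard form F(b, b') = gamma * Phi(b') - Phi(b) of Ng et al. (1999).
# The two-state belief and PHI_VALUES below are hypothetical.

GAMMA = 0.95

# Assumed heuristic values for two hidden states; Phi(b) is the expectation
# of these values under the belief b.
PHI_VALUES = (0.0, 10.0)


def potential(belief):
    """Phi(b): expected heuristic state value under belief b."""
    return sum(p * v for p, v in zip(belief, PHI_VALUES))


def shaped_reward(reward, belief, next_belief):
    """Immediate reward plus the shaping term gamma * Phi(b') - Phi(b).

    Optimizing shaped returns during a short-horizon lookahead biases the
    search toward beliefs the potential rates highly, hinting at rewards
    that lie beyond the planning horizon.
    """
    return reward + GAMMA * potential(next_belief) - potential(belief)


# Example: an observation shifts probability mass toward the high-value
# state, so the shaping term rewards the transition even before the
# underlying reward is reachable within the horizon.
b, b_next = (0.5, 0.5), (0.2, 0.8)
print(shaped_reward(1.0, b, b_next))  # 1.0 + 0.95 * 8.0 - 5.0 = 3.6
```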
| Original language | English |
|---|---|
| Title of host publication | 12th International Conference on Autonomous Agents and Multiagent Systems 2013, AAMAS 2013 |
| Publisher | International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS) |
| Pages | 1123-1124 |
| Number of pages | 2 |
| Volume | 2 |
| Publication status | Published - 2013 |
| Event | 12th International Conference on Autonomous Agents and Multiagent Systems 2013, AAMAS 2013 - Saint Paul, MN, United States. Duration: 6 May 2013 → 10 May 2013 |
Conference
| Conference | 12th International Conference on Autonomous Agents and Multiagent Systems 2013, AAMAS 2013 |
|---|---|
| Country/Territory | United States |
| City | Saint Paul, MN |
| Period | 6/05/13 → 10/05/13 |
Keywords
- Online Planning
- POMDP
- Potential-Based Reward Shaping