Plan-based reward shaping for multi-agent reinforcement learning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Building on that work, we investigate the use of STRIPS planning knowledge in the context of MARL. Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of individual agent plans causes conflicts.
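The shaping mechanism the abstract refers to is standard potential-based reward shaping, where the agent receives an extra reward F(s, s') = γΦ(s') − Φ(s) on each transition; shaping of this form is known to preserve the optimal policy (Ng, Harada & Russell, 1999). A minimal sketch of the plan-based variant is shown below. The plan representation, the `step_satisfied` predicate, and the scaling constant `omega` are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of plan-based potential reward shaping. The potential Phi(s) is
# assumed to be proportional to the agent's progress through a plan derived
# from STRIPS knowledge; `omega` and `step_satisfied` are hypothetical.

GAMMA = 0.99   # discount factor of the underlying MDP
OMEGA = 10.0   # scaling constant for the potential (assumed)


def step_satisfied(step, state):
    """Placeholder check: true if a plan step's effects hold in `state`.

    Assumes states are sets of ground predicates, so membership suffices.
    """
    return step in state


def plan_potential(state, plan, omega=OMEGA):
    """Potential Phi(s): progress through the (STRIPS-derived) plan.

    `plan` is an ordered list of predicates; the potential grows with the
    number of consecutive plan steps already satisfied in `state`.
    """
    progress = 0
    for step in plan:
        if step_satisfied(step, state):
            progress += 1
        else:
            break
    return omega * progress


def shaped_reward(reward, state, next_state, plan, gamma=GAMMA):
    """Return r + F(s, s'), where F(s, s') = gamma * Phi(s') - Phi(s)."""
    shaping = gamma * plan_potential(next_state, plan) - plan_potential(state, plan)
    return reward + shaping
```

Under this reading, joint-plan shaping would pass the multi-agent joint plan as `plan`, while individual-plan shaping would have each agent compute its potential from its own plan; the latter is where the inter-agent plan conflicts mentioned in the abstract can arise.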

Original language: English
Title of host publication: Proceedings of the Adaptive and Learning Agents Workshop 2012, ALA 2012 - Held in Conjunction with the 11th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2012
Pages: 49-56
Number of pages: 8
Publication status: Published - 2012
Event: 2012 Workshop on Adaptive and Learning Agents, ALA 2012 - Held in Conjunction with the 11th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2012 - Valencia, Spain
Duration: 4 Jun 2012 - 5 Jun 2012

Conference

Conference: 2012 Workshop on Adaptive and Learning Agents, ALA 2012 - Held in Conjunction with the 11th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2012
Country/Territory: Spain
City: Valencia
Period: 4/06/12 - 5/06/12

Keywords

  • Reinforcement learning
  • Reward shaping
