Abstract
Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question of how to generate a useful potential function remains open. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL. Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where combining individual agents' plans causes conflicts.
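The shaping scheme the abstract refers to is the potential-based form F(s, s') = γΦ(s') − Φ(s) of Ng et al. (1999), which preserves optimal policies. Below is a minimal sketch of how a STRIPS plan could drive such a potential; the function names (`plan_potential`, `shaped_reward`), the plan-progress potential, and the scaling weight are illustrative assumptions, not code or parameters from the paper.

```python
# Sketch: potential-based reward shaping driven by a STRIPS-style plan.
# Assumes the plan is an ordered list of goal predicates (state -> bool);
# the potential is proportional to how far through the plan the state is.

GAMMA = 0.99   # discount factor
OMEGA = 10.0   # scaling weight for the potential (assumed hyperparameter)

def plan_potential(state, plan):
    """Phi(s): scaled index of the furthest consecutive plan step satisfied."""
    progress = 0
    for i, step_satisfied in enumerate(plan, start=1):
        if step_satisfied(state):
            progress = i
        else:
            break
    return OMEGA * progress

def shaped_reward(reward, state, next_state, plan):
    """Augment the environment reward with F(s, s') = gamma*Phi(s') - Phi(s)."""
    return (reward
            + GAMMA * plan_potential(next_state, plan)
            - plan_potential(state, plan))

if __name__ == "__main__":
    # Toy two-step plan in a grid world: reach x >= 5, then also y >= 5.
    plan = [lambda s: s[0] >= 5,
            lambda s: s[0] >= 5 and s[1] >= 5]
    s, s_next = (4, 0), (5, 0)  # this move achieves the first plan step
    print(shaped_reward(0.0, s, s_next, plan))  # 9.9: a positive shaping bonus
```

In the multi-agent setting described in the abstract, Φ could be computed either from a joint plan over all agents or from each agent's individual plan; the latter is where the reported conflicts between combined individual plans can arise.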
Original language | English |
---|---|
Title of host publication | Proceedings of the Adaptive and Learning Agents Workshop 2012, ALA 2012 - Held in Conjunction with the 11th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2012 |
Pages | 49-56 |
Number of pages | 8 |
Publication status | Published - 2012 |
Event | 2012 Workshop on Adaptive and Learning Agents, ALA 2012 - Held in Conjunction with the 11th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2012 - Valencia, Spain. Duration: 4 Jun 2012 → 5 Jun 2012 |
Conference
Conference | 2012 Workshop on Adaptive and Learning Agents, ALA 2012 - Held in Conjunction with the 11th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2012 |
---|---|
Country/Territory | Spain |
City | Valencia |
Period | 4/06/12 → 5/06/12 |
Keywords
- Reinforcement learning
- Reward shaping