Abstract
Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question of how to generate a useful potential function remains open. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Building on this work, we investigate the use of STRIPS planning knowledge in the context of MARL. Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where combining the individual agents' plans leads to conflicts.
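To make the shaping mechanism concrete: in potential-based reward shaping, the agent's reward is augmented as r' = r + γΦ(s') − Φ(s), where Φ is the potential function over states. The sketch below is a minimal illustration, assuming a potential that tracks progress through a STRIPS plan; the state and plan representations (`facts` as sets of ground literals, per-step `effects`) and the helpers `plan_step_reached` and `potential` are hypothetical and are not the paper's exact formulation.

```python
# Minimal sketch of potential-based reward shaping driven by STRIPS plan
# progress. The plan/state representations and helper names below are
# illustrative assumptions, not the formulation used in the paper.

GAMMA = 0.99  # discount factor, shared by the learner and the shaping term


def plan_step_reached(state_facts, plan):
    """Index of the furthest plan step whose STRIPS effects hold in the state.

    `state_facts` is a set of ground literals; each plan step carries an
    `effects` set of literals it is expected to make true (assumed schema).
    """
    reached = 0
    for i, step in enumerate(plan, start=1):
        if step["effects"] <= state_facts:  # subset test: effects achieved
            reached = i
    return reached


def potential(state_facts, plan):
    """Phi(s): fraction of the plan completed in state s, in [0, 1]."""
    return plan_step_reached(state_facts, plan) / len(plan)


def shaped_reward(reward, state_facts, next_state_facts, plan):
    """r' = r + gamma * Phi(s') - Phi(s): the potential-based shaping form."""
    return reward + GAMMA * potential(next_state_facts, plan) - potential(state_facts, plan)


# Example: a two-step plan; the transition completes the first plan step.
plan = [{"effects": {"holding(key)"}}, {"effects": {"door_open"}}]
r_shaped = shaped_reward(
    reward=0.0,
    state_facts=set(),
    next_state_facts={"holding(key)"},
    plan=plan,
)
# r_shaped = 0 + 0.99 * 0.5 - 0.0 = 0.495
```

Because the shaping term is a difference of potentials, this form preserves the optimal policy of the underlying task; the plan only biases exploration toward states that lie along it.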
| Original language | English |
| --- | --- |
| Pages (from-to) | 44-58 |
| Number of pages | 15 |
| Journal | The Knowledge Engineering Review |
| Volume | 31 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - 11 Feb 2016 |