Plan-based reward shaping for multi-agent reinforcement learning

Research output: Contribution to journal › Article › peer-review

Abstract

Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL. Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of individual agent plans causes conflict.
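
As a rough illustration of the idea in the abstract, the standard potential-based shaping term is F(s, s') = γΦ(s') − Φ(s), added to the environment reward. The sketch below assumes a potential that simply grows with progress along a plan; the names `plan_potential` and `shaped_reward`, the `scale` parameter, and the example plan are hypothetical, and treating plan steps as directly observable states is a simplification. It is not the authors' implementation.

```python
from typing import Callable, Dict, Hashable, Sequence

State = Hashable


def plan_potential(plan: Sequence[State], scale: float = 1.0) -> Callable[[State], float]:
    """Build a potential function Phi from an ordered plan (hypothetical helper).

    Phi(s) is the index of s in the plan times `scale`; states that do not
    appear in the plan get potential 0. This is an assumed, simplified form.
    """
    step_of: Dict[State, int] = {}
    for index, state in enumerate(plan):
        # Keep the first occurrence if a state appears more than once.
        step_of.setdefault(state, index)

    def phi(state: State) -> float:
        return scale * step_of.get(state, 0)

    return phi


def shaped_reward(reward: float, state: State, next_state: State,
                  phi: Callable[[State], float], gamma: float) -> float:
    """Environment reward plus the shaping term gamma * Phi(s') - Phi(s)."""
    return reward + gamma * phi(next_state) - phi(state)


# Illustrative plan; in a multi-agent setting each agent could hold its own
# individual plan, or all agents could share steps of a joint plan.
example_plan = ["start", "has_key", "door_open", "goal"]
phi = plan_potential(example_plan)
bonus = shaped_reward(0.0, "start", "has_key", phi, gamma=0.9)
```

With this choice, a transition that advances one step along the plan receives a positive shaping bonus, while a transition that moves backward along the plan receives a negative one.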

Original language: English
Pages (from-to): 44-58
Number of pages: 15
Journal: The Knowledge Engineering Review
Volume: 31
Issue number: 1
Publication status: Published - 11 Feb 2016