Dynamic Potential-Based Reward Shaping

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Potential-based reward shaping can significantly improve the time needed to learn an optimal policy and, in multi-agent systems, the performance of the final joint policy. It has been proven not to alter the optimal policy of an agent learning alone or the Nash equilibria of multiple agents learning together.

However, a limitation of existing proofs is the assumption that the potential of a state does not change dynamically during learning. This assumption is often broken, especially if the reward-shaping function is generated automatically.

In this paper we prove and demonstrate a method of extending potential-based reward shaping to allow dynamic shaping while maintaining the guarantees of policy invariance in the single-agent case and consistent Nash equilibria in the multi-agent case.
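For context, the shaping reward in standard potential-based reward shaping (Ng et al., 1999) and its dynamic, time-indexed extension can be sketched as follows; the notation here is an illustration based on the abstract and may differ from the paper's exact formulation:

F(s, s') = \gamma \Phi(s') - \Phi(s)                  (static potential)
F(s, t, s', t') = \gamma \Phi(s', t') - \Phi(s, t)    (dynamic potential)

where \Phi is the potential function, \gamma the discount factor, and t and t' the times at which the agent visits states s and s' respectively.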
Original language: English
Title of host publication: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems
Publisher: IFAAMAS
Pages: 433-440
Number of pages: 8
ISBN (Print): 978-0-9817381-3-0
Publication status: Published - Jun 2012
Event: 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012) - Valencia, Spain
Duration: 4 Jun 2012 – 8 Jun 2012

Conference

Conference: 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012)
Country/Territory: Spain
City: Valencia
Period: 4/06/12 – 8/06/12
