
Dynamic Potential-Based Reward Shaping

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Standard

Dynamic Potential-Based Reward Shaping. / Devlin, Sam Michael; Kudenko, Daniel.

Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems. IFAAMAS, 2012. p. 433-440.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Harvard

Devlin, SM & Kudenko, D 2012, Dynamic Potential-Based Reward Shaping. in Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems. IFAAMAS, pp. 433-440, 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012), Valencia, Spain, 4/06/12. <http://www.ifaamas.org/Proceedings/aamas2012/papers/2C_3.pdf>

APA

Devlin, S. M., & Kudenko, D. (2012). Dynamic Potential-Based Reward Shaping. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (pp. 433-440). IFAAMAS. http://www.ifaamas.org/Proceedings/aamas2012/papers/2C_3.pdf

Vancouver

Devlin SM, Kudenko D. Dynamic Potential-Based Reward Shaping. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems. IFAAMAS. 2012. p. 433-440

Author

Devlin, Sam Michael ; Kudenko, Daniel. / Dynamic Potential-Based Reward Shaping. Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems. IFAAMAS, 2012. pp. 433-440

BibTeX

@inproceedings{4f42f4535d9340f7a9181eae9b52e9ca,
title = "Dynamic Potential-Based Reward Shaping",
abstract = "Potential-based reward shaping can signicantly improvethe time needed to learn an optimal policy and, in multi-agent systems, the performance of the nal joint-policy. Ithas been proven to not alter the optimal policy of an agentlearning alone or the Nash equilibria of multiple agents learn-ing together.However, a limitation of existing proofs is the assumptionthat the potential of a state does not change dynamicallyduring the learning. This assumption often is broken, espe-cially if the reward-shaping function is generated automati-cally.In this paper we prove and demonstrate a method of ex-tending potential-based reward shaping to allow dynamicshaping and maintain the guarantees of policy invariance inthe single-agent case and consistent Nash equilibria in themulti-agent case.",
author = "Devlin, {Sam Michael} and Daniel Kudenko",
year = "2012",
month = jun,
language = "English",
isbn = "978-0-9817381-3-0",
pages = "433--440",
booktitle = "Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems",
publisher = "IFAAMAS",
note = "11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012) ; Conference date: 04-06-2012 Through 08-06-2012",

}

RIS (suitable for import to EndNote)

TY - GEN

T1 - Dynamic Potential-Based Reward Shaping

AU - Devlin, Sam Michael

AU - Kudenko, Daniel

PY - 2012/6

Y1 - 2012/6

N2 - Potential-based reward shaping can significantly improve the time needed to learn an optimal policy and, in multi-agent systems, the performance of the final joint-policy. It has been proven to not alter the optimal policy of an agent learning alone or the Nash equilibria of multiple agents learning together. However, a limitation of existing proofs is the assumption that the potential of a state does not change dynamically during the learning. This assumption often is broken, especially if the reward-shaping function is generated automatically. In this paper we prove and demonstrate a method of extending potential-based reward shaping to allow dynamic shaping and maintain the guarantees of policy invariance in the single-agent case and consistent Nash equilibria in the multi-agent case.

AB - Potential-based reward shaping can significantly improve the time needed to learn an optimal policy and, in multi-agent systems, the performance of the final joint-policy. It has been proven to not alter the optimal policy of an agent learning alone or the Nash equilibria of multiple agents learning together. However, a limitation of existing proofs is the assumption that the potential of a state does not change dynamically during the learning. This assumption often is broken, especially if the reward-shaping function is generated automatically. In this paper we prove and demonstrate a method of extending potential-based reward shaping to allow dynamic shaping and maintain the guarantees of policy invariance in the single-agent case and consistent Nash equilibria in the multi-agent case.

M3 - Conference contribution

SN - 978-0-9817381-3-0

SP - 433

EP - 440

BT - Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems

PB - IFAAMAS

T2 - 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012)

Y2 - 4 June 2012 through 8 June 2012

ER -
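
For readers unfamiliar with the technique described in the abstract, the sketch below illustrates how a dynamic shaping term can be added to a Q-learning update. It assumes the standard dynamic formulation F(s, t, s', t') = γΦ(s', t') − Φ(s, t), where t and t' are the times at which the agent encounters s and s'; the Python names (dynamic_shaping, q_update, the toy potential) are illustrative and are not taken from the paper itself.

# Minimal sketch (not the paper's own code): Q-learning with a dynamic
# potential-based shaping term F = gamma * Phi(s', t') - Phi(s, t), where
# the potential Phi may change between the times t and t' at which the
# agent visits s and s'. All identifiers below are illustrative.
from collections import defaultdict

GAMMA, ALPHA = 0.99, 0.1

def dynamic_shaping(phi, s, t, s_next, t_next):
    # Difference of potentials evaluated at the visit time of each state.
    return GAMMA * phi(s_next, t_next) - phi(s, t)

def q_update(Q, actions, phi, s, a, r, s_next, t, t_next):
    # Standard Q-learning update applied to the shaped reward r + F.
    shaped_r = r + dynamic_shaping(phi, s, t, s_next, t_next)
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (shaped_r + GAMMA * best_next - Q[(s, a)])

# Toy usage with a potential that decays over time (purely illustrative):
Q = defaultdict(float)
phi = lambda state, time: 1.0 / (1 + time) if state == "goal" else 0.0
q_update(Q, actions=["left", "right"], phi=phi,
         s="start", a="right", r=0.0, s_next="goal", t=0, t_next=1)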