Knowledge revision for reinforcement learning with abstract MDPs

Kyriakos Efthymiadis, Sam Devlin, Daniel Kudenko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Reward shaping has been shown to significantly improve an agent's performance in reinforcement learning. As attention shifts from tabula-rasa approaches to methods in which agents can be given heuristic domain knowledge, an important problem arises: how can agents deal with erroneous knowledge, and what is its impact on their behavior, both in single-agent settings and in multi-agent settings where agents face conflicting goals? Previous research demonstrated plan-based reward shaping with knowledge revision in a single-agent scenario, where agents were shown to quickly identify and revise erroneous knowledge and thus benefit from more accurate plans. In a multi-agent setting, the use of individual plans as a source of reward shaping has been less successful because of the agents' conflicting goals. In this paper we present the use of MDPs as a method to provide heuristic knowledge, coupled with a revision algorithm to manage the cases where the provided domain knowledge is wrong. We show how agents can deal with erroneous knowledge in the single-agent case and how this method can be used in a multi-agent environment for conflict resolution.

Original language: English
Title of host publication: 13th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2014
Publisher: International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Pages: 1535-1536
Number of pages: 2
Volume: 2
ISBN (Electronic): 9781634391313
Publication status: Published - 2014
Event: 13th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2014 - Paris, France
Duration: 5 May 2014 to 9 May 2014

Conference

Conference: 13th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2014
Country/Territory: France
City: Paris
Period: 5/05/14 to 9/05/14

Keywords

  • Knowledge revision
  • Reinforcement learning
  • Reward shaping
