Reinforcement Learning in RoboCup KeepAway with Partial Observability

Research output: Contribution to conferencePaperpeer-review


Partially observable environments pose a major challenge to the application of reinforcement learning algorithms. In such environments, due to the Markov property frequently being violated in the system state representation, situations can occur where an agent has insufficient information to decide on the optimal action. In such cases, it is necessary to determine when information gathering actions should be executed, that is, when the agent needs to reduce uncertainty about the current state before deciding on how to act. One possible solution that has been proposed in past research is to manually code rules for execution of information gathering actions in the policy using heuristic (and likely faulty) knowledge. However, such a solution requires explicit expert knowledge about actions which are information gathering. In this paper a flexible solution is proposed which automatically learns when to execute information gathering actions and furthermore to automatically discover which actions gather information. We present an evaluation in the Robo{C}up Keep{A}way domain that empirically shows the robustness of the proposed approach and its success in learning under varying degrees of partial observability. Hence, it eliminates the need for hand-coded rules, is flexible in different situations and does not require knowledge about information gathering actions.
Original languageUndefined/Unknown
Publication statusPublished - 2009

Cite this