Partially observable environments pose a major challenge for reinforcement learning algorithms. In such environments, the system state representation frequently violates the Markov property, so an agent can have insufficient information to decide on the optimal action. In these cases, the agent must determine when to execute information gathering actions, that is, when it needs to reduce uncertainty about the current state before deciding how to act. One solution proposed in past research is to manually code rules for executing information gathering actions into the policy using heuristic (and likely faulty) knowledge. However, such a solution requires explicit expert knowledge about which actions gather information.
In this paper, a flexible solution is proposed that automatically learns when to execute information gathering actions and, furthermore, automatically discovers which actions gather information. We present an evaluation in the RoboCup KeepAway domain that empirically shows the robustness of the proposed approach and its success in learning under varying degrees of partial observability. The approach therefore eliminates the need for hand-coded rules, remains flexible across different situations, and does not require prior knowledge about information gathering actions.
|Title of host publication||2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 2|
|Editors||R Baeza-Yates, B Berendt, E Bertino, EP Lim, G Pasi|
|Place of Publication||LOS ALAMITOS|
|Publisher||IEEE Computer Society|
|Number of pages||8|
|Publication status||Published - 2009|
|Event||IEEE/WIC/ACM International Conferences on Web Intelligence (WI)/Intelligent Agent Technologies (IAT), Milan. Duration: 15 Sept 2009 → 18 Sept 2009|
- belief state
- partial observability
- reinforcement learning