Abstract
PAC-MDP algorithms address the exploration-exploitation problem of reinforcement learning agents in an effective way, guaranteeing that, with high probability, the algorithm performs near-optimally for all but a polynomial number of steps. The performance of these algorithms can be further improved by incorporating domain knowledge to guide their learning. In this paper we propose a framework for using partial knowledge about the effects of actions in a theoretically well-founded way. Empirical evaluation shows that our proposed method is more efficient than reward shaping, an alternative approach to incorporating background knowledge. Our solution is also very competitive with the Bayesian Exploration Bonus (BEB) algorithm. BEB is not PAC-MDP; however, it can exploit domain knowledge via informative priors. We show how to use the same kind of knowledge in the PAC-MDP framework in a way that preserves all theoretical guarantees of PAC-MDP learning.
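For intuition only, the sketch below shows one way partial knowledge about action effects might be plugged into a PAC-MDP-style learner; it is not the construction from the paper. The `RMaxSketch` class, the `known_effects` mapping, and all parameter names are illustrative assumptions: state-action pairs listed in `known_effects` are treated as known from the start, while everything else receives the usual optimistic treatment until it has been visited `m` times.

```python
import numpy as np

class RMaxSketch:
    """Minimal R-MAX-style model learner (illustrative sketch, not the paper's method)."""

    def __init__(self, n_states, n_actions, r_max, m=5, known_effects=None):
        self.nS, self.nA = n_states, n_actions
        self.r_max, self.m = r_max, m
        self.counts = np.zeros((n_states, n_actions))             # visit counts
        self.trans_counts = np.zeros((n_states, n_actions, n_states))
        self.reward_sums = np.zeros((n_states, n_actions))
        # Hypothetical prior knowledge: (s, a) -> deterministic next state.
        self.known_effects = known_effects or {}

    def is_known(self, s, a):
        # A pair is "known" either from prior knowledge or from enough visits.
        return (s, a) in self.known_effects or self.counts[s, a] >= self.m

    def update(self, s, a, r, s_next):
        # Record one observed transition.
        self.counts[s, a] += 1
        self.trans_counts[s, a, s_next] += 1
        self.reward_sums[s, a] += r

    def model(self, s, a):
        """Transition distribution and reward estimate handed to the planner."""
        if (s, a) in self.known_effects:
            # Prior knowledge about the action's effect: deterministic successor.
            p = np.zeros(self.nS)
            p[self.known_effects[(s, a)]] = 1.0
        elif self.counts[s, a] >= self.m:
            # Enough experience: use the empirical transition model.
            p = self.trans_counts[s, a] / self.counts[s, a]
        else:
            # Unknown pair: optimistic self-loop that planning values at r_max.
            p = np.zeros(self.nS)
            p[s] = 1.0
            return p, self.r_max
        r = (self.reward_sums[s, a] / self.counts[s, a]
             if self.counts[s, a] > 0 else self.r_max)
        return p, r
```

Under these assumptions, transitions covered by `known_effects` never incur the exploration cost of reaching the `m`-visit threshold, which is the intuitive benefit of supplying partial knowledge about action effects to an optimism-based learner.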
Original language | English
---|---
Title of host publication | Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010)
Place of Publication | Richland, SC
Publisher | International Foundation for Autonomous Agents and Multiagent Systems
Pages | 349-356
Number of pages | 8
ISBN (Print) | 978-0-9826571-1-9
Publication status | Published - 2010
Event | 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), Toronto, Canada, 10 May 2010 → 14 May 2010 (http://www.aamas-conference.org/AAMAS/aamas10/)
Conference
Conference | 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010)
---|---
Country/Territory | Canada
City | Toronto
Period | 10/05/10 → 14/05/10
Internet address | http://www.aamas-conference.org/AAMAS/aamas10/
Keywords
- domain knowledge
- heuristics
- reinforcement learning