Abstract
This paper examines the robustness of SARSA(lambda), a reinforcement learning algorithm with eligibility traces, under different reward models and different initialisations of the Q-table. Most empirical analyses of eligibility traces in the literature have focused on the step-penalty reward. We analyse two general types of reward (final-goal and step-penalty rewards) and show that learning with long traces, i.e., with high values of lambda, can lead to suboptimal solutions in some situations. The problems are identified and discussed. Specifically, the results show that SARSA(lambda) is sensitive to the reward model and to the Q-table initialisation; in some cases its asymptotic performance is significantly reduced.
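For readers unfamiliar with the algorithm, the following is a minimal, illustrative sketch of tabular SARSA(lambda) with accumulating eligibility traces, parameterised by the two reward models named in the abstract and by the initial Q-value. The gridworld, its size, and all hyperparameter values are assumptions chosen for illustration, not the paper's experimental setup.

```python
import numpy as np

def sarsa_lambda(n_states=25, n_actions=4, episodes=500, alpha=0.1,
                 gamma=0.99, lam=0.9, epsilon=0.1, q_init=0.0,
                 reward_model="step_penalty", goal_state=24, seed=0):
    """Tabular SARSA(lambda) with accumulating traces on a toy gridworld."""
    rng = np.random.default_rng(seed)
    Q = np.full((n_states, n_actions), float(q_init))  # Q-table initialisation
    side = int(np.sqrt(n_states))

    def step(s, a):
        # 4-connected gridworld moves: up, down, left, right
        r, c = divmod(s, side)
        r, c = [(max(r - 1, 0), c), (min(r + 1, side - 1), c),
                (r, max(c - 1, 0)), (r, min(c + 1, side - 1))][a]
        s2 = r * side + c
        done = s2 == goal_state
        if reward_model == "step_penalty":
            return s2, -1.0, done                 # penalise every step
        return s2, (1.0 if done else 0.0), done   # "final_goal": reward at goal only

    def policy(s):
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        e = np.zeros_like(Q)                      # eligibility traces
        s = 0
        a = policy(s)
        for _ in range(20 * n_states):            # cap episode length
            s2, reward, done = step(s, a)
            a2 = policy(s2)
            target = reward + (0.0 if done else gamma * Q[s2, a2])
            delta = target - Q[s, a]              # TD error
            e[s, a] += 1.0                        # accumulating trace
            Q += alpha * delta * e                # update all recently visited pairs
            e *= gamma * lam                      # decay all traces
            s, a = s2, a2
            if done:
                break
    return Q
```

Comparing runs such as `sarsa_lambda(lam=0.9, reward_model="final_goal", q_init=1.0)` against `sarsa_lambda(lam=0.0, reward_model="step_penalty")` gives a feel for the kind of interaction between trace length, reward model, and initialisation that the paper studies.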
| Original language | English |
| --- | --- |
| Title of host publication | Artificial Intelligence: Methodology, Systems, and Applications |
| Editors | D. Dochev, M. Pistore, P. Traverso |
| Place of Publication | Berlin |
| Publisher | Springer |
| Pages | 144-156 |
| Number of pages | 13 |
| ISBN (Print) | 978-3-540-85775-4 |
| Publication status | Published - 2008 |
| Event | International Conference on Informational Technology and Environmental System Science, Jiaozuo. Duration: 15 May 2008 → 17 May 2008 |
Keywords
- reinforcement learning
- temporal-difference learning
- SARSA
- Q-learning
- eligibility traces