Robustness Analysis of SARSA(lambda): Different Models of Reward and Initialisation

Marek Grzes, Daniel Kudenko

Research output: Contribution to conference › Paper › peer-review

Abstract

This paper examines the robustness of SARSA(λ), a reinforcement learning algorithm with eligibility traces, under different models of reward and different initialisations of the Q-table. Most empirical analyses of eligibility traces in the literature have focused on the step-penalty reward. We analyse two general types of reward (final-goal and step-penalty rewards) and show that learning with long traces, i.e., with high values of λ, can lead to suboptimal solutions in some situations. The problems are identified and discussed. Specifically, the results show that SARSA(λ) is sensitive to the model of reward and to the initialisation, and that in some cases the asymptotic performance can be significantly reduced.
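To make the quantities in the abstract concrete, the following is a minimal sketch of tabular SARSA(λ) with accumulating eligibility traces. It is not the authors' experimental setup: the 1-D chain environment, the function name `sarsa_lambda`, the reward labels `"step_penalty"` and `"final_goal"`, the reward magnitudes, and all default parameter values are assumptions chosen for illustration only. The parameters `lam`, `reward_model` and `q_init` correspond to the trace length, reward model and Q-table initialisation discussed above.

```python
import random


def sarsa_lambda(n_states=6, episodes=200, lam=0.9, alpha=0.1, gamma=0.99,
                 epsilon=0.1, reward_model="step_penalty", q_init=0.0,
                 seed=0, max_steps=1000):
    """Tabular SARSA(lambda) with accumulating traces on a hypothetical chain.

    States are 0..n_states-1, the start state is 0 and the goal is the last
    state. Actions: 0 = left, 1 = right. Returns the learned Q-table.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # Q-table initialisation (one of the factors studied in the paper).
    q = [[q_init, q_init] for _ in range(n_states)]
    q[goal] = [0.0, 0.0]  # terminal state has value zero by convention

    def policy(s):
        # Epsilon-greedy action selection with random tie-breaking.
        if rng.random() < epsilon:
            return rng.randrange(2)
        best = max(q[s])
        return rng.choice([a for a in range(2) if q[s][a] == best])

    def step(s, a):
        s2 = max(0, s - 1) if a == 0 else s + 1
        done = s2 == goal
        if reward_model == "step_penalty":
            r = -1.0                      # assumed: -1 on every step
        else:                             # "final_goal"
            r = 1.0 if done else 0.0      # assumed: +1 only at the goal
        return s2, r, done

    for _ in range(episodes):
        e = [[0.0, 0.0] for _ in range(n_states)]  # eligibility traces
        s = 0
        a = policy(s)
        for _ in range(max_steps):
            s2, r, done = step(s, a)
            a2 = policy(s2)
            target = r if done else r + gamma * q[s2][a2]
            delta = target - q[s][a]      # TD error
            e[s][a] += 1.0                # accumulating trace
            for i in range(n_states):
                for j in range(2):
                    q[i][j] += alpha * delta * e[i][j]
                    e[i][j] *= gamma * lam  # decay every trace
            if done:
                break
            s, a = s2, a2
    return q
```

Calling `sarsa_lambda` with different combinations of `lam`, `reward_model` and `q_init` gives a simple way to observe how the learned Q-table depends on these choices in this toy setting. Results on such a small chain should not be read as reproducing the paper's findings.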
Original language: Undefined/Unknown
Pages: 144-156
Publication status: Published - 2008
