Robustness analysis of SARSA(lambda): Different models of reward and initialisation

Marek Grzes, Daniel Kudenko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, the robustness of SARSA(lambda), the reinforcement learning algorithm with eligibility traces, is examined under different models of reward and different initialisations of the Q-table. Most empirical analyses of eligibility traces in the literature have focused mainly on the step-penalty reward. We analyse two general types of reward (final-goal and step-penalty rewards) and show that learning with long traces, i.e., with high values of lambda, can lead to suboptimal solutions in some situations. The problems are identified and discussed. Specifically, the results show that SARSA(lambda) is sensitive to the model of reward and to the initialisation, and in some cases the asymptotic performance can be significantly reduced.
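The two reward models and the initialisation sensitivity discussed in the abstract can be illustrated with a minimal SARSA(lambda) sketch using accumulating eligibility traces. The toy chain environment, the parameter values, and all function names below are illustrative assumptions, not taken from the paper:

```python
import random

# Minimal SARSA(lambda) with accumulating eligibility traces on a toy
# 1-D chain: states 0..5, state 5 is the terminal goal. All parameters
# here are illustrative assumptions, not values from the paper.
N_STATES = 6
ACTIONS = [-1, +1]                       # move left / move right
ALPHA, GAMMA, LAM, EPS = 0.1, 0.95, 0.9, 0.1

def step(state, action, reward_model):
    """One environment transition under either reward model."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = (nxt == N_STATES - 1)
    if reward_model == "final-goal":
        r = 1.0 if done else 0.0         # reward only on reaching the goal
    else:                                # "step-penalty"
        r = -1.0                         # cost of -1 for every step taken
    return nxt, r, done

def sarsa_lambda(reward_model, q_init=0.0, episodes=200, seed=0):
    rng = random.Random(seed)
    # Q-table initialisation: the q_init value is one of the knobs the
    # paper's analysis varies.
    Q = [[q_init, q_init] for _ in range(N_STATES)]

    def policy(s):                       # epsilon-greedy action selection
        if rng.random() < EPS:
            return rng.randrange(2)
        return 0 if Q[s][0] > Q[s][1] else 1

    for _ in range(episodes):
        E = [[0.0, 0.0] for _ in range(N_STATES)]   # eligibility traces
        s, a = 0, policy(0)
        done = False
        while not done:
            s2, r, done = step(s, ACTIONS[a], reward_model)
            a2 = policy(s2)
            delta = r + (0.0 if done else GAMMA * Q[s2][a2]) - Q[s][a]
            E[s][a] += 1.0               # accumulating trace for (s, a)
            for si in range(N_STATES):   # propagate the TD error backwards
                for ai in range(2):
                    Q[si][ai] += ALPHA * delta * E[si][ai]
                    E[si][ai] *= GAMMA * LAM   # decay all traces
            s, a = s2, a2
    return Q

Q = sarsa_lambda("step-penalty", q_init=0.0)
print("right preferred near goal:", Q[4][1] > Q[4][0])
```

Rerunning `sarsa_lambda` with `reward_model="final-goal"` and different `q_init` values (e.g. 0 versus a pessimistic negative value) is one way to reproduce, in miniature, the kind of reward/initialisation comparison the paper performs.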

Original language: English
Title of host publication: ARTIFICIAL INTELLIGENCE: METHODOLOGY, SYSTEMS, AND APPLICATIONS
Editors: D. Dochev, M. Pistore, P. Traverso
Place of publication: Berlin
Publisher: Springer
Pages: 144-156
Number of pages: 13
ISBN (Print): 978-3-540-85775-4
Publication status: Published - 2008
Event: International Conference on Informational Technology and Environmental System Science - Jiaozuo
Duration: 15 May 2008 - 17 May 2008

Conference

Conference: International Conference on Informational Technology and Environmental System Science
City: Jiaozuo
Period: 15/05/08 - 17/05/08

Keywords

  • reinforcement learning
  • temporal-difference learning
  • SARSA
  • Q-learning
  • eligibility traces
