TY - CONF
T1 - Robustness Analysis of SARSA(λ)
T2 - Different Models of Reward and Initialisation
AU - Grzes, Marek
AU - Kudenko, Daniel
PY - 2008
Y1 - 2008
N2 - In this paper, the robustness of SARSA(λ), the reinforcement learning algorithm with eligibility traces, is confronted with different models of reward and initialisation of the Q-table. Most empirical analyses of eligibility traces in the literature have focused mainly on the step-penalty reward. We analyse two general types of rewards (final-goal and step-penalty rewards) and show that learning with long traces, i.e., with high values of λ, can lead to suboptimal solutions in some situations. Problems are identified and discussed. Specifically, the obtained results show that SARSA(λ) is sensitive to different models of reward and initialisation. In some cases the asymptotic performance can be significantly reduced.
UR - http://www.scopus.com/inward/record.url?scp=52149088151&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-85776-1_13
DO - 10.1007/978-3-540-85776-1_13
M3 - Paper
SP - 144
EP - 156
ER -