TY - CONF
T1 - Improving Optimistic Exploration in Model-Free Reinforcement Learning
AU - Grzes, Marek
AU - Kudenko, Daniel
PY - 2009
Y1 - 2009
N2 - The key problem in reinforcement learning is the exploration-exploitation tradeoff. An optimistic initialisation of the value function is a popular RL strategy. The problem of this approach is that the algorithm may have relatively low performance after many episodes of learning. In this paper, two extensions to standard optimistic exploration are proposed. The first one is based on different initialisation of the value function of goal states. The second one which builds on the previous idea explicitly separates propagation of low and high values in the state space. Proposed extensions show improvement in empirical comparisons with basic optimistic initialisation. Additionally, they improve anytime performance and help on domains where learning takes place on the sub-space of the large state space, that is, where the standard optimistic approach faces more difficulties.
AB - The key problem in reinforcement learning is the exploration-exploitation tradeoff. An optimistic initialisation of the value function is a popular RL strategy. The problem of this approach is that the algorithm may have relatively low performance after many episodes of learning. In this paper, two extensions to standard optimistic exploration are proposed. The first one is based on different initialisation of the value function of goal states. The second one which builds on the previous idea explicitly separates propagation of low and high values in the state space. Proposed extensions show improvement in empirical comparisons with basic optimistic initialisation. Additionally, they improve anytime performance and help on domains where learning takes place on the sub-space of the large state space, that is, where the standard optimistic approach faces more difficulties.
UR - http://www.scopus.com/inward/record.url?scp=78650724420&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-04921-7_37
DO - 10.1007/978-3-642-04921-7_37
M3 - Paper
SP - 360
EP - 369
ER -