Postponed Updates for Temporal-Difference Reinforcement Learning
conference paper
This paper presents postponed updates, a new strategy
for TD methods that can improve sample efficiency without
incurring the computational and space requirements of
model-based RL. By recording its last-visit experience for
each state, the agent can postpone the corresponding update
until that state is revisited, thereby improving the quality
of the update. Experimental results demonstrate that postponed
updates outperform several competitors, most notably eligibility
traces, a traditional way of improving the sample efficiency
of TD methods. The method achieves this without the need to
tune an extra parameter, as eligibility traces require.
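
The following Python snippet is a rough illustration of the idea described
in the abstract: a tabular Q-learning-style agent that records its last-visit
experience per state and only applies the corresponding one-step update once
that state is revisited (or the episode ends). The class name, the per-state
pending buffer, and the end-of-episode flush are assumptions made for
illustration only; they are a sketch of the abstract's description, not the
paper's exact algorithm.

import random
from collections import defaultdict

class PostponedUpdateAgent:
    """Sketch of postponed one-step updates for a tabular agent (illustrative only)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.n_actions = n_actions
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.q = defaultdict(lambda: [0.0] * n_actions)
        # Last-visit experience per state: state -> (action, reward, next_state, done)
        self.pending = {}

    def act(self, state):
        # Epsilon-greedy action selection over the current value estimates.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        values = self.q[state]
        return values.index(max(values))

    def _apply_pending(self, state):
        # Perform the postponed one-step update for `state`, using the
        # current value estimates of its successor, which may have been
        # improved since the experience was recorded.
        if state not in self.pending:
            return
        action, reward, next_state, done = self.pending.pop(state)
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])

    def step(self, state, action, reward, next_state, done):
        # On revisiting `state`, flush the update that was postponed there.
        self._apply_pending(state)
        # Record the new last-visit experience; its update is postponed
        # until `state` is visited again or the episode terminates.
        self.pending[state] = (action, reward, next_state, done)
        if done:
            for s in list(self.pending):
                self._apply_pending(s)

Because the successor state's value estimate may itself have been updated
between the two visits, the postponed update can bootstrap from a fresher
target than an immediate one-step update would, which is the effect the
abstract attributes to postponing.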
TNO Identifier
328768
Source title
9th International Conference on Intelligent Systems Design and Applications - ISDA'09, November 30 - December 2, 2009, Pisa, Italy
Pages
665-672
Files
To receive the publication files, please send an e-mail request to TNO Repository.