Title
A theoretical and empirical analysis of Expected Sarsa
Author
van Seijen, H.H.
van Hasselt, H.
Whiteson, S.
Wiering, M.
TNO Defensie en Veiligheid
Publication year
2009
Abstract
This paper presents a theoretical and empirical analysis of Expected Sarsa, a variation on Sarsa, the classic on-policy temporal-difference method for model-free reinforcement learning. Expected Sarsa exploits knowledge about stochasticity in the behavior policy to perform updates with lower variance. Doing so allows for higher learning rates and thus faster learning. In deterministic environments, Expected Sarsa's updates have zero variance, enabling a learning rate of 1. We prove that Expected Sarsa converges under the same conditions as Sarsa and formulate specific hypotheses about when Expected Sarsa will outperform Sarsa and Q-learning. Experiments in multiple domains confirm these hypotheses and demonstrate that Expected Sarsa has significant advantages over these more commonly used methods. © 2009 IEEE.
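The abstract describes the core idea: instead of sampling the next action (as Sarsa does), Expected Sarsa uses the expectation of the next state's Q-values under the behavior policy, which removes the variance contributed by action selection. A minimal sketch of that update in plain Python, assuming an epsilon-greedy behavior policy and a tabular Q stored as a list of lists (the function names and tie-breaking are illustrative choices, not taken from the paper):

```python
def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state's Q-values.

    Each action gets epsilon/n probability; the greedy action (first argmax,
    an arbitrary tie-break) gets the remaining 1 - epsilon mass on top.
    """
    n = len(q_row)
    best = max(range(n), key=lambda i: q_row[i])
    return [epsilon / n + (1.0 - epsilon if i == best else 0.0) for i in range(n)]

def expected_sarsa_update(Q, s, a, r, s_next, alpha, gamma, epsilon):
    """One Expected Sarsa update on tabular Q (list of lists), in place.

    The target uses the expected Q-value of the next state under the
    behavior policy, rather than the Q-value of a sampled next action.
    """
    probs = epsilon_greedy_probs(Q[s_next], epsilon)
    expectation = sum(p * q for p, q in zip(probs, Q[s_next]))
    target = r + gamma * expectation
    Q[s][a] += alpha * (target - Q[s][a])
```

Because the target is an exact expectation, in a deterministic environment the update has zero variance, which is why (as the abstract notes) a learning rate of alpha = 1 becomes viable there.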
Subject
Electronics
Behavior policy
Difference method
Empirical analysis
Higher learning
Learning rates
Model free
Multiple domains
Q-learning
Stochasticity
Zero variance
Dynamic programming
Learning algorithms
Reinforcement
Reinforcement learning
Systems engineering
Education
To reference this document use:
http://resolver.tudelft.nl/uuid:0e868206-5237-42d0-8bc0-3deb94a679ef
DOI
https://doi.org/10.1109/adprl.2009.4927542
TNO identifier
279985
Publisher
IEEE, Piscataway, NJ
Source
2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009, 30 March - 2 April 2009, Nashville, TN, USA, 177-184
Document type
Conference paper