Reinforcement Learning: Expected SARSA
A short introduction to Expected SARSA and a comparison of SARSA, Expected SARSA, and Q-Learning.
Good to Know:
Reinforcement Learning: Temporal Difference Learning
Reinforcement Learning: SARSA and Q-Learning
Reinforcement Learning aims for an agent to learn an optimal control policy for a sequential decision problem: by continually interacting with its environment, the agent tries to maximize its long-term reward. Each action the agent takes changes the environment's state, and in return the agent receives a numerical reward.
SARSA, Expected SARSA, and Q-Learning are all Temporal Difference (TD) algorithms.
TD methods learn directly from raw experience gained by interacting with the environment, without a model of the environment's dynamics. Each time the agent takes an action, the resulting feedback updates its estimate of the state-action value function, which predicts the long-term discounted reward the agent will receive if it takes a given action in a particular state. TD learning is fully incremental: estimates are updated before the final outcome is known.
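To make that update concrete, here is the standard one-step TD update for action values, written in SARSA's form (the notation is the usual one, with step size α and discount factor γ; these symbols are not defined in the article itself):

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
  + \alpha \bigl[ R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \bigr]
```

The term in brackets is the TD error: the difference between the bootstrapped target and the current estimate.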
Expected SARSA is a variation on SARSA: instead of bootstrapping from the single next action the agent actually samples, it bootstraps from the expected action value in the next state, weighting each action by its probability under the current policy.
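A minimal sketch of how the three bootstrap targets differ on a tabular Q-table is shown below. This is not code from the article; `policy_probs` is a hypothetical helper standing in for whatever behavior policy the agent uses (here, epsilon-greedy with respect to Q).

```python
import numpy as np

def policy_probs(Q, s, eps=0.1):
    """Epsilon-greedy action probabilities for state s (assumed policy)."""
    n_actions = Q.shape[1]
    probs = np.full(n_actions, eps / n_actions)
    probs[np.argmax(Q[s])] += 1.0 - eps
    return probs

def td_targets(Q, r, s_next, a_next, gamma=0.99):
    """Bootstrap targets used by SARSA, Expected SARSA, and Q-Learning."""
    sarsa = r + gamma * Q[s_next, a_next]                    # sampled next action
    expected_sarsa = r + gamma * np.dot(policy_probs(Q, s_next), Q[s_next])  # expectation over the policy
    q_learning = r + gamma * np.max(Q[s_next])               # greedy max (off-policy)
    return sarsa, expected_sarsa, q_learning

# Usage: after observing (s, a, r, s_next) and choosing a_next with the policy,
# pick one target and apply Q[s, a] += alpha * (target - Q[s, a]).
```

Averaging over the policy's action probabilities removes the variance introduced by sampling the next action, which is the main practical difference between Expected SARSA and SARSA.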