Reinforcement Learning: Expected SARSA

A short introduction to Expected SARSA, and a comparison of SARSA, Expected SARSA, and Q-Learning.

Renu Khandelwal
5 min read · Oct 17, 2022

Good to Know:

Reinforcement Learning: Temporal Difference Learning

Reinforcement Learning: SARSA and Q-Learning

In Reinforcement Learning, an agent learns an optimal control policy for a sequential decision problem: a policy that maximizes its long-term reward through continual interaction with an environment. Each action the agent takes changes the environment’s state, and the agent receives a numerical reward from the environment in return.

Source: Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto

SARSA, Expected SARSA, and Q-Learning are all Temporal Difference (TD) algorithms.

TD methods learn directly from raw experience by interacting with the environment, without a model of the environment’s dynamics. Each time the agent takes an action, the resulting feedback updates its estimate of the state-action value function, which predicts the long-term discounted reward the agent will receive if it takes a given action in a particular state. TD learning is fully incremental: estimates are updated after every step, before the final outcome of the episode is known.
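To make the incremental update concrete, here is a minimal sketch of one TD step on a tabular state-action value function, using the standard textbook targets for the three algorithms. It rests on assumptions that are not in the article: the table Q is a NumPy array indexed as Q[state, action], the agent follows an epsilon-greedy policy, terminal states are ignored, and the names epsilon_greedy_probs, td_target, and td_update are hypothetical helpers chosen for illustration.

```python
import numpy as np


def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an (assumed) epsilon-greedy policy for one state."""
    n_actions = len(q_row)
    # Every action gets an equal share of the exploration probability ...
    probs = np.full(n_actions, epsilon / n_actions)
    # ... and the greedy action gets the remaining probability mass.
    probs[int(np.argmax(q_row))] += 1.0 - epsilon
    return probs


def td_target(Q, reward, next_state, next_action, gamma, epsilon, method):
    """One-step TD target; the three methods differ only in the bootstrap term."""
    q_next = Q[next_state]
    if method == "sarsa":
        # Value of the action actually selected in the next state.
        bootstrap = q_next[next_action]
    elif method == "expected_sarsa":
        # Expectation of the next-state values under the current policy.
        bootstrap = float(np.dot(epsilon_greedy_probs(q_next, epsilon), q_next))
    elif method == "q_learning":
        # Value of the greedy action in the next state.
        bootstrap = float(np.max(q_next))
    else:
        raise ValueError(f"unknown method: {method}")
    return reward + gamma * bootstrap


def td_update(Q, state, action, target, alpha):
    """Move Q[state, action] a fraction alpha of the way toward the target."""
    Q[state, action] += alpha * (target - Q[state, action])


# Toy table: 2 states, 2 actions (illustrative numbers only).
Q = np.zeros((2, 2))
Q[1] = [1.0, 3.0]

targets = {
    method: td_target(Q, reward=1.0, next_state=1, next_action=0,
                      gamma=0.5, epsilon=0.5, method=method)
    for method in ("sarsa", "expected_sarsa", "q_learning")
}
td_update(Q, state=0, action=0, target=targets["expected_sarsa"], alpha=0.1)
```

All three methods share the same update rule and differ only in how they summarize the next state: by the sampled next action, by an average over the policy's action probabilities, or by the maximum.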

Expected SARSA is a variation on SARSA, the…
