Reinforcement Learning: Expected SARSA

A short introduction to Expected SARSA and a comparison of SARSA, Expected SARSA, and Q-Learning.

Renu Khandelwal


Good to Know:

Reinforcement Learning: Temporal Difference Learning

Reinforcement Learning: SARSA and Q-Learning

In Reinforcement Learning, an agent learns a control policy for a sequential decision problem by continually interacting with its environment, with the goal of maximizing its long-term reward. Each action the agent takes changes the environment’s state, and the environment returns a numerical reward in response.

Source: Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto

SARSA, Expected SARSA, and Q-Learning are all Temporal Difference algorithms

TD methods learn directly from raw experience by interacting with the environment, without a model of the environment’s dynamics. Each time the agent takes an action, the resulting feedback updates its estimate of the state-action value function, which predicts the long-term discounted reward the agent will receive if it takes a given action in a particular state. TD is fully incremental: it updates its estimates after every step, before the final outcome of the episode is known.
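As a minimal sketch of the incremental update described above, the snippet below nudges one state-action estimate toward a one-step target. The function name `td_update`, the dictionary-based table `q`, and the values chosen for `alpha` (step size) and `gamma` (discount factor) are illustrative assumptions, not something taken from a particular library.

```python
def td_update(q, state, action, reward, next_value, alpha=0.1, gamma=0.9):
    """Move q[(state, action)] one step toward the bootstrapped TD target.

    `next_value` is whatever estimate of the next state's value the
    algorithm uses; SARSA, Expected SARSA, and Q-learning differ only
    in how they compute it. All names here are hypothetical.
    """
    td_target = reward + gamma * next_value     # one-step estimate of the return
    td_error = td_target - q[(state, action)]   # gap between target and current estimate
    q[(state, action)] += alpha * td_error      # small step toward the target
    return td_error


# Toy example: a single state-action pair initialised to zero.
q = {("s0", "right"): 0.0}
err = td_update(q, "s0", "right", reward=1.0, next_value=2.0)
```

Note that the update happens right after a single transition; no complete episode is needed.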

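Expected SARSA is a variation of SARSA, the classic on-policy temporal-difference method for model-free reinforcement learning. Instead of using the value of the single next action that was sampled, it uses the expected value over all next actions, weighted by their probability under the target policy. Unlike SARSA, Expected SARSA can be used either on-policy or off-policy.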

Expected SARSA can be off-policy, like Q-learning, in which case there are two different policies: a behavior policy and a target policy. The exploratory behavior policy selects the agent’s actions so that it gathers sufficiently diverse sample data, while the target policy is the one being learned and optimized.
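The off-policy case can be sketched as follows, assuming an epsilon-greedy behavior policy and a fully greedy target policy (both choices, and the helper names `epsilon_greedy_probs` and `expected_sarsa_target`, are assumptions made for illustration). With a greedy target policy, the expectation puts all its weight on the best action, so the Expected SARSA target coincides with the Q-learning target.

```python
import numpy as np


def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state."""
    n_actions = len(q_row)
    probs = np.full(n_actions, epsilon / n_actions)   # exploration mass, spread evenly
    probs[int(np.argmax(q_row))] += 1.0 - epsilon     # remaining mass on the greedy action
    return probs


def expected_sarsa_target(reward, q_next, target_probs, gamma=0.9):
    """Reward plus discounted expectation of Q over next actions under the target policy."""
    return reward + gamma * float(np.dot(target_probs, q_next))


# Hypothetical action values for the next state.
q_next = np.array([1.0, 3.0, 2.0])

behavior_probs = epsilon_greedy_probs(q_next, epsilon=0.3)  # exploratory: used to pick actions
target_probs = epsilon_greedy_probs(q_next, epsilon=0.0)    # greedy: used inside the update

off_policy_target = expected_sarsa_target(0.5, q_next, target_probs)
q_learning_target = 0.5 + 0.9 * float(q_next.max())
```

Here the behavior policy only decides which action the agent actually takes; the update itself looks only at `target_probs`.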

Expected SARSA can also be on-policy, like SARSA, in which case the behavior and target policies are identical: the same policy that controls the agent’s behavior is the one being iteratively improved.
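For the on-policy case, a single tabular update might look like the sketch below. It assumes the shared policy is epsilon-greedy with respect to the current Q-table; the function name `on_policy_expected_sarsa_update`, the array layout `q[state, action]`, and the hyperparameter values are hypothetical.

```python
import numpy as np


def on_policy_expected_sarsa_update(q, s, a, r, s_next, done,
                                    epsilon=0.1, alpha=0.5, gamma=0.9):
    """One on-policy Expected SARSA update on a Q-table of shape (states, actions).

    The same epsilon-greedy policy is assumed both to act and to weight
    the expectation over next actions.
    """
    n_actions = q.shape[1]
    probs = np.full(n_actions, epsilon / n_actions)
    probs[int(np.argmax(q[s_next]))] += 1.0 - epsilon
    # Terminal transitions have no future value to bootstrap from.
    expected_next = 0.0 if done else float(np.dot(probs, q[s_next]))
    q[s, a] += alpha * (r + gamma * expected_next - q[s, a])
    return q[s, a]


# Toy table: two states, two actions; state 1 already has some estimates.
q = np.zeros((2, 2))
q[1] = [1.0, 3.0]
new_value = on_policy_expected_sarsa_update(q, s=0, a=0, r=1.0, s_next=1, done=False)
```

Because the expectation averages over all next actions, the update does not depend on which next action happens to be sampled, which removes that source of variance compared with SARSA.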

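Like SARSA, Expected SARSA is a sample-based version of the Bellman expectation equation for the action values of a policy. Q-learning, by contrast, is a sample-based version of the Bellman optimality equation, which is why its update takes a maximum over next actions.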
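Written side by side, the three update rules differ only in the term that estimates the value of the next state. The notation below follows the usual convention of Sutton and Barto and is an assumption here: \alpha is the step size, \gamma the discount factor, and \pi the target policy.

```latex
\begin{align*}
\text{SARSA:} \quad
  & Q(S_t, A_t) \leftarrow Q(S_t, A_t)
    + \alpha \big[ R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \big] \\
\text{Expected SARSA:} \quad
  & Q(S_t, A_t) \leftarrow Q(S_t, A_t)
    + \alpha \Big[ R_{t+1} + \gamma \sum_{a} \pi(a \mid S_{t+1})\, Q(S_{t+1}, a) - Q(S_t, A_t) \Big] \\
\text{Q-learning:} \quad
  & Q(S_t, A_t) \leftarrow Q(S_t, A_t)
    + \alpha \big[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \big]
\end{align*}
```

SARSA samples the next action, Expected SARSA averages over next actions under \pi, and Q-learning takes the maximum; when \pi is greedy with respect to Q, the sum reduces to the maximum and the last two rules coincide.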
