Reinforcement Learning: SARSA and Q-Learning

Similarities and Differences between SARSA and Q-Learning

Renu Khandelwal
5 min read · Oct 14, 2022

In reinforcement learning, an agent learns which actions to take in an environment so as to maximize its long-term reward, and it does so by interacting with that environment continually. Each action the agent takes changes the environment’s state, and the environment responds with a numerical reward. The rule the agent uses to choose an action in each state is called a policy; an optimal policy is one that maximizes the expected long-term reward.

Source: Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
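The interaction loop described above can be sketched in a few lines of Python. This is only a minimal illustration: the `Corridor` environment, the `run_episode` helper, and the `always_right` policy are hypothetical names invented for this example, not part of any library or of the original article.

```python
class Corridor:
    """Hypothetical toy environment: positions 0..4; reaching position 4 ends the episode."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Start every episode at the left end of the corridor.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # reward only on reaching the goal
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode: observe state, act, receive reward, repeat."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # the policy maps a state to an action
        state, reward, done = env.step(action)  # the environment changes state and pays a reward
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    # A fixed (and, for this toy corridor, optimal) policy.
    return +1


episode_return = run_episode(Corridor(), always_right)
```

Here the policy is fixed by hand; a learning agent would instead adjust its policy from the rewards it observes.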

Temporal Difference

Temporal Difference (TD) learning is an idea that is both central and novel to reinforcement learning, and it is used for online prediction. It takes its name from using the differences between predictions at successive time steps to learn an estimate of the total reward expected in the future.

TD calculates the value at every step and is fully incremental (source: https://www.cs.upc.edu/~mmartin/Ag5-4x.pdf)

TD methods learn directly from raw experience gathered by interacting with the environment, without a model of the environment’s dynamics. A TD agent updates its value estimates at every step and is fully incremental, learning before knowing the…
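To make the step-by-step update concrete, here is a minimal sketch of one-step TD prediction, usually written TD(0), whose update is V(S_t) ← V(S_t) + α [R_{t+1} + γ V(S_{t+1}) − V(S_t)]. The five-state random walk, the function name `td0_random_walk`, and the values chosen for the step size `alpha`, the discount `gamma`, and the number of episodes are assumptions made for illustration; they do not come from the article.

```python
import random


def td0_random_walk(num_episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values of a 5-state random walk with one-step TD updates.

    Assumed setup: states 1..5 are non-terminal, 0 and 6 are terminal,
    each step moves left or right with equal probability, and the only
    reward is +1 for stepping into the right terminal state.
    """
    rng = random.Random(seed)
    n_states = 5
    V = [0.0] * (n_states + 2)  # terminal values stay at 0

    for _ in range(num_episodes):
        s = 3  # start in the middle state
        while s not in (0, n_states + 1):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == n_states + 1 else 0.0
            # The update happens right after this single step,
            # using the current estimate of the next state's value.
            td_target = r + gamma * V[s_next]
            V[s] += alpha * (td_target - V[s])
            s = s_next

    return V[1:-1]  # value estimates for the non-terminal states


values = td0_random_walk()
```

Note that the agent never uses the transition probabilities, only sampled transitions, and it never waits for the episode to finish before updating. For this walk the true values are 1/6, 2/6, ..., 5/6, so the estimates should rise from left to right, with some noise that depends on `alpha`.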

