Reinforcement Learning: SARSA and Q-Learning
Similarities and Differences between SARSA and Q-Learning
Reinforcement learning aims for an agent to find the actions that maximize its long-term reward by continually interacting with its environment. Each action the agent takes changes the environment’s state, and in return the agent receives a numerical reward from the environment. The rule the agent uses to choose an action in each state is referred to as a policy; a policy that maximizes the long-term reward is an optimal policy.
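The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Corridor` environment, its `reset`/`step` methods, the two example policies, and the discount factor `gamma` are all hypothetical names made up for this sketch, not part of any particular library.

```python
import random


class Corridor:
    """Hypothetical 1-D environment: states 0..length-1, reward 1.0 on reaching the last state."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the state is clipped to the corridor.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, gamma=0.9, max_steps=100):
    """Run one episode and return the discounted sum of rewards the agent collected."""
    state = env.reset()
    discounted_return, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state)                  # the policy maps a state to an action
        state, reward, done = env.step(action)  # the environment changes state and emits a reward
        discounted_return += discount * reward
        discount *= gamma
        if done:
            break
    return discounted_return


def always_right(state):
    return +1


def random_policy(state):
    return random.choice([-1, +1])


if __name__ == "__main__":
    print(run_episode(Corridor(), always_right))
    print(run_episode(Corridor(), random_policy))
```

In this toy setting the policy that always moves right reaches the goal in the fewest steps, so it collects the least-discounted reward; a random policy typically takes longer and therefore earns a smaller discounted return.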
Temporal Difference
Temporal Difference (TD) learning is an idea central and novel to reinforcement learning, used for online prediction. It derives its name from using the differences between predictions made at successive time steps to improve an estimate of the total amount of reward expected over the future.
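As a rough illustration of the idea, the simplest tabular form of TD prediction, usually called TD(0), nudges the value estimate of the current state toward the observed reward plus the discounted estimate of the next state. The sketch below assumes a table of values indexed by integer states; the function name `td0_update` and the parameters `alpha` (step size) and `gamma` (discount factor) are names chosen for this example.

```python
def td0_update(values, state, reward, next_state, done, alpha=0.1, gamma=1.0):
    """Apply one tabular TD(0) update in place and return the TD error.

    Update rule: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
    """
    # Bootstrapped target: immediate reward plus the discounted estimate of the
    # next state. A terminal next state contributes no future value.
    target = reward + (0.0 if done else gamma * values[next_state])
    td_error = target - values[state]
    values[state] += alpha * td_error
    return td_error


if __name__ == "__main__":
    values = [0.0, 0.5]  # assumed current estimates for states 0 and 1
    error = td0_update(values, state=0, reward=1.0, next_state=1, done=False,
                       alpha=0.1, gamma=0.9)
    print(error, values)
```

The TD error is the "time difference" in the name: the gap between the prediction made at one step and the better-informed prediction available one step later.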
TD methods learn directly from raw experience gained by interacting with the environment, without a model of the environment’s dynamics. A TD agent updates its value estimates at every step and is fully incremental, learning before knowing the…