Deep Q-Learning: A Deep Reinforcement Learning Algorithm
An easy-to-understand explanation of Deep Q-Learning with PyTorch code implementation
--
Deep Reinforcement Learning
Deep Reinforcement Learning combines Reinforcement Learning algorithms with Artificial Neural Networks. It allows an agent to learn an optimal policy for sequential decision problems by maximizing the cumulative future reward.
Deep RL = RL Algorithm + Artificial Neural Network
Reinforcement Learning is a goal-oriented approach to learning in which an agent learns which action to take in each state of an environment so as to maximize its long-term reward.
A Policy is a mapping from states to actions that defines the behavior of an agent in an environment. An Optimized Policy is a policy that the agent has learned in order to maximize the cumulative reward over time.
The goal of reinforcement learning is to find an optimized policy, that is, one that maps each state of the environment to the best action the agent can take in that state.
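The "cumulative reward over time" is commonly formalized as a discounted return, where later rewards count a little less than earlier ones. The sketch below is only an illustration under that assumption: the discount factor `gamma`, the function name `discounted_return`, and the reward values are not taken from this article.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards, where the reward at step t is weighted by gamma**t.

    gamma (the discount factor) is an assumed parameter for this sketch.
    """
    total = 0.0
    # Work backwards so that each step computes G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


episode_rewards = [1.0, 0.0, 2.0]  # illustrative values only
print(discounted_return(episode_rewards, gamma=0.5))  # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
```

With `gamma=1.0` the function reduces to a plain sum of rewards; smaller values make the agent favor rewards that arrive sooner.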
Policies can be represented using a lookup table, linear functions, or neural networks, depending on the complexity of the environment's action space and state space. When a value is available for every action in a state (a Q value), an optimal policy is derived by selecting the highest-valued action in each state.
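Deriving a policy by picking the highest-valued action in each state can be sketched with a small lookup table. The state names, action names, and values below are hypothetical and chosen only for illustration.

```python
# Hypothetical Q-table: q_table[state][action] -> estimated value of that action.
q_table = {
    "s0": {"left": 0.1, "right": 0.7},
    "s1": {"left": 0.4, "right": 0.2},
}


def greedy_action(q_table, state):
    """Return the action with the highest value in the given state."""
    action_values = q_table[state]
    return max(action_values, key=action_values.get)


def greedy_policy(q_table):
    """Map every state in the table to its highest-valued action."""
    return {state: greedy_action(q_table, state) for state in q_table}


policy = greedy_policy(q_table)
print(policy)  # {'s0': 'right', 's1': 'left'}
```

The resulting dictionary is itself a lookup-table policy: one action per state.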
For a low-dimensional state space and action space, a lookup table such as a Q-table can be an excellent choice to represent a policy; however, when the state and action spaces are high-dimensional, a table becomes impractical and a neural network is usually the better option for learning optimized policies.
A Neural Network consists of several layers of nodes and acts as a function approximator: it learns the function underlying the data it is trained on. Hence, a neural network can approximate a value function, which maps states to values; a Q function, which maps state-action pairs to Q values; or a policy function, which maps states to actions.
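To make the idea of a network mapping a state to Q values concrete, here is a minimal, dependency-free sketch of a forward pass through a two-layer network. It is not the implementation used in this article: the layer sizes, the random initialization, and the helper names (`linear`, `relu`, `init_layer`, `q_values`) are assumptions, and the network is untrained, so its outputs are meaningless until learning happens. In PyTorch, each `linear` step would typically be a `torch.nn.Linear` layer.

```python
import random


def linear(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output node."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an arbitrary choice for this sketch)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def q_values(state, params):
    """Map a state vector to one Q value per action."""
    (w1, b1), (w2, b2) = params
    hidden = relu(linear(state, w1, b1))
    return linear(hidden, w2, b2)


rng = random.Random(0)
STATE_DIM, HIDDEN, N_ACTIONS = 4, 8, 2  # assumed sizes, for illustration only
params = [init_layer(STATE_DIM, HIDDEN, rng), init_layer(HIDDEN, N_ACTIONS, rng)]

state = [0.1, -0.2, 0.05, 0.3]  # hypothetical 4-dimensional state
q = q_values(state, params)
best_action = max(range(N_ACTIONS), key=lambda a: q[a])
print(q, best_action)
```

Note the design choice: the network takes only the state as input and outputs a Q value for every action at once, so picking the best action needs a single forward pass rather than one pass per state-action pair.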
Deep Q-Learning
Deep Q-Learning combines the Q-Learning algorithm from Reinforcement Learning with a deep neural network that approximates the Q values. The resulting network is called a deep Q-network (DQN).
DQN = Q-Learning + Artificial Neural Network