Unlock the secrets of DDPG in Reinforcement Learning
A simple step-by-step explanation of Deep Deterministic Policy Gradient (DDPG) in RL
How would you train a robotic arm to grasp objects, or a robot to walk? Both are continuous control problems, with continuous states and continuous actions.
What is a continuous control problem in RL?
A continuous control problem in RL is one where the agent must select actions from a continuous action space, which is much harder to handle than a discrete action space.
In a discrete action space, the agent chooses from a limited set of actions; in a continuous one, it must choose from an infinite range of possible actions, which makes finding the optimal action far more complex, as the code sketch after the list below shows.
Examples of continuous control problems include:
- Robotics
- Autonomous Driving
- Finance
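To make the distinction concrete, here is a minimal sketch using the Gymnasium library (my choice of toolkit for illustration; the article doesn't prescribe one). CartPole exposes a discrete action space with exactly two actions, while Pendulum exposes a continuous one, a torque anywhere in a range:

```python
import gymnasium as gym

# Discrete action space: CartPole has exactly 2 actions (push left / push right).
discrete_env = gym.make("CartPole-v1")
print(discrete_env.action_space)              # Discrete(2)

# Continuous action space: Pendulum's action is a torque in [-2.0, 2.0].
continuous_env = gym.make("Pendulum-v1")
print(continuous_env.action_space)            # Box(-2.0, 2.0, (1,), float32)
print(continuous_env.action_space.sample())   # e.g. array([0.73], dtype=float32)
```

With only two actions, an agent can simply compare their values; with a continuous torque, there are infinitely many candidates, so "pick the best action" stops being a simple lookup.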
What are the RL algorithms that solve continuous control problems?
The most popular RL algorithms for continuous control are based on policy gradients, where the agent directly learns a policy that maps states to actions. These include:
- Deep Deterministic Policy Gradient (DDPG)
- Proximal Policy Optimization (PPO)
- Trust Region Policy Optimization (TRPO)
- Soft Actor-Critic (SAC)
This article will explore DDPG.
Deep Deterministic Policy Gradient (DDPG) is a model-free, off-policy deep reinforcement learning algorithm. It is inspired by Deep Q-Network (DQN) and builds on the actor-critic architecture using policy gradients.
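Before unpacking the terminology, here is a minimal PyTorch sketch of the two networks at the heart of DDPG. The layer sizes and architecture details (256-unit hidden layers) are illustrative assumptions on my part, not the exact setup from the DDPG paper:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a state to a single continuous action."""
    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # squash output to [-1, 1]
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)  # scale to the env's action range

class Critic(nn.Module):
    """Q-function: estimates the value of a (state, action) pair."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))
```

The actor outputs the action itself rather than a distribution, and the critic scores that action; training alternates between updating the critic toward Bellman targets (the DQN-inspired part) and updating the actor to maximize the critic's estimate (the policy-gradient part).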
Let's understand each of the terms that make up DDPG.
What does the term deterministic policy mean in DDPG?
The term deterministic means there is no randomness or variability in the system's output, in contrast to a stochastic policy.
A deterministic policy maps each state to a single, specific action, a = μ(s): given the same state, it always returns the same action. A stochastic policy, by contrast, defines a probability distribution over actions, a ~ π(·|s), from which an action is sampled.
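The difference is easy to see in code. A toy sketch, where a single linear layer stands in for a policy network (purely illustrative):

```python
import torch

torch.manual_seed(0)
state = torch.randn(1, 3)       # a dummy 3-dimensional state
policy = torch.nn.Linear(3, 1)  # stand-in for a policy network

# Deterministic policy: the same state always yields the same action.
print(torch.tanh(policy(state)))  # a = mu(s)
print(torch.tanh(policy(state)))  # identical output every call

# Stochastic policy: the state yields a distribution; actions are sampled.
dist = torch.distributions.Normal(loc=policy(state), scale=1.0)
print(dist.sample())              # a ~ pi(. | s)
print(dist.sample())              # a different action for the same state
```

DDPG uses the deterministic variant, which is why exploration has to be added externally (for example, by injecting noise into the actions during training).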