Generalized Policy Iteration
Learn about Generalized Policy Iteration, Value Iteration, and Policy Iteration
In reinforcement learning, the learner or decision maker, called the Agent, interacts continually with its Environment by performing one action at each discrete time step. Each action changes the Environment’s state, and in response the Agent receives a numerical reward from the Environment.
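The interaction loop can be made concrete with a minimal sketch. The Environment here is a hypothetical toy (`CorridorEnv`, not from any library): states 0 to 4 on a line, the goal at state 4, a reward of -0.1 per step and +1.0 on reaching the goal. The `reset`/`step` interface loosely resembles common RL toolkits but is an assumption for illustration.

```python
import random


class CorridorEnv:
    """Hypothetical toy Environment: states 0..length-1 on a line, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Start every episode at the left end of the corridor
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the state is clipped to the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # small cost per step, reward at the goal
        return self.state, reward, done


def run_episode(env, choose_action, max_steps=100):
    """Agent-Environment loop: one action per discrete time step."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = choose_action(state)                 # Agent acts
        next_state, reward, done = env.step(action)   # Environment responds
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


# An Agent that always moves right reaches the goal in 4 steps.
print(run_episode(CorridorEnv(), lambda s: 1))

# A randomly acting Agent (seeded for reproducibility).
rng = random.Random(0)
print(len(run_episode(CorridorEnv(), lambda s: rng.choice([-1, 1]))))
```

Each tuple in the trajectory records the state the Agent saw, the action it chose, and the reward the Environment returned for that step.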
The goal of reinforcement learning is for the Agent to find an optimal policy, one that maximizes the long-term (cumulative) reward. The Environment rewards desired actions in particular states and penalizes undesired actions in others, so maximizing long-term reward means favoring actions that pay off over time, not just immediately.
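One common way to make "long-term reward" precise, introduced here as an assumption since the text has not defined it yet, is the discounted return G = r_1 + γ·r_2 + γ²·r_3 + …, where the discount factor γ between 0 and 1 weights future rewards less than immediate ones. A short sketch, using hypothetical reward sequences from the toy corridor above:

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward k steps ahead is weighted by gamma**k."""
    g = 0.0
    # Work backwards using the recursion G_t = r_{t+1} + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward sequences: a direct route vs. one with a two-step detour.
# Each step costs -0.1 and reaching the goal pays +1.0.
direct = [-0.1, -0.1, -0.1, 1.0]
detour = [-0.1, -0.1, -0.1, -0.1, -0.1, 1.0]
print(round(discounted_return(direct, 0.9), 4))  # 0.458
print(round(discounted_return(detour, 0.9), 4))  # 0.181
```

The direct route earns the higher return, which is why an Agent maximizing long-term reward should prefer it even though both routes end at the goal.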
A policy is the Agent’s strategy for obtaining the maximum reward from the Environment; it defines how the Agent behaves at a given time.
The Agent uses the policy to decide which action to perform when the Environment is in a given state. In this sense, the policy is a mapping from states to actions, a map the Agent follows to reach the desired goal.
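A minimal sketch of a policy as a state-to-action mapping, again assuming the hypothetical toy corridor (states 0 to 4, actions -1 and +1). It shows both a deterministic policy (one action per state) and a stochastic one (a probability for each action); the names `deterministic_policy`, `stochastic_policy`, and `act` are illustrative only.

```python
import random

# Hypothetical deterministic policy: each state maps to a single action.
deterministic_policy = {0: +1, 1: +1, 2: +1, 3: +1}

# Hypothetical stochastic policy: each state maps to {action: probability}.
stochastic_policy = {s: {+1: 0.8, -1: 0.2} for s in range(4)}


def act(policy, state, rng=random):
    """Return the action the policy prescribes in `state` (sampling if stochastic)."""
    choice = policy[state]
    if isinstance(choice, dict):
        actions = list(choice)
        weights = [choice[a] for a in actions]
        return rng.choices(actions, weights=weights, k=1)[0]
    return choice


print(act(deterministic_policy, 2))                    # 1
print(act(stochastic_policy, 2, random.Random(1)))     # +1 or -1, mostly +1
```

A table like this only works for small, discrete state spaces; larger problems usually represent the policy with a parameterized function instead.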
The Agent’s policy changes with experience as it explores and exploits the Environment.
At any given time, the policy is not necessarily the optimal route to the desired end state.
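The explore/exploit trade-off is often handled with ε-greedy action selection: with probability ε the Agent tries a random action (explore), otherwise it picks the action with the highest estimated value (exploit). The sketch below pairs it with a deliberately simplified one-step value update that ignores future rewards; both the update rule and the names `epsilon_greedy` and `update` are assumptions chosen only to show how experience can change which action the policy prefers.

```python
import random


def epsilon_greedy(q_values, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore: try a random action
    # exploit: pick the action with the highest estimated value in this state
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))


def update(q_values, state, action, reward, alpha=0.5):
    """Simplified update: move the estimate a fraction alpha toward the observed reward."""
    old = q_values.get((state, action), 0.0)
    q_values[(state, action)] = old + alpha * (reward - old)


rng = random.Random(0)
q = {(0, +1): 0.5, (0, -1): 0.1}      # hypothetical initial value estimates
print(epsilon_greedy(q, 0, [-1, +1], 0.0, rng))  # 1 (greedy choice)

# Experience: moving right from state 0 turns out badly twice.
update(q, 0, +1, -1.0)
update(q, 0, +1, -1.0)
print(epsilon_greedy(q, 0, [-1, +1], 0.0, rng))  # -1 (the preferred action changed)
```

Because the estimates are only as good as the experience behind them, the greedy choice at any moment may still be suboptimal, which is why some exploration (ε > 0) is usually kept.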