Reinforcement Learning: Monte Carlo Method
An easy-to-understand explanation of the Monte Carlo method for Reinforcement Learning
In Reinforcement learning, the learner or the decision maker, called the Agent, constantly interacts with its Environment by performing actions sequentially at each discrete time step. Interaction of the Agent with its Environment changes the Environment's state, and as a result, the Agent receives a numerical reward from the Environment.
The sole objective of the Agent is to maximize the total reward it receives over the long run.
As it interacts, the Agent generates a sequence, or trajectory, of states, actions, and rewards over time.
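To make the idea of a trajectory concrete, here is a minimal sketch of an Agent interacting with an Environment. The corridor environment, the `step` and `generate_trajectory` functions, and the random policy are all hypothetical, invented only for illustration.

```python
import random

def step(state, action):
    """Hypothetical 1-D corridor with states 0..4; reaching state 4 ends the episode."""
    next_state = max(0, min(4, state + action))  # action is -1 (left) or +1 (right)
    reward = 1.0 if next_state == 4 else 0.0     # reward only on reaching the goal
    done = next_state == 4
    return next_state, reward, done

def generate_trajectory(max_steps=50, seed=0):
    """Run one episode with a random policy and record (state, action, reward) tuples."""
    rng = random.Random(seed)
    state = 0
    trajectory = []
    for _ in range(max_steps):
        action = rng.choice([-1, 1])             # assumed policy: pick a direction at random
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory

trajectory = generate_trajectory()
total_reward = sum(reward for _, _, reward in trajectory)
```

Each tuple in `trajectory` records the state the Agent was in, the action it took, and the reward it received; `total_reward` is the quantity the Agent would try to maximize.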
A probability distribution P(s' | s, a) gives the probability of moving from the current state s to the next state s' when taking action a. This transition probability is also written T(s, a, s'): the probability of ending up in state s' after taking action a in state s.
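As a rough illustration, the transition probabilities can be stored as a lookup table keyed by state and action. The two-state weather model below, including its state names, the single action "wait", and every probability value, is made up for this sketch and does not come from any real data.

```python
import random

# Hypothetical transition table: T[(s, a)] maps each next state s' to P(s' | s, a).
T = {
    ("sunny", "wait"): {"sunny": 0.8, "rainy": 0.2},
    ("rainy", "wait"): {"sunny": 0.4, "rainy": 0.6},
}

def transition_prob(s, a, s_next):
    """Return T(s, a, s'), or 0.0 if s' is not reachable from (s, a)."""
    return T[(s, a)].get(s_next, 0.0)

def sample_next_state(s, a, rng):
    """Draw the next state s' according to P(s' | s, a)."""
    dist = T[(s, a)]
    return rng.choices(list(dist.keys()), weights=list(dist.values()), k=1)[0]

# For every (s, a) pair, the probabilities over next states must sum to 1.
for dist in T.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9

rng = random.Random(0)
next_state = sample_next_state("sunny", "wait", rng)
```

Here `transition_prob("sunny", "wait", "rainy")` looks up T(s, a, s') directly, while `sample_next_state` simulates what the Environment does when the Agent acts.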
Model-based approaches, such as those that solve a Markov Decision Process (MDP) directly, use a model of the Environment. The model represents the Environment's dynamics through its state-transition and reward functions.
If you want to predict the weather or the price of a stock, these predictions depend on a variety of environmental factors or…