Reinforcement Learning: Monte Carlo Method

An easy-to-understand explanation of the Monte Carlo method for reinforcement learning

Renu Khandelwal
6 min read · Sep 22, 2022

In reinforcement learning, the learner or decision maker, called the Agent, interacts with its Environment by performing one action at each discrete time step. Each action changes the Environment's state, and in response the Agent receives a numerical reward from the Environment.

The sole objective of the Agent is to maximize the total reward it receives over the long run.

Source: Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto

The Agent generates a sequence, or trajectory, of states, actions, and rewards over time.
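The interaction loop described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the four-state line world, the `step` function, the reward values, and the random policy are hypothetical and are not taken from Sutton and Barto or from any library.

```python
import random

# Hypothetical toy environment: states 0..3 on a line, state 3 is terminal.
def step(state, action):
    """Apply a move of -1 or +1 and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), 3)  # clamp to the valid range
    done = next_state == 3
    reward = 1.0 if done else -0.1  # small penalty per step, bonus at the goal
    return next_state, reward, done

def generate_trajectory(max_steps=50, seed=0):
    """Run one episode with a random policy and record (state, action, reward)."""
    rng = random.Random(seed)
    state = 0
    trajectory = []
    for _ in range(max_steps):
        action = rng.choice([-1, 1])  # the Agent picks an action
        next_state, reward, done = step(state, action)  # the Environment responds
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory

trajectory = generate_trajectory()
total_reward = sum(reward for _, _, reward in trajectory)
```

Here `total_reward` is the quantity the Agent is trying to maximize; a better policy than random choice would reach the terminal state in fewer steps and collect fewer penalties.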

The transition probability P(s' | s, a), also written T(s, a, s'), is the probability of ending up in state s' when the Agent takes action a in state s.
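One way to picture the transition probabilities is as a lookup table keyed by the current state and action. The sketch below assumes a made-up two-state weather world; the state names, the single `"wait"` action, and the probabilities are all hypothetical.

```python
import random

# Hypothetical transition table: P[(s, a)] maps each next state to P(s' | s, a).
P = {
    ("sunny", "wait"): {"sunny": 0.8, "rainy": 0.2},
    ("rainy", "wait"): {"sunny": 0.4, "rainy": 0.6},
}

def transition_prob(s, a, s_next):
    """Return T(s, a, s'), or 0.0 if s' is unreachable from (s, a)."""
    return P[(s, a)].get(s_next, 0.0)

def sample_next_state(s, a, rng):
    """Draw the next state according to the distribution P(. | s, a)."""
    next_states = list(P[(s, a)].keys())
    weights = list(P[(s, a)].values())
    return rng.choices(next_states, weights=weights, k=1)[0]

# Each row is a probability distribution, so it must sum to 1.
for dist in P.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9

rng = random.Random(0)
next_state = sample_next_state("sunny", "wait", rng)
```

The same state and action can lead to different next states on different draws, which is what makes the Environment stochastic.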

Model-based approaches, such as dynamic programming on a Markov Decision Process (MDP), use a model of the Environment. The model represents the Environment's dynamics with state transition and reward functions.
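To make the idea of a model concrete, here is a minimal sketch in which the transition and reward functions are plain dictionaries. The states `"s0"` and `"s1"`, the actions `"go"` and `"stay"`, and all the numbers are hypothetical. Because the model is known, the expected reward of an action can be computed directly, without the Agent ever acting in the Environment.

```python
# Hypothetical model of a tiny environment.
# transition[(s, a)] maps each next state s' to P(s' | s, a).
transition = {
    ("s0", "go"): {"s0": 0.25, "s1": 0.75},
    ("s0", "stay"): {"s0": 1.0},
}

# reward[(s, a, s')] is the reward received for that transition.
reward = {
    ("s0", "go", "s0"): 0.0,
    ("s0", "go", "s1"): 4.0,
    ("s0", "stay", "s0"): 1.0,
}

def expected_reward(s, a):
    """Average the reward over next states, weighted by P(s' | s, a)."""
    return sum(
        prob * reward[(s, a, s_next)]
        for s_next, prob in transition[(s, a)].items()
    )

# With the model in hand, compare actions by their expected immediate reward.
best_action = max(["go", "stay"], key=lambda a: expected_reward("s0", a))
```

This looks only one step ahead; it is meant to show what having a model buys you, not to stand in for a full planning algorithm.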

If you want to predict the weather or the price of a stock, these predictions depend on a variety of environmental factors or
