# Reinforcement Learning: Monte Carlo Method

## An easy-to-understand explanation of the Monte Carlo method for Reinforcement Learning

In Reinforcement Learning, the learner or decision maker, called the **Agent**, continually interacts with its **Environment** by performing **actions** sequentially, one at each discrete time step. Each action changes the **Environment's state**, and as a result, the Agent receives a numerical **reward** from the Environment.

The sole objective of the Agent is to maximize the total reward it receives over the long run.

Over a period of time, the Agent generates a trajectory: a sequence of states, actions, and rewards.
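To make the interaction loop concrete, here is a minimal Python sketch. The environment (`CorridorEnv`), its reward values, and the `random_policy` function are all hypothetical, made up for illustration only; the point is the loop in which the Agent acts, the state changes, and a reward comes back at each time step.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: states 0..4 on a line, goal at state 4."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the state is clipped to the corridor
        self.state = min(max(self.state + action, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        # Assumed reward scheme: small cost per step, bonus on reaching the goal
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def generate_trajectory(env, policy, max_steps=100):
    """Run one episode and record (state, action, reward) at each time step."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def random_policy(state):
    """Hypothetical Agent that ignores the state and moves left or right at random."""
    return random.choice([-1, 1])


random.seed(0)
trajectory = generate_trajectory(CorridorEnv(), random_policy)
total_reward = sum(reward for _, _, reward in trajectory)
print(f"steps: {len(trajectory)}, total reward: {total_reward:.1f}")
```

The `total_reward` computed at the end is the quantity the Agent is trying to maximize; a better policy than the random one would reach the goal in fewer steps and collect a higher total.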

A **probability distribution**, P(s' | s, a), represents the probability of passing from one state (s) to another (s') when taking action a. The **transition probability** *T*(s, a, s') specifies the probability of ending up in state s' when taking action a in state s.

Model-based approaches like the Markov Decision Process (MDP) use a model of the Environment. The model represents the Environment's dynamics with state transition and reward functions.
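As a rough illustration of what such a model can look like, the sketch below stores T(s, a, s') in a lookup table and samples a next state from it. The states (`"s0"`, `"s1"`), the actions (`"stay"`, `"move"`), the probabilities, and the reward rule are invented for this example and are not taken from any particular problem.

```python
import random

# Hypothetical model: T[(s, a)] maps each next state s' to P(s' | s, a).
T = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.1, "s1": 0.9},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}


def transition_prob(state, action, next_state):
    """Return T(s, a, s'); next states not listed have probability 0."""
    return T[(state, action)].get(next_state, 0.0)


def reward_fn(state, action, next_state):
    """Hypothetical reward function: +1 for landing in s1, 0 otherwise."""
    return 1.0 if next_state == "s1" else 0.0


def sample_next_state(state, action, rng=random):
    """Draw s' from the distribution P(. | s, a)."""
    dist = T[(state, action)]
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


# For every (state, action) pair, the probabilities over s' must sum to 1.
for (s, a), dist in T.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9

next_state = sample_next_state("s0", "move")
print(next_state, reward_fn("s0", "move", next_state))
```

A model-based method can read `T` and `reward_fn` directly to plan ahead, whereas an Agent without a model only ever sees the sampled `next_state` and reward.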
