An Introduction to Markov Decision Process

In a memoryless Markov Decision Process, the next state depends only on the current state and action, not on the states that came before it.

Renu Khandelwal
7 min read · Sep 13, 2022

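Google’s PageRank, developed by Sergey Brin and Larry Page, models a web surfer clicking from page to page as a Markov chain. This is the same memoryless process that underlies a Markov Decision Process (MDP), and it makes PageRank one of the most widely used applications of Markov models.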
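To make the Markov chain idea concrete, here is a minimal sketch of PageRank computed by power iteration. It assumes a hypothetical three-page link graph and the commonly cited damping factor of 0.85. The function name `pagerank` and the graph are illustrative, not Google’s implementation. Each page’s rank is the long-run probability that a random surfer, moving only according to the page it is currently on, ends up there.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a dict {page: [pages it links to]}."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Teleportation: with probability (1 - damping) the surfer jumps to any page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            out = links[page]
            if out:
                # Follow one outgoing link chosen uniformly at random.
                share = damping * rank[page] / len(out)
                for target in out:
                    new_rank[target] += share
            else:
                # Dangling page (no links): spread its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        if sum(abs(new_rank[p] - rank[p]) for p in pages) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical graph: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print(ranks)  # C ranks highest, then A, then B
```

The next-state distribution depends only on the current page, which is exactly the Markov property discussed below.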

What is an MDP?

A Markov Decision Process (MDP) is a mathematical framework for sequential decision-making in a discrete-time stochastic control process, and it serves as the basis for dynamic optimization methods that find the best action to take in each state.
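An MDP is usually written as a tuple (S, A, P, R, γ) of states, actions, transition probabilities, rewards, and a discount factor. The sketch below is a minimal illustration that assumes a hypothetical two-state, two-action MDP. The `MDP` class, the layout of `transitions`, and the discount factor 0.9 are illustrative choices and do not come from the article. It solves the MDP with value iteration, which is one standard dynamic-programming method. Other methods exist.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    states: list
    actions: list
    # transitions[(state, action)] -> list of (probability, next_state, reward)
    transitions: dict
    gamma: float  # discount factor for future rewards


def q_value(mdp, values, s, a):
    """Expected one-step reward plus discounted value of the next state."""
    return sum(p * (r + mdp.gamma * values[s2]) for p, s2, r in mdp.transitions[(s, a)])


def value_iteration(mdp, tol=1e-8):
    values = {s: 0.0 for s in mdp.states}
    while True:
        delta = 0.0
        for s in mdp.states:
            # Bellman optimality backup: keep the best action's expected return.
            best = max(q_value(mdp, values, s, a) for a in mdp.actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(mdp.actions, key=lambda a: q_value(mdp, values, s, a)) for s in mdp.states}
    return values, policy


# Hypothetical MDP: "move" from s0 usually reaches s1; "stay" in s1 pays 2 per step.
toy = MDP(
    states=["s0", "s1"],
    actions=["stay", "move"],
    transitions={
        ("s0", "stay"): [(1.0, "s0", 0.0)],
        ("s0", "move"): [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
        ("s1", "stay"): [(1.0, "s1", 2.0)],
        ("s1", "move"): [(1.0, "s0", 0.0)],
    },
    gamma=0.9,
)
values, policy = value_iteration(toy)
print(policy)  # {'s0': 'move', 's1': 'stay'}
```

Because transitions depend only on the current state and action, a single table indexed by `(state, action)` is enough to describe the whole process.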

The Markov property, named after the Russian mathematician Andrei Markov, is the memoryless property of a stochastic process: given the current state, the future is independent of the past.
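Written out, the Markov property says that conditioning on the full history adds no information beyond the current state. For an MDP, the current action is included too. The notation below, with state S_t, action A_t, and reward R_{t+1}, follows a common convention and is not taken from the article.

```latex
% Markov chain: the next state depends only on the current state
P(S_{t+1} = s' \mid S_t = s_t) = P(S_{t+1} = s' \mid S_0 = s_0, S_1 = s_1, \ldots, S_t = s_t)

% MDP: the next state and reward depend only on the current state and action
P(S_{t+1} = s', R_{t+1} = r \mid S_t, A_t) = P(S_{t+1} = s', R_{t+1} = r \mid S_0, A_0, R_1, S_1, A_1, \ldots, S_t, A_t)
```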

Components of MDP

The learner or decision maker, called the Agent, continually interacts with its Environment by taking an action at each discrete time step. Each action changes the Environment's state, and in return the Agent receives a numerical reward from the Environment.

Source: Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
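The agent-environment loop described above can be sketched in a few lines. This is a minimal illustration that assumes a hypothetical one-dimensional corridor environment. The `Corridor` class, its `slip` probability, the reward values, and `run_episode` are all illustrative names and choices, not a standard API. At each step the agent observes the state and picks an action. The environment then returns the next state and a reward.

```python
import random


class Corridor:
    """Hypothetical 1-D corridor: positions 0..length-1, goal at the right end."""

    def __init__(self, length=5, slip=0.1, seed=0):
        self.length = length
        self.slip = slip  # chance that the chosen move is reversed
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action):
        """Apply action (-1 = left, +1 = right); return (next_state, reward, done)."""
        move = -action if self.rng.random() < self.slip else action
        # The next state depends only on the current state and action (Markov property).
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # small cost per step, bonus at the goal
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Agent-environment loop: observe state, act, receive reward and next state."""
    state, total_reward, trajectory = env.state, 0.0, []
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward, trajectory


def always_right(state):
    return +1


total, steps = run_episode(Corridor(seed=42), always_right)
```

With `slip=0.0` the environment becomes deterministic. The always-right policy then reaches the goal in 4 steps, for a total reward of 0.7 up to floating-point rounding.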
