Renu Khandelwal
Nov 21, 2022

--

Hey Alex,

I agree with you!

Google example isn't a great example as the intention was to use it for understanding purposes only. Off Policy uses behavior policy to explore the environment; the target policy is learned and is the optimized policy.

I hope this clears up the confusion.

Renu!

--

--

Renu Khandelwal
Renu Khandelwal

Written by Renu Khandelwal

A Technology Enthusiast who constantly seeks out new challenges by exploring cutting-edge technologies to make the world a better place!

No responses yet