Nov 21, 2022
Hey Alex,
I agree with you!
Google example isn't a great example as the intention was to use it for understanding purposes only. Off Policy uses behavior policy to explore the environment; the target policy is learned and is the optimized policy.
I hope this clears up the confusion.
Renu!