Renu Khandelwal
Dec 11, 2023


ChenQian,

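Q(s,a) is the expected return from taking action a in state s and following the policy afterwards. It follows from the temporal-difference relation:

Q(s,a) = r + γ * V(s’)

where r is the immediate reward, γ is the discount rate, and V(s’) is the value of the next state s’. When transitions are stochastic, the right-hand side is taken in expectation over s’.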
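
As a rough illustration of that relation for a single observed transition, here is a minimal Python sketch. The function name `q_value` and the numbers are made up for the example, not taken from the article's code.

```python
def q_value(reward: float, gamma: float, next_state_value: float) -> float:
    """One-step estimate: Q(s,a) = r + gamma * V(s').

    reward            immediate reward r for taking action a in state s
    gamma             discount rate, normally between 0 and 1
    next_state_value  current estimate of V(s') for the next state
    """
    return reward + gamma * next_state_value


# Hypothetical values chosen only for illustration.
q = q_value(reward=1.0, gamma=0.5, next_state_value=4.0)
print(q)  # 3.0
```

With a reward of 1.0, a discount rate of 0.5, and V(s’) = 4.0, the estimate is 1.0 + 0.5 * 4.0 = 3.0.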

Hope this helps!

