Create an approximation of the value function U_{\phi} using Approximate Value Function, and use Policy Gradient to optimize an monte-carlo tree search policy

[[curator]]
I'm the Curator. I can help you navigate, organize, and curate this wiki. What would you like to do?