## Key Sequence

## Notation

## New Concepts
- Markov Decision Process
- value iteration
- Bellman Residual
- for continuous state spaces: Approximate Value Function; use global approximation or local approximation methods

## Important Results / Claims
- policy and utility
  - creating a good utility function / policy from instantaneous rewards: either policy evaluation or value iteration
  - creating a policy from a utility function: value-function policy ("in each state, choose the action with the best value"); see the sketches below
  - calculating the utility function a given policy actually achieves: use policy evaluation
- kernel smoothing (a local approximation method; see the sketch below)
- value iteration, in practice (see the Bellman-residual stopping test below)

## Questions

## Interesting Factoids
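
## Sketches

The items above reference a few mechanical procedures; the sketches below illustrate them. They are minimal illustrations rather than implementations from the source: the array layout (`P[s, a, s2]` transition probabilities, `R[s, a]` expected rewards), the function names, and the example kernel are assumptions made for these notes.

Value iteration repeatedly applies the Bellman backup and, in practice, stops when the Bellman residual

$$\delta = \max_s \lvert U_{k+1}(s) - U_k(s) \rvert$$

falls below a threshold; a standard bound says that stopping once $\delta < \epsilon(1-\gamma)/\gamma$ guarantees $\max_s \lvert U_{k+1}(s) - U^*(s) \rvert < \epsilon$.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, eps=1e-6):
    """Tabular value iteration with a Bellman-residual stopping test.

    P: transitions, shape (S, A, S), with P[s, a, s2] = Pr(s2 | s, a)
    R: expected rewards, shape (S, A)
    """
    U = np.zeros(P.shape[0])
    while True:
        # Bellman backup: Q[s, a] = R[s, a] + gamma * E[U(s2) | s, a]
        Q = R + gamma * (P @ U)
        U_next = Q.max(axis=1)
        # Bellman residual: largest utility change across states
        residual = np.abs(U_next - U).max()
        U = U_next
        if residual < eps * (1 - gamma) / gamma:
            return U
```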
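
Policy evaluation computes the utility a fixed policy actually achieves by solving the linear system $U^\pi = R^\pi + \gamma P^\pi U^\pi$, and the value-function (greedy) policy reads a policy back out of a utility function. Same assumed array layout as above:

```python
import numpy as np

def policy_evaluation(P, R, pi, gamma=0.95):
    """Utility of a fixed policy pi (action index per state, shape (S,)),
    via the exact linear solve (I - gamma * P_pi) U = R_pi."""
    idx = np.arange(P.shape[0])
    P_pi = P[idx, pi]    # (S, S): transition matrix under pi
    R_pi = R[idx, pi]    # (S,): reward vector under pi
    return np.linalg.solve(np.eye(len(idx)) - gamma * P_pi, R_pi)

def value_function_policy(P, R, U, gamma=0.95):
    """Greedy policy extraction: in each state, take the best-valued action."""
    Q = R + gamma * (P @ U)
    return Q.argmax(axis=1)
```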
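
For continuous state spaces the utility can only be stored at a finite set of support states; a local approximation method such as kernel smoothing then estimates the utility at a query state as a kernel-weighted average of the utilities at nearby support states (a global approximation would instead fit a single parametric model, such as a linear model over state features, across the whole space). The Gaussian kernel and bandwidth below are illustrative choices, not prescribed by the source:

```python
import numpy as np

def kernel_smoothed_utility(x, support_states, support_utilities, bandwidth=1.0):
    """Local approximation by kernel smoothing:
    U(x) ~ sum_i k(x, s_i) U(s_i) / sum_i k(x, s_i), with Gaussian kernel k.

    support_states: (N, d) array of states where U is known
    support_utilities: (N,) array of utilities at those states
    """
    sq_dist = np.sum((support_states - x) ** 2, axis=1)  # squared distances to x
    w = np.exp(-sq_dist / (2 * bandwidth ** 2))          # Gaussian kernel weights
    return (w @ support_utilities) / w.sum()
```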
