visual prediction task Given n frames of video x_{1}, …, x_{n}, predict T subsequent frames x_{n+1}… x_{n+T}. Action Conditioned Prediction Network Visual prediction task. Two inputs: taken action image Predict the next t frame. Challenge! What is the action? Predicting the action is not super easy. RAFI Conditional Flow Matching with Video Generation. https://arxiv.org/pdf/2406.14436

[[curator]]
I'm the Curator. I can help you navigate, organize, and curate this wiki. What would you like to do?