Generative / Causal / Hierarchical Model-Based Reinforcement Learning

Category: Machine Intelligence


Core Curriculum
1. Learning pathway that will lead to understanding the major approaches
Experiment Ideas
1. Experiments I should run that would improve capabilities in, or understanding of, the space
Philosophy
1. The reasoning behind the relative importance of this approach
Papers & Books Worth Reading
1. Papers, organized by lab or by sub-topic.

Core Curriculum

Reinforcement Learning

Sutton & Barto. Ch. 1, 2, 3, 6 and 9.
Bertsekas. Dynamic Programming and Optimal Control.
Reinforcement Learning of motor skills with Policy Gradients

Model-Based Reinforcement Learning

Value Iteration Networks
World Models
On Learning to Think
Imagination-Augmented Agents for Deep Reinforcement Learning [Also, Planning]
Unsupervised Predictive Memory in a Goal-Directed Agent

Hierarchical Reinforcement Learning

FeUdal Networks for Hierarchical Reinforcement Learning

Generative Modeling

Tutorial on Variational Autoencoders

Causality

Pearl. Causality. Ch. 3, 4, 7 and 8.
Theoretical Impediments to Machine Learning with Seven Sparks from the Causal Revolution
Reinforcement Learning and Causal Models
Learning Graphs
1. Learning Deep Generative Models of Graphs
2. Grammar VAE
Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

Honglak’s Talk: Generative World Models
http://www.unofficialgoogledatascience.com/2017/01/causality-in-machine-learning.html
NIPS Causality Workshop

Experiment Ideas

Use counterfactuals to learn causal relationships in a world-models style simulation of the environment.
1. Potential Collaborators:
  1. David Ha
  2. Juergen Schmidhuber
  3. Honglak Lee
  4. Ashish Vaswani?
  5. Daniel Galvez (Wrote this)
  6. Imaginative Agents Sync
    1. Danijar Hafner
    2. Jacob Buckman
    3. Eugene Brevdo
    4. Jakob Uszkoreit
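The counterfactual experiment above rests on Pearl's three-step recipe for counterfactuals (abduction, action, prediction; Causality, Ch. 7). A minimal sketch, assuming a hypothetical linear structural model in which the next state depends on the action plus an exogenous noise term. In a world-models setting, the hand-written structural equation `next_state` would be replaced by the learned simulator, and the noise would be inferred rather than solved for in closed form.

```python
# Hypothetical two-variable linear structural causal model:
#   A := U_a               (the agent's action)
#   S := 2.0 * A + U_s     (the next environment state)
# The coefficient and variable names are illustrative assumptions.
EFFECT_OF_ACTION = 2.0


def next_state(action: float, noise: float) -> float:
    """Structural equation for the next state S given the action and its noise term."""
    return EFFECT_OF_ACTION * action + noise


def counterfactual_state(observed_action: float, observed_state: float,
                         alternative_action: float) -> float:
    """What would S have been, in this same episode, under a different action?"""
    # 1. Abduction: infer the exogenous noise U_s that explains what we observed.
    noise = observed_state - EFFECT_OF_ACTION * observed_action
    # 2. Action: intervene, do(A = alternative_action), keeping U_s fixed.
    # 3. Prediction: run the modified model forward with the inferred noise.
    return next_state(alternative_action, noise)


if __name__ == "__main__":
    # Observed: action 1.0 led to state 2.5, so the inferred noise is 0.5.
    print(counterfactual_state(1.0, 2.5, alternative_action=3.0))  # 6.5
```

Keeping the inferred noise fixed is what separates a counterfactual ("in this episode, what if?") from a plain intervention, which would resample the noise.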
Use a Grammar VAE (or another generative graph model) to generate a causal graph over a latent representation of the interactions between actions and the environment. Iteratively update your causal graph, along with the decision making / learning done over that graph.
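The "iteratively update the causal graph" step can be prototyped before any generative graph model is in place. The sketch below swaps the Grammar VAE for a much simpler, explicitly named stand-in: an interventional test that, each round, keeps or drops every candidate action-to-environment edge depending on whether intervening on the action shifts the environment variable. The environment, variable names (`push`, `wave`, `position`, `light`), sample size, and threshold are all hypothetical. Note that this test detects total effects, not necessarily direct edges.

```python
import random
from statistics import mean

# Hypothetical variables; in the proposed experiment these would be latent
# features of a learned world model rather than hand-named quantities.
ACTIONS = ["push", "wave"]
ENV_VARS = ["position", "light"]


def simulate(actions, rng):
    """Toy environment: only `push` moves `position`; `light` is pure noise."""
    position = 1.5 * actions["push"] + rng.gauss(0.0, 0.1)
    light = rng.gauss(0.0, 0.1)
    return {"position": position, "light": light}


def interventional_effect(action, env_var, rng, n=200):
    """Mean of env_var under do(action=1) minus its mean under do(action=0)."""
    means = {}
    for value in (0.0, 1.0):
        setting = {a: 0.0 for a in ACTIONS}  # hold the other actions fixed
        setting[action] = value
        means[value] = mean(simulate(setting, rng)[env_var] for _ in range(n))
    return means[1.0] - means[0.0]


def update_graph(graph, rng, threshold=0.5):
    """One refinement round: keep or drop each candidate action -> env_var edge."""
    new_graph = set(graph)
    for action in ACTIONS:
        for env_var in ENV_VARS:
            edge = (action, env_var)
            if abs(interventional_effect(action, env_var, rng)) > threshold:
                new_graph.add(edge)
            else:
                new_graph.discard(edge)
    return new_graph


if __name__ == "__main__":
    rng = random.Random(0)
    graph = {("wave", "light")}  # a wrong initial guess
    for _ in range(3):  # iterate: test, revise, repeat
        graph = update_graph(graph, rng)
    print(sorted(graph))  # [('push', 'position')]
```

With a fixed simulator one round already settles the graph; iteration starts to matter when the world model itself is relearned between rounds, so the same edge tests can give different answers over time.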

Philosophy

This is a path to general problem solving. Simulation-based planning (especially once causality is integrated) lets an agent use a model of the world to predict which sequence of actions will lead to a desired outcome, and then, after taking those actions, receive feedback on the quality of that model.
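The plan, act, and get-feedback loop described above can be sketched in a few lines. A minimal sketch under loud assumptions: a hypothetical one-dimensional environment whose response to an action is scaled by an unknown gain (`TRUE_GAIN`), a linear world model fit by least squares, and a brute-force planner that simulates every action on a grid. None of this is taken from a specific paper in the list; it only shows how prediction error on real outcomes feeds back into the model.

```python
import random

TRUE_GAIN = 0.7  # hidden environment parameter (hypothetical)


def env_step(state, action, rng):
    """Real environment: the effect of an action is scaled by an unknown gain."""
    return state + TRUE_GAIN * action + rng.gauss(0.0, 0.01)


class LinearWorldModel:
    """World model s' = s + gain * a, with gain fit by least squares."""

    def __init__(self, gain=1.0):
        self.gain = gain
        self.data = []  # (action, observed change in state) pairs

    def predict(self, state, action):
        return state + self.gain * action

    def update(self, action, delta):
        self.data.append((action, delta))
        den = sum(a * a for a, _ in self.data)
        if den > 0:  # least-squares slope through the origin
            self.gain = sum(a * d for a, d in self.data) / den


def plan(model, state, goal, candidates):
    """Simulate each candidate action in the model; keep the one closest to the goal."""
    return min(candidates, key=lambda a: abs(model.predict(state, a) - goal))


def run(episodes=5, seed=0, goal=1.0):
    rng = random.Random(seed)
    model = LinearWorldModel(gain=1.0)  # deliberately wrong initial belief
    candidates = [i / 10 for i in range(-20, 21)]  # actions in [-2.0, 2.0]
    errors = []
    for _ in range(episodes):
        state = 0.0
        action = plan(model, state, goal, candidates)
        predicted = model.predict(state, action)
        actual = env_step(state, action, rng)
        errors.append(abs(predicted - actual))  # feedback on model quality
        model.update(action, actual - state)
    return model.gain, errors


if __name__ == "__main__":
    gain, errors = run()
    print(round(gain, 2), [round(e, 3) for e in errors])
```

The first episode's prediction is off by roughly 0.3 because the model starts with the wrong gain; after a single real transition the least-squares fit corrects it and later predictions are accurate to within the environment noise.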

Hierarchical, model-based planning
Counterfactuals

Notes

Ways to represent a world model:

Latent Variable State Space Model
Next Frame / Continuous Control prediction (network)
RNN Cell / Hidden State as Model
Input Embedding
VAE Hidden State
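To make the first item in this list concrete: a minimal sketch of a latent variable state space model, assuming the simplest possible case (one-dimensional, linear-Gaussian, with hand-picked hypothetical parameters `A`, `B`, `C`, `Q`, `R`). "Imagining" corresponds to applying the transition step repeatedly without observations; the Kalman correction step shows how a real observation tightens the belief. Learned models such as Embed to Control (listed below) learn the latent space and its locally linear transitions from raw images instead of fixing them by hand.

```python
from dataclasses import dataclass

# Hypothetical 1-D linear-Gaussian state space model:
#   z_t = A * z_{t-1} + B * a_t + process noise   (variance Q)
#   x_t = C * z_t + observation noise             (variance R)
# All parameter values are illustrative assumptions.
A, B, C, Q, R = 1.0, 0.5, 1.0, 0.01, 0.25


@dataclass
class Belief:
    """Gaussian belief over the latent state z."""
    mean: float
    var: float


def imagine(belief: Belief, action: float) -> Belief:
    """Transition step: predict the next latent state without any observation."""
    return Belief(A * belief.mean + B * action, A * A * belief.var + Q)


def correct(belief: Belief, observation: float) -> Belief:
    """Kalman measurement update: fold a real observation into the belief."""
    gain = belief.var * C / (C * C * belief.var + R)
    return Belief(belief.mean + gain * (observation - C * belief.mean),
                  (1.0 - gain * C) * belief.var)


if __name__ == "__main__":
    belief = Belief(mean=0.0, var=1.0)
    for _ in range(3):  # an imagined three-step rollout under action 1.0
        belief = imagine(belief, 1.0)
    print(belief)  # mean grows to 1.5; variance grows because nothing is observed
    print(correct(belief, observation=1.2))  # a real observation shrinks the variance
```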

At Brain, talk to:

Aurko Roy
Arvind Neelakantan
Ashish Vaswani
David Ha

Papers, By Lab:

Goal: Turn papers on the research frontier into a shortlist of methods for building a model of the environment in reinforcement learning.

Papers

Brain

Unsupervised Learning for Physical Interaction through Video Prediction [Also, Robotics]
1. https://arxiv.org/pdf/1605.07157.pdf
Continuous Deep Q-Learning with Model-based Acceleration
1. http://proceedings.mlr.press/v48/gu16.pdf
Value Prediction Network
1. https://arxiv.org/pdf/1707.03497.pdf
Learning to Generate Long-term Future via Hierarchical Prediction
1. https://arxiv.org/pdf/1704.05831.pdf
Discrete Sequential Prediction of Continuous Actions for Deep RL
1. https://arxiv.org/pdf/1705.05035.pdf
Deep Visual Foresight for Planning Robot Motion [Also, Robotics]
1. https://arxiv.org/pdf/1610.00696.p
Stochastic Variational Video Prediction
1. https://arxiv.org/pdf/1710.11252.pdf
Geometry-Based Next Frame Prediction from Monocular Video
1. https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45984.pdf
Decomposing Motion and Content for Natural Video Sequence Prediction
1. https://sites.google.com/a/umich.edu/rubenevillegas/iclr2017
Action-Conditional Video Prediction using Deep Networks in Atari Games
1. http://papers.nips.cc/paper/5859-action-conditional-video-prediction-using-deep-networks-in-atari-games.pdf
World Models
1. https://arxiv.org/pdf/1803.10122.pdf

DeepMind

Learning Model-Based Planning from Scratch [Also, Planning]
1. https://arxiv.org/pdf/1707.06170.pdf
Recurrent Environment Simulators
1. https://arxiv.org/pdf/1704.02254.pdf
Structure Learning in Motor Control: A Deep Reinforcement Learning Model [Also Transfer, Intuitive Physics]
1. https://arxiv.org/pdf/1706.06827.pdf
Imagination-Augmented Agents for Deep Reinforcement Learning [Also, Planning]
1. https://arxiv.org/abs/1707.06203
Continuous Deep Q-Learning with Model-based Acceleration
1. https://arxiv.org/abs/1603.00748
Skip Context Tree Switching
1. http://proceedings.mlr.press/v32/bellemare14.pdf
Bayes-Adaptive Simulation-Based Search with Value Function Approximation
1. http://www0.cs.ucl.ac.uk/staff/d.silver/web/Publications_files/bafa.pdf
Learning and Querying Fast Generative Models for Reinforcement Learning
1. https://arxiv.org/abs/1802.03006
Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
1. https://arxiv.org/pdf/1506.07365.pdf

Berkeley

Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
1. https://arxiv.org/pdf/1708.02596.pdf
Model-Based Reinforcement Learning with Neural Network Dynamics
1. http://bair.berkeley.edu/blog/2017/11/30/model-based-rl/
Self-Supervised Visual Planning with Temporal Skip Connections
1. https://arxiv.org/abs/1710.05268
Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
1. https://arxiv.org/abs/1703.03078
Deep Spatial Autoencoders for Visuomotor Learning
1. http://rll.berkeley.edu/dsae/dsae.pdf
End-to-End Training of Deep Visuomotor Policies
1. http://jmlr.org/papers/v17/15-522.html

Other

Bayesian Model-Based RL
1. https://arxiv.org/pdf/1609.04436.pdf

Causality

Types of Causality

Counterfactual Simulation
1. Hierarchical Forward Prediction
Time-Series Relationships, including relationships relative to trends / controls
Randomized Controlled Trial
1. Pseudo-experiments
2. Differences between groups that can be controlled for
Attribution
Probabilistic, Manipulative, Counterfactual and Structural Approaches
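The contrast in this list between observational data and a randomized controlled trial can be sketched with a toy simulation. A minimal sketch, assuming a hypothetical setup in which baseline "health" confounds both who takes the treatment and the outcome; the effect size, coefficients, and sample size are invented for illustration.

```python
import random
from statistics import mean

TRUE_EFFECT = 1.0  # hypothetical causal effect of treatment on the outcome


def outcome(treated, health, rng):
    """Outcome depends on treatment and on a confounder (baseline health)."""
    return TRUE_EFFECT * treated + 2.0 * health + rng.gauss(0.0, 0.1)


def difference_in_means(n, rng, randomized):
    """Naive effect estimate: mean outcome of treated minus mean outcome of controls."""
    treated_y, control_y = [], []
    for _ in range(n):
        health = rng.random()  # confounder, uniform on [0, 1)
        # Observational data: healthier units opt into treatment more often.
        # Randomized trial: a fair coin decides, independent of health.
        p_treat = 0.5 if randomized else health
        treated = 1 if rng.random() < p_treat else 0
        y = outcome(treated, health, rng)
        (treated_y if treated else control_y).append(y)
    return mean(treated_y) - mean(control_y)


if __name__ == "__main__":
    rng = random.Random(0)
    print("observational:", round(difference_in_means(20000, rng, randomized=False), 2))
    print("randomized:   ", round(difference_in_means(20000, rng, randomized=True), 2))
```

Under these assumptions the observational estimate should land near 1.67 (the true effect of 1.0 plus a confounding bias of 2 × (2/3 − 1/3) = 2/3), while the randomized estimate should land near 1.0. The confounded estimate illustrates why the bandit and MDP papers below treat unobserved confounders explicitly.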

Papers

On Causal and Anticausal Learning
  1. Schölkopf et al.
Imagination-Augmented Agents for Deep Reinforcement Learning [Also, Planning]
1. https://arxiv.org/abs/1707.06203
Bandits with Unobserved Confounders: A Causal Approach
1. http://ftp.cs.ucla.edu/pub/stat_ser/r460.pdf
Markov Decision Processes with Unobserved Confounders: A Causal Approach
1. https://www.cs.purdue.edu/homes/eb/mdp-causal.pdf
Recurrent Environment Simulators
1. https://arxiv.org/pdf/1704.02254.pdf
Learning Model-Based Planning from Scratch [Also, Planning]
1. https://arxiv.org/pdf/1707.06170.pdf
Learning Plannable Representations with Causal InfoGAN
1. https://arxiv.org/pdf/1807.09341.pdf
Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution
1. https://arxiv.org/pdf/1801.04016.pdf