Generative / Causal / Hierarchical Model-Based Reinforcement Learning
Category: Machine Intelligence
1. Core Curriculum
- (Learning pathway that will lead to understanding the major approaches)
2. Experiment Ideas
- Experiments I should run that would improve capabilities in, and understanding of, the space
3. Philosophy
- The reasoning behind the relative importance of this approach
4. Papers & Books Worth Reading
- Papers, organized by lab or by sub-topic or whatever.
Core Curriculum
Reinforcement Learning
- Sutton & Barto. Ch. 1, 2, 3, 6 and 9.
- Bertsekas. Dynamic Programming and Optimal Control.
- Reinforcement Learning of Motor Skills with Policy Gradients
Model-Based Reinforcement Learning
- Value Iteration Networks
- World Models
- On Learning to Think
- Imagination-Augmented Agents for Deep Reinforcement Learning [Also, Planning]
- Unsupervised Predictive Memory in a Goal-Directed Agent
Hierarchical Reinforcement Learning
- FeUdal Networks for Hierarchical Reinforcement Learning
Generative Modeling
- Tutorial on Variational Autoencoders
Causality
- Pearl. Causality. Ch. 3, 4, 7 and 8.
- Theoretical Impediments to Machine Learning with Seven Sparks from the Causal Revolution
- Reinforcement Learning and Causal Models
- Learning Graphs
- Learning Deep Generative Models of Graphs
- Grammar VAE
- Woulda, Shoulda, Coulda: Counterfactually-Guided Policy Search
- Honglak’s Talk
- Generative World Models
- http://www.unofficialgoogledatascience.com/2017/01/causality-in-machine-learning.html
- NIPS Causality Workshop
Experiment Ideas
- Use counterfactuals to learn causal relationships in a world-models style simulation of the environment.
- Potential Collaborators:
- David Ha
- Juergen Schmidhuber
- Honglak Lee
- Ashish Vaswani?
- Daniel Galvez (Wrote this)
- Imaginative Agents Sync
- Danijar Hafner
- Jacob Buckman
- Eugene Brevdo
- Jakob Uszkoreit
- Potential Collaborators:
- Use Grammar VAE (or other generative graph model) to generate a causal graph over a latent representation of the causal interactions between actions and the environment. Iteratively update your causal graph, as well as your decision making / learning over that graph.
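One way to read the two experiment ideas above is as a loop: intervene in a simulator, compare outcomes, and add an edge to the causal graph when an intervention changes another variable. The sketch below is a toy under stated assumptions. The variables, the `simulate` function, and the rule of holding every other variable at 0 are all hypothetical, and a hand-written structural model stands in for a learned world model or a generative graph model such as Grammar VAE.

```python
# Toy stand-in for a learned simulator: a hand-written structural model.
# Variable names and mechanisms are hypothetical, chosen only for illustration.
VARIABLES = ["action", "door_open", "reward"]

def simulate(interventions=None):
    """Roll the toy model forward once, forcing any intervened variables."""
    do = interventions or {}
    state = {}
    state["action"] = do.get("action", 0)
    # Mechanism: taking the action opens the door.
    state["door_open"] = do.get("door_open", state["action"])
    # Mechanism: reward is only paid when the door is open.
    state["reward"] = do.get("reward", 2 * state["door_open"])
    return state

def has_direct_effect(cause, effect):
    """Compare do(cause=1) with do(cause=0) while every other variable is held at 0."""
    others = {v: 0 for v in VARIABLES if v not in (cause, effect)}
    low = simulate({**others, cause: 0})
    high = simulate({**others, cause: 1})
    return high[effect] != low[effect]

def update_graph(graph):
    """One pass of the loop: test every ordered pair and record the edges found."""
    for cause in VARIABLES:
        for effect in VARIABLES:
            if cause != effect and has_direct_effect(cause, effect):
                graph[cause].add(effect)
    return graph

graph = update_graph({v: set() for v in VARIABLES})
```

Holding the other variables fixed is what separates the direct edge `action -> door_open` from the indirect path `action -> reward`. Testing at a single context (all others at 0) is a simplification that can miss interactions.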
Philosophy
This is a path to general problem solving. Simulation-based planning (especially once causality is integrated) uses a model of the world to predict which set of actions will lead to a desired outcome. After taking those actions, the agent gets feedback on the quality of its world model.
- Hierarchical, model-based planning
- Counterfactuals
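The plan, act, and get-feedback loop described in this section can be sketched in a few lines. This is a minimal illustration, not a reference implementation. The 1-D environment, the `LinearModel` class, the exhaustive planner, and the learning rate are all assumptions made for the example.

```python
from itertools import product

ACTIONS = (-1, 0, 1)

def true_step(x, a):
    """Real environment (hidden from the planner): each action moves 2 units."""
    return x + 2 * a

class LinearModel:
    """Hypothetical world model x' = x + gain * a, where gain is learned."""
    def __init__(self, gain=1.0):
        self.gain = gain

    def predict(self, x, a):
        return x + self.gain * a

    def update(self, x, a, x_next, lr=0.5):
        """Move gain toward the value that explains the observed transition."""
        if a != 0:
            observed_gain = (x_next - x) / a
            self.gain += lr * (observed_gain - self.gain)

def plan(model, x, goal, horizon=3):
    """Exhaustive search over action sequences, scored by imagined distance to goal."""
    def imagined_cost(seq):
        s, cost = x, 0.0
        for a in seq:
            s = model.predict(s, a)  # simulate inside the model, not the world
            cost += abs(s - goal)
        return cost
    return min(product(ACTIONS, repeat=horizon), key=imagined_cost)

model, x, goal = LinearModel(), 0.0, 6.0
errors = []
for _ in range(6):
    a = plan(model, x, goal)[0]          # execute only the first planned action
    x_next = true_step(x, a)
    errors.append(abs(model.predict(x, a) - x_next))  # feedback on model quality
    model.update(x, a, x_next)
    x = x_next
```

In this toy run the prediction error shrinks as the model's gain approaches the true value, and the agent reaches the goal even though its initial model was wrong.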
Notes
Ways to represent a world model:
- Latent Variable State Space Model
- Next Frame / Continuous Control prediction (network)
- RNN Cell / Hidden State as Model
- Input Embedding
- VAE Hidden State
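As a rough illustration of the first option in the list above, a latent variable state space model splits the world model into an encoder, a latent transition, and a decoder. The sketch below is a toy under stated assumptions. The class name, the redundant `(x, 2x)` observation, and the hand-written maps are hypothetical stand-ins for learned networks.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Observation = Tuple[float, float]

@dataclass
class ToyLatentModel:
    """Hand-written stand-in for a learned latent state-space model."""
    step_size: float = 1.0

    def encode(self, obs: Observation) -> float:
        # Compress the redundant observation (x, 2x) into a single latent x.
        return (obs[0] + obs[1] / 2.0) / 2.0

    def transition(self, z: float, action: int) -> float:
        # Latent dynamics: the action shifts the latent state.
        return z + self.step_size * action

    def decode(self, z: float) -> Observation:
        # Reconstruct the observation from the latent.
        return (z, 2.0 * z)

    def imagine(self, obs: Observation, actions: Sequence[int]) -> List[Observation]:
        """Roll forward entirely in latent space, decoding each predicted frame."""
        z = self.encode(obs)
        frames = []
        for a in actions:
            z = self.transition(z, a)
            frames.append(self.decode(z))
        return frames

model = ToyLatentModel()
frames = model.imagine((1.0, 2.0), [1, 1, -1])
```

The other options in the list mostly differ in where the state lives: next-frame prediction skips the latent and predicts observations directly, while an RNN hidden state plays the role of `z` without an explicit decoder.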
At Brain, talk to:
- Aurko Roy
- Arvind Neelakantan
- Ashish Vaswani
- David Ha
Papers, By Lab:
Goal: Turn papers on the research frontier into a shortlist of methods for building up a model of the environment in reinforcement learning.
Papers
Brain
- Unsupervised Learning for Physical Interaction through Video Prediction [Also, Robotics]
- https://arxiv.org/pdf/1605.07157.pdf
- Continuous Deep Q-Learning with Model-based Acceleration
- http://proceedings.mlr.press/v48/gu16.pdf
- Value Prediction Network
- https://arxiv.org/pdf/1707.03497.pdf
- Learning to Generate Long-term Future via Hierarchical Prediction
- https://arxiv.org/pdf/1704.05831.pdf
- Discrete Sequential Prediction of Continuous Actions for Deep RL
- https://arxiv.org/pdf/1705.05035.pdf
- Deep Visual Foresight for Planning Robot Motion [Also, Robotics]
- https://arxiv.org/pdf/1610.00696.p
- Stochastic Variational Video Prediction
- https://arxiv.org/pdf/1710.11252.pdf
- Geometry-Based Next Frame Prediction from Monocular Video
- https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45984.pdf
- Decomposing Motion and Content for Natural Video Sequence Prediction
- https://sites.google.com/a/umich.edu/rubenevillegas/iclr2017
- Action-Conditional Video Prediction using Deep Networks in Atari Games
- http://papers.nips.cc/paper/5859-action-conditional-video-prediction-using-deep-networks-in-atari-games.pdf
- World Models
- https://arxiv.org/pdf/1803.10122.pdf
DeepMind
- Learning Model-Based Planning from Scratch [Also, Planning]
- https://arxiv.org/pdf/1707.06170.pdf
- Recurrent Environment Simulators
- https://arxiv.org/pdf/1704.02254.pdf
- Structure Learning in Motor Control: A Deep Reinforcement Learning Model [Also Transfer, Intuitive Physics]
- https://arxiv.org/pdf/1706.06827.pdf
- Imagination-Augmented Agents for Deep Reinforcement Learning [Also, Planning]
- https://arxiv.org/abs/1707.06203
- Continuous Deep Q-Learning with Model-based Acceleration
- https://arxiv.org/abs/1603.00748
- Skip Context Tree Switching
- http://proceedings.mlr.press/v32/bellemare14.pdf
- Bayes-Adaptive Simulation-Based Search with Value Function Approximation
- http://www0.cs.ucl.ac.uk/staff/d.silver/web/Publications_files/bafa.pdf
- Learning and Querying Fast Generative Models for Reinforcement Learning
- https://arxiv.org/abs/1802.03006
- Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
- https://arxiv.org/pdf/1506.07365.pdf
Berkeley
- Neural Network Dynamics for Model-based Deep Reinforcement Learning with Model-Free Tuning
- https://arxiv.org/pdf/1708.02596.pdf
- Model-Based Reinforcement Learning with Neural Network Dynamics
- http://bair.berkeley.edu/blog/2017/11/30/model-based-rl/
- Self-Supervised Visual Planning with Temporal Skip Connections
- https://arxiv.org/abs/1710.05268
- Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
- https://arxiv.org/abs/1703.03078
- Deep Spatial Autoencoders for Visuomotor Learning
- http://rll.berkeley.edu/dsae/dsae.pdf
- End-to-End Training of Deep Visuomotor Policies
- http://jmlr.org/papers/v17/15-522.html
Other
- Bayesian Model-Based RL
- https://arxiv.org/pdf/1609.04436.pdf
Causality
Types of Causality
- Counterfactual Simulation
- Hierarchical Forward Prediction
- Time Series Relation + Relationships relative to trends / controls
- Randomized Controlled Trial
- Pseudo-experiments
- Differences between groups that can be controlled for
- Attribution
- Probabilistic, Manipulative, Counterfactual and Structural Approaches
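To make the "differences between groups that can be controlled for" item concrete, here is a small worked example on made-up data. The records, the outcome formula, and the function names are all hypothetical. The true treatment effect is set to 2, and a confounder `z` both raises the outcome and makes treatment more likely.

```python
# Each record is (confounder z, treatment t, outcome y). Hypothetical data:
# the true treatment effect is 2 and the confounder adds 3 to the outcome.
def outcome(z, t):
    return 2 * t + 3 * z

data = (
    [(0, 0, outcome(0, 0))] * 8 + [(0, 1, outcome(0, 1))] * 2 +
    [(1, 0, outcome(1, 0))] * 2 + [(1, 1, outcome(1, 1))] * 8
)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def naive_effect(rows):
    """Difference in mean outcome between treated and untreated rows."""
    treated = mean(y for _, t, y in rows if t == 1)
    control = mean(y for _, t, y in rows if t == 0)
    return treated - control

def adjusted_effect(rows):
    """Stratify on z, then average per-stratum effects weighted by stratum size."""
    total = 0.0
    for z in sorted({z for z, _, _ in rows}):
        stratum = [r for r in rows if r[0] == z]
        total += naive_effect(stratum) * len(stratum) / len(rows)
    return total

naive = naive_effect(data)
adjusted = adjusted_effect(data)
```

On this data the naive comparison gives about 3.8 because treated units are mostly high-`z`, while the stratified estimate recovers the true effect of 2. A randomized controlled trial reaches the same place by design, since random assignment breaks the link between `z` and treatment.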
Papers
- On Causal and Anticausal Learning
- Schölkopf.
- Imagination-Augmented Agents for Deep Reinforcement Learning [Also, Planning]
- https://arxiv.org/abs/1707.06203
- Bandits with Unobserved Confounders: A Causal Approach
- http://ftp.cs.ucla.edu/pub/stat_ser/r460.pdf
- Markov Decision Processes with Unobserved Confounders: A Causal Approach
- https://www.cs.purdue.edu/homes/eb/mdp-causal.pdf
- Recurrent Environment Simulators
- https://arxiv.org/pdf/1704.02254.pdf
- Learning Model-Based Planning from Scratch [Also, Planning]
- https://arxiv.org/pdf/1707.06170.pdf
- Learning Plannable Representations with Causal InfoGAN
- https://arxiv.org/pdf/1807.09341.pdf
- Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution
- https://arxiv.org/pdf/1801.04016.pdf
Recent Papers (Non-RL)
- Learning Representations for Counterfactual Inference
- Deep IV: A Flexible Approach for Counterfactual Prediction
- Causal Learning and Explanation of Deep Neural Networks via Autoencoded Activations
- Structure Agnostic Model, Causal Discovery, and Penalized Adversarial Learning
- CausalGAN: Learning Implicit Causal Generative Models with Adversarial Training
- Discovering Causal Signals in Images
- Causal Generative Neural Networks
- Transfer Learning for Estimating Causal Effects using Neural Networks
- Discovering Context Specific Causal Relationships
- The Deconfounded Recommender: A Causal Inference Approach to Recommendation
- The Blessings of Multiple Causes
- Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution
- An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems
- Topological Causality in Dynamical Systems
- Recognising Top-Down Causation
- Scalable Linear Causal Inference for Irregularly Sampled Time Series with Long Range Dependencies
Sites and Presentations
- Causal Inference in Statistics
- Judea Pearl
- Causal Inference in Machine Learning
- Ricardo Silva
- Stanford Encyclopedia of Philosophy - Causal Models
- ICML Causal Inference Tutorial
- Uri Shalit & David Sontag
- Causality in Machine Learning
- Omkar Muralidharan, Niall Cardin
- Deep Learning Patterns
Books
- The Direction of Time
- Reichenbach.
- Causality: Models, Reasoning and Inference.
- Judea Pearl
Thoughts
- Simulating the world by predicting the next input in a time series, at each level of a hierarchy, provides tremendous amounts of supervised data for building a causal world model.
- Whether an action causes an outcome can be answered by querying the world model with and without the action and comparing the predicted outcomes.
- The hierarchical structure helps deal with the instability that arises when incremental predictions are treated as true and further predictions are made as a function of them.
- This will also solve the common-sense knowledge problem.
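The "with and without the action" query from the thoughts above can be sketched as a pair of rollouts through the same model. This is only an illustration. The switch-and-timer dynamics, the `"wait"` no-op, and the function names are hypothetical, and a hand-written function stands in for a learned world model.

```python
def world_model(state, action):
    """Hypothetical dynamics: pressing the switch turns the light on, and the
    light also turns on by itself once the timer reaches 3."""
    timer = state["timer"] + 1
    light = state["light"] or action == "press" or timer >= 3
    return {"timer": timer, "light": light}

def rollout(state, actions):
    """Apply the model step by step and return the final predicted state."""
    for a in actions:
        state = world_model(state, a)
    return state

def action_caused(state, actions, index, outcome, null_action="wait"):
    """True if the outcome holds with the action at `index` but not when that
    action is replaced by a no-op (a simple but-for test)."""
    factual = rollout(state, actions)
    counterfactual_actions = list(actions)
    counterfactual_actions[index] = null_action
    counterfactual = rollout(state, counterfactual_actions)
    return outcome(factual) and not outcome(counterfactual)

def light_on(state):
    return state["light"]

start = {"timer": 0, "light": False}
before_timer = action_caused(start, ["press", "wait"], 0, light_on)
after_timer = action_caused(start, ["press", "wait", "wait"], 0, light_on)
```

Over two steps the press counts as the cause of the light being on. Over three steps it does not, because the timer would have turned the light on anyway. The answer therefore depends on the horizon over which the model is consulted.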
Source: Original Google Doc