Thinking about advances in the capabilities of RL:
Knowledge Discovery -> Reasoning (programming assistance) -> (ongoing) -> Robotics

Insight: as time goes on, the “risk-criticality” of our applications increases; yet as risk-critical scenarios increase, it's harder to get data.

Reliable Feedback Loop

General desirable structure:
Verify (claims and requirements) => Safeguard (safe continuous deployment) => Generalize (via compositional generalization: incrementally adding behavior without losing existing behavior) => Verify => …

Deal with Stochasticity

An RL algorithm is replicable if, with high probability (WHP), running it on the same MDP with fixed randomness results in the same outcomes.
=> \epsilon-optimal replicable algorithms for tabular / linear settings, with sample complexity polynomial in the parameters.
Quantization for Tie Break

Compositional Generalization

We can decompose relevant problems into subparts, which allows us to compose the solutions to those subparts to solve new tasks.
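
Sketch for the "Quantization for Tie Break" note under Deal with Stochasticity. This is a minimal illustration, assuming the idea is the usual randomized-grid rounding trick: snap noisy value estimates to a grid whose offset comes from the shared (fixed) randomness, then break ties by a fixed rule, so that two runs with slightly different estimates still pick the same action. The function names, the grid width, and the lowest-index tie-break rule are all hypothetical choices made for this example, not taken from the notes.

```python
import random


def draw_offset(shared_seed, width):
    """Draw the grid offset from the shared randomness.

    Two runs that use the same seed get the same offset (assumption:
    this is how the 'fixed randomness' is shared between runs).
    """
    return random.Random(shared_seed).uniform(0.0, width)


def cell_index(value, width, offset):
    """Index k of the grid point offset + k * width nearest to value."""
    return round((value - offset) / width)


def replicable_greedy_action(q_estimates, width, offset):
    """Greedy action over quantized estimates with a fixed tie-break.

    Estimates that land in the same grid cell count as tied, and ties
    go to the lowest action index (a hypothetical but deterministic rule).
    """
    cells = [cell_index(q, width, offset) for q in q_estimates]
    best = max(cells)
    return min(i for i, c in enumerate(cells) if c == best)


def naive_greedy_action(q_estimates):
    """Plain argmax, for comparison; sensitive to small estimation noise."""
    return max(range(len(q_estimates)), key=q_estimates.__getitem__)
```

For example, with width 0.5 and offset 0.1, the estimates [1.00, 1.01, 0.2] and [1.02, 0.99, 0.2] (two noisy runs on the same problem) both yield action 0 from `replicable_greedy_action`, while `naive_greedy_action` returns 1 for the first and 0 for the second. The price is optimality: the chosen action's estimate can fall short of the best estimate by up to the grid width, which is where the \epsilon in "\epsilon-optimal" would enter under this reading.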