Machine Intelligence
- Bias-Variance vs. Variance, Bias-Variance vs. Sensitivity-Specificity
- Metalearning the Structure of Information
- DeepMind's Path to Neuro-Inspired General Intelligence
- High-Level Machine Intelligence Reading Group
- Abstract Representation Learning
- Google Brain Research Overview
- Criticisms of Machine Learning / Deep Learning
- Generative / Causal / Hierarchical Model-Based Reinforcement Learning
- Against Embodiment, Intuitive Physics & Neuroinspiration
- DeepMind Research Overview
- Machine Intelligence Research Frontier
- Future of Machine Intelligence
- Issues in ML Research
- Relative Safety of Forms of Machine Intelligence
- Deep Learning Frameworks
- Interesting Facts in Machine Learning
- OpenAI Research Breakdown
- Multi-Agent
- Recursive Self-Improvement, Task Search, AI-GAs, Powerplay [Beneficial AGI]
- NeuroInspiration / Deep RL
- Multi-Agent Conversation Notes
- Facebook AI Research Overview
- Relative Safety of Paths to General Intelligence
The bias-variance tradeoff is one instantiation of Occam's razor.
Point 1: The bias-variance tradeoff vs. the variance of a probability distribution
Variance in the bias-variance tradeoff refers to the fact that when you search over models, some models have more flexibility. Models with more flexibility tend to overfit when they fit a dataset, because they find a separating hyperplane that is overly accommodating to particular datapoints. There are many ways to overfit, and variance is an abstraction over all of them (see the sketch after this list):
- Valuing fit over smoothness.
- Valuing a single datapoint in a region of sparse data over the influence of other datapoints farther away that you could interpolate or extrapolate from (looking too hard at particular datapoints).
- Arbitrarily overweighting one representation of the features over other valuable ones; an incomplete search over the set of feature representations.
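A minimal sketch of flexibility turning into variance, assuming numpy and scikit-learn (the sine task, sample size, and polynomial degrees are illustrative, not from the original notes): fit the same noisy data with polynomials of increasing degree and watch the train/test gap open up.

```python
# Illustrative sketch: flexibility vs. overfitting on a toy regression task.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, size=(20, 1))
y_train = np.sin(2 * np.pi * x_train).ravel() + rng.normal(0, 0.3, 20)
x_test = np.linspace(0, 1, 200).reshape(-1, 1)
y_test = np.sin(2 * np.pi * x_test).ravel()

for degree in (1, 3, 15):  # rigid -> reasonable -> overly flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = np.mean((model.predict(x_train) - y_train) ** 2)
    test_mse = np.mean((model.predict(x_test) - y_test) ** 2)
    # At degree 15 the train MSE collapses while the test MSE blows up:
    # the fit accommodates particular datapoints rather than the trend.
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

The train/test gap printed here is also the "standard way" of approximating variance described below.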
Related: Decomposition over Regularization https://docs.google.com/document/d/1tCoaZEzERE3XP_4SzJWJhQ17bnY7vfUGYPGnCCEO54I/edit?usp=sharing
Levels of Abstraction, Abstracting Over an Incomplete Subset https://docs.google.com/document/d/18FvL9mlKTDlxQVXju1v8vOV63U73-6f9WVGh9-d8ScE/edit?usp=sharing
Treating variance in the bias-variance tradeoff as a concept, there are many ways we could instantiate it:
- The standard way: watching your model overfit. This approximation of variance is the gap between your training error and your validation error. (Bias affects your training and validation error equally.)
- Bootstrap-sampled variants of the dataset:
  - Split between in-bag and out-of-bag examples.
  - Train on the in-bag sample, test on the out-of-bag sample.
- Variance as the ordinary (distribution) variance of your predictions on a given datapoint, taken across models trained on the resampled datasets (assuming regression). You can average that variance across datapoints for your model's overall variance; see the sketch after this list.
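A minimal sketch of the bootstrap instantiation, assuming numpy and scikit-learn (the tree model and regression task are illustrative stand-ins): train one model per bootstrap resample, then read variance off the spread of their predictions at each test point.

```python
# Illustrative sketch: bootstrap estimate of a model's prediction variance.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=(100, 1))
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.3, 100)
x_test = np.linspace(0, 1, 50).reshape(-1, 1)

preds = []
for _ in range(200):
    idx = rng.integers(0, len(x), size=len(x))  # resample with replacement
    model = DecisionTreeRegressor()  # fully grown tree: flexible, high variance
    model.fit(x[idx], y[idx])
    preds.append(model.predict(x_test))
preds = np.array(preds)  # shape: (n_bootstraps, n_test_points)

# Per-datapoint variance of the predictions, averaged into a single number.
print("estimated model variance:", preds.var(axis=0).mean())
```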
Another approximation is the number of hypotheses that can be learned by a model (say, the number of features in a dataset for a decision stump, and all of their interactions for a singly branched tree). Across different representations of a hypothesis space (parameters, freedom over those parameters, number of parameters, rules, freedom of rules), these are different approximations of variance. But a wide hypothesis space only tends to cause high variance; it is not variance itself.
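A small sketch of hypothesis counting, assuming numpy (`stump_labelings` is a hypothetical helper written for this note, not an established API): enumerate the distinct labelings a one-threshold decision stump can realize on a toy dataset, as a crude proxy for the width of its hypothesis space.

```python
# Illustrative sketch: counting the labelings a decision stump can realize.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))  # 8 datapoints, 3 features

def stump_labelings(X):
    """All distinct {0, 1} labelings realizable by a one-threshold stump."""
    labelings = set()
    for j in range(X.shape[1]):
        for t in X[:, j]:  # thresholds between datapoints reduce to these
            pred = tuple(int(v) for v in (X[:, j] <= t))
            labelings.add(pred)
            labelings.add(tuple(1 - p for p in pred))  # flipped leaf labels
    return labelings

n = len(stump_labelings(X))
# A stump realizes only a sliver of the 2**8 = 256 possible labelings;
# a deeper tree realizes many more, which tends toward higher variance.
print(f"stump: {n} distinct labelings of 8 points")
```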
Say that your model's predictions at a datapoint are Cauchy distributed. Would you say that, since its variance is undefined, the model is not subject to the bias-variance tradeoff?
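A minimal numpy sketch of why that question bites: the empirical variance of Cauchy draws never settles down as the sample grows, unlike for a Gaussian.

```python
# Illustrative sketch: sample variance of Cauchy draws does not converge.
import numpy as np

rng = np.random.default_rng(3)
samples = rng.standard_cauchy(10**6)
for n in (10**2, 10**3, 10**4, 10**5, 10**6):
    # The running estimate jumps by orders of magnitude instead of converging,
    # because the Cauchy distribution has no defined variance.
    print(f"n={n:>7}  sample variance = {samples[:n].var():.3e}")
```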
Two senses of variance:
- Variance of a distribution.
- Variance as complex hypothesis classes leading to overfitting.
Just because a concept is formalizable doesn't mean that the concept is its formalization. There's something like map-territory here, except it's higher-map vs. lower-map. We need a clean way to distinguish between concepts and their formalizations. Would you say that 'attention' in deep learning is attention? Of course not. Attention is so much bigger than that.
Source: Original Google Doc