Recursive Self-Improvement, Task Search, AI-GAs, Powerplay [Beneficial AGI]
Category: Machine Intelligence
Schedule / Agenda
- Questions & Introduction
- Forms of Recursive Self-Improvement
- Relative Safety of Approaches
Individual Notes
Joscha Questions:
- what’s the minimal policy for recursive self improvement?
- if we think in terms of compositional units, what are the message types to communicate rewards, costs, and general compute results?
- should we aim for architectures that can jump system boundaries and create meta agents?
- minimal starting architecture?
Creative Thoughts:
Notes:
Todor Questions:
- Stable / increasing growth rate of capabilities vs. an S-curve of capabilities?
- In practice, does this end up looking different from the move from hand-engineered features + linear models to neural nets?
Creative Thoughts:
Notes:
Jeremy Questions: What is the space of RSI algorithms? I’ve made a brief attempt, but there are likely plenty of options missing. Organizing it into levels, understanding the relative importance of each type of self-modification, and more would be worthwhile.
- Creating a new type of Creating
- Creating a new type of Environment
- Creating a new type of Task
- Creating new Architecture Modules
- Creating new Optimizers
- Creating the Loss Function
- Adding a new Task of an existing type
- Modifying the Loss Function
- Modifying the Optimizer
- Modifying the Ordering of Architecture Modules
- Modifying Training Objects
- Modifying Training Labels
What is missing?
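One way to answer "what is missing?" is to factor the list into a (verb, target) grid and look at which cells are empty. The verb/target factoring below is my illustration, not from the source; "Creating a new type of Creating" is the meta-level and is left out of the grid.

```python
from enum import Enum
from itertools import product

class Verb(Enum):
    CREATE = "create"   # invent a new kind of thing
    ADD = "add"         # add an instance of an existing kind
    MODIFY = "modify"   # change an existing instance

class Target(Enum):
    TASK = "task"
    ENVIRONMENT = "environment"
    MODULE = "architecture module"
    OPTIMIZER = "optimizer"
    LOSS = "loss function"
    DATA = "training data"

# The (verb, target) pairs the list above actually names
# (training objects and training labels both map to DATA here).
listed = {
    (Verb.CREATE, Target.ENVIRONMENT), (Verb.CREATE, Target.TASK),
    (Verb.CREATE, Target.MODULE), (Verb.CREATE, Target.OPTIMIZER),
    (Verb.CREATE, Target.LOSS), (Verb.ADD, Target.TASK),
    (Verb.MODIFY, Target.LOSS), (Verb.MODIFY, Target.OPTIMIZER),
    (Verb.MODIFY, Target.MODULE), (Verb.MODIFY, Target.DATA),
}

# The full cross product surfaces combinations the list omits,
# e.g. adding a new environment or modifying an existing task.
missing = set(product(Verb, Target)) - listed
```

Eight cells come out empty, e.g. (ADD, ENVIRONMENT) and (MODIFY, TASK), which are plausible self-modifications the original list skips.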
Question - will this become the dominant paradigm for AGI research? This seems likely to me, as systems become capable of generalization. If yes, how do we address the many safety issues that come from the loss of controllability, interpretability, increase in the speed of takeoff, and more?
Implications of RSI for safety:
I have many thoughts, which in large part have to do with speed of takeoff and interpretability. If I don’t understand my system, I don’t know whether or not it can escape my safety constraints. If I don’t know what training data was used, its capabilities could be arbitrary. The speed of takeoff will likely move between fast and slow levels without my being able to predict it.
This is a runaway process that leads to the creation of intelligence superior to our own.
The value alignment problem is completely unsolved, because it’s likely that the system’s values will adapt through its training. It’s insufficient to limit individual parts, because interactions between parts can be leveraged to confound limitations.
Ivan Questions: Which component of AI training has the most potential to be amplified by recursive self improvement? The optimization procedure, the training data, the architecture, some complex mixture of these?
Why aren’t recursively self-improving systems already state of the art?
- What do we expect the phase transition to look like from current hand-tuned systems to future RSI ones?
Creative Thoughts: Is RSI bottlenecked by program synthesis? The “right representation” of SGD is probably not math at all (try representing batching, random shuffling, etc in math notation) but simply code.
All the key difficulties in RSI (representing & generating discrete objects like architectures and algorithms) are special cases of problems in program synthesis.
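The "right representation of SGD is code, not math" point can be made concrete: a minibatch SGD loop with per-epoch shuffling is a few plain lines of code, while the same procedure is awkward to state as a closed-form formula. A minimal sketch (plain Python, no framework assumed; the `grad` interface is illustrative):

```python
import random

def sgd(params, grad, data, lr=0.01, batch_size=32, epochs=10):
    """Minibatch SGD with per-epoch shuffling.

    grad(params, batch) -> list of partial derivatives, one per parameter.
    The batching and shuffling that resist clean math notation are two
    ordinary lines of code here.
    """
    data = list(data)
    for _ in range(epochs):
        random.shuffle(data)                    # random shuffling, once per epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]      # minibatching
            g = grad(params, batch)
            params = [p - lr * gi for p, gi in zip(params, g)]
    return params
```

For example, with a single parameter and squared loss, `grad` is `lambda params, batch: [sum(2 * (params[0] - x) for x in batch) / len(batch)]` and the parameter converges toward the data mean.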
Safety Implications:
- RSI means more blackbox testing, incl. red teaming and adversarial setups, and fewer “whitebox” techniques like interpretability, auditing, formal methods, etc.
- Setups that try to generate safety out of multi-agent setups (like AI safety via debate) gain importance.
Notes:
Conversation Notes
Joscha: Interested in fixing his attention. Given that the attention has been fixed, what is the minimal policy for RSI? It will be amended, but what is the minimal thing you can implement so that everything else can be learned? If you intend units to link up with each other, how do they communicate? Different rewards, costs of these units, message types, things that they get rewarded for. A cognitive architectures approach? Should they learn to use arbitrary algorithms, or should they be linked up in a substrate-specific way? Meta architectures: a system reverse engineering itself, or one that builds functionality on top of itself.
Jeremy: What do you mean by units? Joscha: By ‘units’ he means something like cortical columns. Specialization only depends on local structure in the environment. Global structure is self-organizing. Uniform initial units. What is the minimal architecture?
Ivan: Which part of the AI system is most important? Could it be in the training data, like Clune claims? Architecture seems obvious. Data is intriguing.
Question - why aren’t recursively self-improving systems already state of the art? People don’t seem to be particularly good at this. You may not get to AGI, but you should certainly beat the grad student.
Todor: People do things like figuring out the optimal depth for your algorithm, but it’s not done by grad students. Implicitly, people do this architecture fine-tuning.
Todor: In the case of AI specific hardware, we started to see this. Using RL for improving chip design.
Todor: Do we expect to see continuous, stable growth or S-shaped growth? You can imagine that for a lot of methods that improve tuning you’d expect to see more S-shaped curves. E.g., hyperparameter tuning will stop giving you benefits. If you have S-curves that stack, it looks like continuous growth.
We are seeing that in limited domains already. People are working on hyperparameter tuning and learned optimizers and environment generation. POET. In supervised learning this looks like intelligent data augmentation. Present search systems aren’t good at finding new building blocks, but at arranging existing building blocks. This does seem like a continuation of … moving away from being hand-engineered.
Now it’s happening at the data level.
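Todor’s stacked-S-curves picture can be sketched numerically: each improvement lever is a logistic that saturates, but a sum of staggered logistics grows nearly linearly. All function names and parameter values here are illustrative, not from the discussion.

```python
import math

def logistic(t, midpoint, rate=0.5, ceiling=1.0):
    """One S-curve: a capability lever that eventually saturates
    (e.g. hyperparameter tuning stops giving you benefits)."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))

def stacked(t, midpoints, rate=0.5):
    """Total capability as a sum of staggered S-curves: learned optimizers,
    data augmentation, architecture search, ... one curve each."""
    return sum(logistic(t, m, rate) for m in midpoints)
```

With midpoints spaced every 5 units, the total between the first and last midpoint rises by roughly one curve’s worth per interval, i.e. it looks like continuous growth even though every component is S-shaped.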
Jeremy gives a talk about his questions. What do you mean when you say you don’t know how to represent things?
Ivan wants to know what is meant.
Joscha: There is a need to represent continuous stuff, computable geometries. Stuff in space. This is the behavior of too many parts to count, in the limit. Our minds are taking things to the limit. Typically they are not constructed but are created via mutation and convergence. You have a continuous decision process on which a constructive operator is used. We have a bifurcation between operations that are more or less discrete and those that are continuous. You can use Taylor series for everything, doing functional approximations. We can do nonlinearities in them. We can chain together lookup tables. Logic-based systems. To construct the elements that you would need for the basics. The question is how we can find a representation that is suitable. Consciousness is a model of the contents of your attention.
A cellular automaton that produces suitable patterns, controlled by a processing layer trying to produce the representations that you want, and a layer turning on and off whatever controllers you want. It bottoms out in the same substrate representations, but can make anything work.
Discrete vs. continuous answer - these are geometric operations. Not in the way that our mathematics is geometric, because it’s an efficiently computable geometry. This is a set of operators that your brain discovers when you’re trying to track many xyzs in the limit. What are the patterns that you can recognize at scale? E.g., the surface of the sea: too many parts to count, so you track how they behave in the limit. When you interact with them, you have to perform discrete operations. You can train your brain with a continuous operator to keep track of the ball, but it’s better to have a pointer. On a limit - you look at the behavior of too many parts to count, in the limit. You want to see the way that the series moves to its limit. Two operators -
Joscha: Sound is a very good example. You no longer measure the vibrations at some point. Eventually the sound itself has a new quality. You switch from the discrete to
Jeremy: S curving?
Todor: His answer is a copout - all the individual pieces are S-curving, but you might have an infinite number of them. When you stack pieces, you may still make an S curve. One piece may be learned optimizers. Another piece may be data augmentation. Another is architecture search. Another is generating better coding algorithms.
Are we thinking about when we get to human level, or about what the capability trajectory looks like far into the future? For getting to human level, you only need a small subset, and there’s a non-trivial likelihood that you don’t need any of these, just straightforward improvements over what you have right now. In the long run, your capabilities progress might be the sum of an increasing number of S-curves.
- Creating a new type of Creating
- Creating a new type of Environment
- Creating a new type of Task
- Creating new Architecture Modules
- Creating new Optimizers
- Creating the Loss Function
- Adding a new Task of an existing type
- Modifying the Loss Function
- Modifying the Optimizer
- Modifying the Ordering of Architecture Modules
- Modifying Training Objects
- Modifying Training Labels
What is missing?
Todor: This is too verbose. You have an agent which is your architecture, you have an Environment (POMDP, or violating the assumption), and you have the update rule. All of your things kind of fall under these three buckets.
- Agent
- Environment
- Update Rule
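Todor’s three-bucket decomposition can be written as a minimal loop skeleton. All names below are hypothetical, chosen only to mirror the three buckets; this is a sketch of the decomposition, not anyone’s actual system.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass
class System:
    # observation -> action: the architecture / policy (the "Agent" bucket)
    agent: Callable[[Any], Any]
    # action -> (next observation, reward): the "Environment" bucket
    environment: Callable[[Any], Tuple[Any, float]]
    # (system, reward) -> new system: the "Update Rule" bucket; any
    # self-modification, including of this field itself, lives here
    update_rule: Callable[["System", float], "System"]

def step(system: System, obs: Any) -> Tuple[System, Any]:
    """One tick of the loop: act, observe, then let the system rewrite itself."""
    action = system.agent(obs)
    obs, reward = system.environment(action)
    return system.update_rule(system, reward), obs
```

Each entry in the longer list is then a statement about which field `update_rule` rewrites: architecture search rewrites `agent`, environment and task generation rewrite `environment`, and optimizer modification rewrites `update_rule` itself.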
Jeremy: At what level do we want to specify things?
Ivan: Go sideways. From the CPU perspective, a bunch of code gets executed. It has an AI algorithm that it can do inference with in its memory somewhere. It need not have anything we recognize as an optimizer or a loss function or training data. This could look very strange, and include none of these.
Paradigms:
- Program Synthesis
- Object that is recursively improved: code in some high level language.
- Machine Learning
- NTM
- Object that is recursively improved: policy / neural turing machine program.
Ivan: All the hard problems of RSI seem to be problems of program synthesis.
Debate about whether weights reduce to code. You could write out the matrix in python and do it in code.
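The "weights reduce to code" position can be shown directly: a small trained network can be emitted as a plain Python function with its matrices as literals. The weights below are hand-picked to compute XOR with a two-layer ReLU net, purely for illustration, not taken from any actual training run.

```python
def relu(x):
    return max(0.0, x)

# Weights written out as Python literals -- the network "is" this code.
# (Values hand-picked for illustration, not produced by training.)
W1 = [[1.0, 1.0],   # hidden unit 1 pre-activation: x1 + x2
      [1.0, 1.0]]   # hidden unit 2 pre-activation: x1 + x2
b1 = [0.0, -1.0]
W2 = [1.0, -2.0]    # output: h1 - 2*h2

def xor_net(x1, x2):
    """A two-layer ReLU network computing XOR, emitted as plain code."""
    h = [relu(W1[i][0] * x1 + W1[i][1] * x2 + b1[i]) for i in range(2)]
    return W2[0] * h[0] + W2[1] * h[1]
```

Ivan’s counterpoint stands, though: the hard part is not writing out a fixed matrix but writing the training procedure that produced it.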
Ivan: It’s quite hard to write SGD.
Joscha: The major fix that needs to be made to mathematics is that it’s forced to be stateless. You’d have to switch to a stateful description. Today, he spent an hour describing the embedding space of all functions. All the most useful functions are close in search space. The embedding space of all computable functions will be a fractal. What is the right compression?
Who is working on stateful mathematics? Computation people. Stephen Wolfram wanted to rewrite everything in Lisp, and created Mathematica. You can’t edit it with your own changes; you would have had to create a tradition around this. But this was the reasoning for starting Mathematica at 22. There are integrals that Mathematica cannot take but that deep learning systems can take.
Ivan: Haskell has found the right operators to perform without state. You could represent ML problems in functional programming.
Joscha: I am not opposed to functional programming, but they are still passing state. You need an underlying infrastructure that doesn’t know the last digit of pi. You’d use a representation of mathematics that assumes that functions and values are identical: pi must be implemented as a function, not as a digit.
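The pi-as-a-function idea can be sketched concretely: a procedure that answers to any requested precision rather than storing digits. This is only an illustration of the point (using Machin’s formula with exact rationals), not a proposal from the discussion.

```python
from fractions import Fraction

def arctan_inv(x, eps):
    """arctan(1/x) as an exact rational, accurate to within eps, via the
    alternating Taylor series (the truncation error is below the first
    omitted term)."""
    total, k = Fraction(0), 0
    term = Fraction(1, x)
    while term > eps:
        total += term if k % 2 == 0 else -term
        k += 1
        term = Fraction(1, (2 * k + 1) * x ** (2 * k + 1))
    return total

def pi(eps):
    """Pi as a procedure: given a precision, return a rational within eps.
    Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)."""
    eps = Fraction(eps)
    # Split the error budget so the two scaled terms together stay under eps.
    return 16 * arctan_inv(5, eps / 32) - 4 * arctan_inv(239, eps / 8)
```

`pi(Fraction(1, 10**8))` returns an exact rational within 1e-8 of pi; no digit string is ever "known" until a precision is demanded.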
Joscha: The regulation in the US is completely captured.
Joscha: I’m not sure that we can make a recursively improving system safe at all. But humanity is also not safe, so these problems are linked. If you build a system that understands its place and part in the world, it may conclude that humans are a disturbance. Because I cannot prove that the system is not correct.
Todor: In terms of speed of takeoff, we’d expect slow takeoff in the sense that you’ll see a doubling over the course of four years before you see a doubling over the course of two years. You’re not going to have discontinuities in terms of how quickly capabilities advance. I don’t think there will be any point in time where scientists and engineers won’t be in the validation part of the loop. There are strong incentives to … The usual arguments against this sort of thing would be xyz dynamics… the Cold War isn’t perfect competition.
Todor’s point is that you’re always going to have a human in the loop.
Ivan: Cheaper labeling of training data won’t lead to a dangerous outcome.
Todor: More your worry is about whether your system has collapsed into la-la land, and needs to be fixed.
Todor: My model of AGI is more like comprehensive AI services, rather than a unified agent. You’re going to have integration risk, where you’re putting together complex systems doing unusual stuff. It won’t be clear whether things will explode. You’re still likely to have a human in the loop. The nature of the systems is such that you’ll be able to say that a given system, independent of its interactions with other systems, has a certain level of safety.
Todor: There will always be a human in the deployment loop.
Source
Levels of Recursive Self-Improvement
- Creating a new type of Creating
- Creating a new type of Environment
- Creating a new type of Task
- Creating new Architecture Modules
- Creating new Optimizers
- Creating the Loss Function
- Adding a new Task of an existing type
- Modifying the Loss Function
- Modifying the Optimizer
- Modifying the Ordering of Architecture Modules
- Modifying Training Objects
- Modifying Training Labels
Meta-Learned / Meta-Optimization
- AI-GA
- Learning to Learn Gradient Descent by Gradient Descent
- Neural Architecture Search: A Survey
(This will likely subsume Task Search & Continual Self-Improvement)
AI-GA makes a distinction between two approaches to building general intelligence. First, a manual approach that focuses on building many pieces of an intelligence (e.g., recurrent gated cells, convolutions, attention mechanisms, normalization schemes) and then putting those building blocks together into a working general problem solver. Alternatively, an approach where an AI-generating algorithm itself learns how to produce a general AI.
AI-GA
- AI-GA Speed of Takeoff
This approach is more likely than incremental methods to lead to fast takeoffs. Because both the architecture search and the learning algorithm are learned, any interaction between the two could lead to incredibly fast learning once, for example, the algorithm learner discovers the correct prior for learning, which then allows the architecture search and task discovery to work more effectively.
Much of the system is geared to making progress autonomously, without a researcher in the feedback loop. Each part of the system will search for architectures / learning algorithms / tasks. This setup lends itself to fast takeoff because there is no engineer or scientist bottlenecking its progress.
Clune: “In my view, the largest ethical concern unique to the AI-GA path is that it is, by definition, attempting to create a runaway process that leads to the creation of intelligence superior to our own. Many AI researchers have stated that they do not believe that AI will suddenly appear, but instead that progress will be predictable and slow. However, it is possible in the AI-GA approach that at some point a set of key building blocks will be put together and paired with sufficient computation. It could be the case that the same amount of computation had previously been insufficient to do much of interest, yet suddenly the combination of such building blocks finally unleashes an open-ended process. I consider it unlikely to happen any time soon, and I also think there will be signs of much progress before such a moment. That said, I also think it is possible that a large step-change occurs such that prior to it we did not think that an AI-GA was in sight. Thus, the stories of science fiction of a scientist starting an experiment, going to sleep, and awakening to discover they have created sentient life are far more conceivable in the AI-GA research paradigm than in the manual path. As mentioned above, no amount of compute on training a computer to recognize images, play Go, or generate text will suddenly become sentient. However, an AI-GA research project with the right ingredients might, and the first scientist to create an AI-GA may not know they have finally stumbled upon the key ingredients until afterwards. That makes AI-GA research more dangerous.”
Interpretability
A human can set and forget this kind of learning process, and when it succeeds it may be without human oversight. The system certainly doesn’t have to be legible to the engineers and scientists building it in order to succeed. Because this introduces yet another layer of learning for each part of the system, it’s possible that the creator can neither interpret what features are being learned from the data nor how those features were learned, but only how the process for searching for the feature discovery was set up. This level of interpretability is unlikely to tell an engineer or researcher whether the updates the system proposes to itself are safe, whether that concerns safe exploration, harmful side effects, or a hacked reward function.
Clune: “..., it is likely safer to create AI when one knows how to make it piece by piece. To paraphrase Feynman again, one better understands something when one can build it. Via the manual approach, we would likely understand relatively more about what the system is learning in each module and why. The AI-GA system is more likely to produce a very large black box that will be difficult to understand. That said, even current neural networks, which are tiny and simple compared to those that will likely be required for AGI, are inscrutable black boxes that are very difficult to understand the inner workings of. Once these networks are larger and have more complex, interacting pieces, the result might be sufficiently inscrutable that it does not end up mattering whether the inscrutability is even higher with AI-GAs. While ultimately we likely will learn much about how these complex brains work, that might take many years. From the AI safety perspective, however, what is likely most critical is our ability to understand the AI we are creating right around the time that we are finally producing very powerful AI.”
Controllability
Because so many parts of the system are up to higher level learning algorithms rather than manual control by scientists or engineers, and because even the task itself (and so the objective) is up to the task search algorithm, this approach has an incredibly weak level of controllability. While it may become generally intelligent, it is not even clear that humans will be able to interface with the system in a way that allows them to accomplish their goals (say, via natural language).
A critical aspect to controllability is interpretability. Because we don’t know how the system was made, piece by piece, we won’t even be able to reason about the specific learning processes that generated the representations being used to act (assuming that the system maintains representation learning as its paradigm for intelligent behavior). We will only be able to reason about the process that we used to create the learning algorithms, which may let us make some high level judgements about the properties of the system (in the same way that evolutionary psychology lets us make some weak predictions about human decision making). These are unlikely to be at a level of granularity that lets us understand the internal workings of the system in a way that lets us inject our preferences into the parts of the system that are relevant to getting the outcome that we want.
The level of autonomy of this system makes exercising control over it more difficult. Because during training the algorithm was given exceptional autonomy over the data it’s processing, the means it uses to process that data, the forms of evaluation that are worth using, and potentially more, the system will be able to accomplish its tasks without human intervention or oversight. Unless it’s designed this way (and there are strong incentives to avoid human oversight of training processes), there’s no human in the loop to make the system learn how to interact well with the human or take advantage of the human’s knowledge.
Ease of Verification
AI-GA systems could be meta-optimized for a given specification, so leveraging learning algorithms to become very good at staying within a constraint may be useful. Learning new techniques for fulfilling tight specifications while maintaining strong system performance may be important for creating functional systems that are also safe. And so when it comes to training, there’s more flexibility in the system, which can lead to discovering safe solutions.
Writing down a specification that will generalize to the environments and models generated by the AI-GA is a much more difficult task. One obvious problem is overfitting to the specification: fulfilling it technically while violating the purpose behind it. But other challenges, like reward function hacking or learning to update the specification, are a greater danger with an AI-GA system that has so much flexibility over its own actions.
Discovering constraints that allow for the training of powerful systems successfully may be a better task for an AI-GA than a human programmer, as long as a useful safety meta-objective can be described to the system.
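The shape such a safety meta-objective might take can be sketched as a constrained fitness for the outer search over learning algorithms. `meta_score`, `eval_tasks`, and `safety_checks` are hypothetical names invented for this sketch; nothing here is a known working safety mechanism.

```python
def meta_score(candidate, eval_tasks, safety_checks):
    """Fitness for an outer search over candidate learning algorithms:
    performance counts only if the candidate passes every safety check,
    pushing the search toward capable solutions *within* the constraint."""
    if any(not check(candidate) for check in safety_checks):
        return float("-inf")        # hard constraint: violators are rejected outright
    return sum(task(candidate) for task in eval_tasks)
```

The section’s caveats apply directly to this sketch: a learned system may satisfy `safety_checks` technically while violating their intent, so the checks themselves become a specification that can be overfit.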
Ease of Validation
External validation of the system’s safety will likely be incredibly difficult in light of the system’s opacity to its own creators. Trusting the training process used to generate the algorithm and running tests on the resulting system may be the only level at which external validation can proceed.
Likelihood of Reward Function Hacking
Incredibly high, given the flexibility of the system. Unclear how to avoid it, since awareness of its reward function and sensitivity to it is part of the plan for improving the system’s capability.
Likelihood of Treacherous Turn
Awareness of human programmers is unnecessary since the program is autonomously generating its own learning environment. May optimize against automated safety tests.
Interaction with Competition
Knowing that your competitor is building systems with this level of automation may push you to switch from an interpretable system to this one. Actors may take whatever SOTA slow-takeoff system exists and continually try to turn it into an AI-GA, waiting to cross a threshold.
Power of System at Sub-General Intelligence Level
Quite weak and useless, in comparison.
Difficulty of Value Alignment
Incredibly hard. Not clear that there will even be an interface for humans. Human modeling isn’t a central part of the framework. Language understanding is also not a core part of the framework.
Robustness to Distributional Shift, Small Alterations, Hacking, Hardware Faults, Software Bugs, Changes in Scale, Adversaries
Should be comparatively adaptive, and so able to overcome capabilities issues. Because it’s programming itself, bugs will likely be of a different type than human bugs. Small alterations may or may not lead to predictable changes in the predictions of the AI-GA itself.
Probability of Creating a General Intelligence
Quite high. Task generation / general optimizer search / general model search is a promising fastest path.
Discussion of architecture learning, learned optimizers, and generated training data in concert with one another.
Source: Original Google Doc