LessWrong House Style
First a moment on the LessWrong/Rationalist community. These guys seem like useless, ignorable dweebs, but unfortunately they represent the backbone of “AI is imminent” thinking (doomers and optimists alike) and they’re getting press coverage. They have long had the ear of Silicon Valley wealth. Since they are imbeciles given, occasionally, to dangerous ideas, I think it’s worth taking a moment now and then to beat them up. This is another such moment.
In my previous remarks I pointed to a thing about simulating evil AGI thingamabobs inside an LLM by, essentially, having the LLM write a story about a super-villain who knows he’s in a story, is smart enough to escape, and does escape. The escaped villain then kills everyone, I guess.
Now, the author of the thing I pointed to, an apparently respected member of the Rationalist Community, wrote 13,000 words or so of more or less gibberish, which sketched out more or less the thumbnail above together with some other vague handwaving, and threatened a whole “sequence” of essays which have thus far failed to appear. A “sequence” is a LessWrong thing, I guess, where, not being content with one ridiculously bloated forum posting that’s wrong, you write a series of interconnected bloated forum postings which are severally and jointly wrong. Thank goodness they stopped.
Anyways, the whole damned thing is very much in what I call LessWrong House Style, which is a way of writing, and of thinking about the world, that is extremely opaque and does not lead to good thinking. Rather, the opposite. I think it dramatically impairs clear thought, and produces people who, while they may have a lot of horsepower, are functionally very, very stupid. We’ll take a little time to examine the original post and see what makes it weird and bad as a piece of reasoning. To do that, we should back up a bit.
People who think clearly will frequently do so by making a lot of analogies. We’re thinking about a thing, and it’s complicated. It’s too complicated. So we think of something else instead, something which we hope captures the right features of the complicated thing, but which is itself simpler. There’s a bunch of ways to do this.
You can literally just make an analogy: “it’s kind of like a donut, really.”
You can make a toy example: “OK, same thing, but what if it only had 3 widgets instead of 3 trillion widgets?”
You can make a category of things that all have the relevant features: “forget that it’s green, and that it vibrates, the main thing is that it has widgets, let’s think about anything and everything at all that has widgets.”
The difference, usually, between good thinkers and bad ones is that good thinkers make really, really useful analogies. The donut, weirdly enough, captures exactly the aspect of the thing that we care about. Mathematics has refined this all to a razor edge: most “terminology” is about developing categories that have exactly the salient properties. The invention of salient properties, or equivalently the definition of good terminology, is maybe the most productive of mathematical techniques. Once you get the notation right, the theorem writes itself. Etcetera. LessWrong thinkers don’t use analogies at all, which is its own very special kind of bad thinking.
Let’s return to the essay about simulators and simulacra, specifically the section entitled “Solving for Physics.” There are a lot of technical terms in here, but insofar as they mean anything, the meaning is pretty obvious from context. Drop modifiers freely; they are mostly there as decoration to make the text sound more sciency (“conditional structure,” for instance, seems to mean “structure”):
The strict version of the simulation objective is optimized by the actual “time evolution” rule that created the training samples. For most datasets, we don’t know what the “true” generative rule is, except in synthetic datasets, where we specify the rule.
The next post will be all about the physics analogy, so here I’ll only tie what I said earlier to the simulation objective.
the upper bound of what can be learned from a dataset is not the most capable trajectory, but the conditional structure of the universe implicated by their sum.
To know the conditional structure of the universe is to know its laws of physics, which describe what is expected to happen under what conditions. The laws of physics are always fixed, but produce different distributions of outcomes when applied to different conditions. Given a sampling of trajectories – examples of situations and the outcomes that actually followed – we can try to infer a common law that generated them all. In expectation, the laws of physics are always implicated by trajectories, which (by definition) fairly sample the conditional distribution given by physics. Whatever humans know of the laws of physics governing the evolution of our world has been inferred from sampled trajectories.
If we had access to an unlimited number of trajectories starting from every possible condition, we could converge to the true laws by simply counting the frequencies of outcomes for every initial state (an n-gram with a sufficiently large n). In some sense, physics contains the same information as an infinite number of trajectories, but it’s possible to represent physics in a more compressed form than a huge lookup table of frequencies if there are regularities in the trajectories.
Guessing the right theory of physics is equivalent to minimizing predictive loss. Any uncertainty that cannot be reduced by more observation or more thinking is irreducible stochasticity in the laws of physics themselves – or, equivalently, noise from the influence of hidden variables that are fundamentally unknowable.
If you’ve guessed the laws of physics, you now have the ability to compute probabilistic simulations of situations that evolve according to those laws, starting from any conditions. This applies even if you’ve guessed the wrong laws; your simulation will just systematically diverge from reality.
Models trained with the strict simulation objective are directly incentivized to reverse-engineer the (semantic) physics of the training distribution, and consequently, to propagate simulations whose dynamical evolution is indistinguishable from that of training samples. I propose this as a description of the archetype targeted by self-supervised predictive learning, again in contrast to RL’s archetype of an agent optimized to maximize free parameters (such as action-trajectories) relative to a reward function.
An important observation about LessWrong House Style is that they never use analogies. They always carry The Thing around with them, and talk about it directly. I don’t know why they’re allergic to analogy, but I suspect there’s some deep-seated notion that analogies distract and distort. In this they are not wrong: a bad analogy is quite a bit worse than nothing. So this person jumps right to “LLM-like things trying to do their best will probably definitely maybe come up with physics.”
The author says at some point that the essential thesis is the italicized bit:
“the upper bound of what can be learned from a dataset is not the most capable trajectory, but the conditional structure of the universe implicated by their sum”
which is either a tautology, or wrong. I’m honestly not sure what the author means by “the conditional structure of the universe implicated by their sum,” but it is either, by definition, “the upper bound of what can be learned from a dataset,” in which case the thesis is a tautology, or it’s something else, in which case the thesis is wrong.
Either you can perform magical feats of inference from a dataset (wrong) or you can’t, in which case it’s just a definition.
Unlike the author above, we’re allowed to use toy models, so let’s do that. Let’s think about tic-tac-toe gameplay strings. Let’s take for now the idea that the thesis above is in fact a definition. Let’s try to get our arms around “what can be learned from a dataset” as a way of getting a sense of the “upper bound” of same, and thus some notion of what the “conditional structure of the universe implicated by their sum” might look like.
If we start with two or three tic-tac-toe gameplay strings, the implicated universe is pretty trivial and doesn’t look a lot like tic-tac-toe. As we’ve seen in previous remarks, if you pump a large set of gameplay strings into a model, it will in fact begin to model a tic-tac-toe board (or an isomorphic structure) and start to reliably produce valid game play. The “implicated universe” is “tic-tac-toe.” It seems extremely unlikely that it will deduce human biology from these strings, or indeed anything at all outside tic-tac-toe (if it did deduce humans, that would be extremely awkward if the gameplay strings all came out of a computer).
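To make that concrete, here is a minimal sketch, mine and not anything from the original post, of what “the implicated universe is tic-tac-toe” amounts to. It is a dumb conditional-frequency model, essentially the giant lookup-table n-gram from the quoted passage, trained on nothing but randomly played legal games; the names and the 10,000-game figure are arbitrary. The only structure on offer is tic-tac-toe, and that is what it picks up: after any prefix it has seen, it only ever proposes legal moves, and there is nowhere for anything else to come from.

```python
import random
from collections import defaultdict

LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def random_game():
    """Play out one random legal game; return the move sequence, e.g. ['4','0','8',...]."""
    board, moves, player = [' '] * 9, [], 'X'
    while True:
        empties = [i for i, c in enumerate(board) if c == ' ']
        if not empties:
            return moves                      # draw
        m = random.choice(empties)
        board[m] = player
        moves.append(str(m))
        if any(all(board[i] == player for i in line) for line in LINES):
            return moves                      # somebody won
        player = 'O' if player == 'X' else 'X'

# "Training": count which next moves followed each prefix in the corpus.
counts = defaultdict(lambda: defaultdict(int))
for _ in range(10_000):
    game = random_game()
    for k in range(len(game)):
        counts[tuple(game[:k])][game[k]] += 1

# The implicated universe is tic-tac-toe: for any prefix the model has seen,
# the only continuations it knows about are moves into unoccupied squares.
print(sorted(counts[('4', '0')]))   # never contains '4' or '0'
```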
The implicated universe has limits. The input language may only go so far.
Consider a tic-tac-toe variant in which two boards are stacked, and each move consists of placing your token on each of the two boards. Two parallel games of tic-tac-toe: to win the modified game you must win both boards simultaneously. A “move” calls out two positions, one on the top board and one on the lower board. By convention, if you call out only a single position, you mean to place your token on that same position on both boards.
Suppose that the two players have misunderstood slightly, and not noticed that there are two boards. They play by calling out single positions. They are in essence playing a pair of identical games of tic-tac-toe. The gameplay they generate is identical to playing standard tic-tac-toe.
Training a language model on this gameplay will produce a model of standard tic-tac-toe, not of the modified game.
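A quick sketch of why, again mine and deliberately trivial: so long as both players only ever call single positions, the transcript of the two-board game is character-for-character the transcript of an ordinary game, and nothing trained on transcripts can tell the two universes apart.

```python
def standard_transcript(moves):
    """Transcript of an ordinary game: just the positions called out, in order."""
    return ' '.join(str(m) for m in moves)

def stacked_transcript(moves):
    """Two-board variant, played by players who only ever call single positions.
    A single call places the token at that position on both boards, and the
    transcript records only what was called out."""
    top, bottom, called = [' '] * 9, [' '] * 9, []
    for player, m in zip('XOXOXOXOX', moves):
        top[m] = player
        bottom[m] = player      # same square on both boards, per the convention
        called.append(str(m))
    return ' '.join(called)

moves = [4, 0, 8, 2, 6, 5, 7]                          # one complete game
assert standard_transcript(moves) == stacked_transcript(moves)
print(standard_transcript(moves))                      # "4 0 8 2 6 5 7" either way
```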
The implicated universe may not represent the real universe, if the input dataset is limited in specific ways.
To make this more meaningful, suppose that there is a property of electromagnetic radiation which mammals on earth have not evolved to detect. We detect wavelength and intensity, but not “fleek.” It happens that one can develop a complete, coherent, consistent model of physics without reference to fleek, by simply introducing a complex menagerie of particles, and so humans have done that. An outside observer might be astonished at our effort, commenting that the whole thing would be much simpler if you just included fleek. Fleek might be important to something we haven’t invented yet. Maybe you could make memristors work. We can’t just say “meh, fleek doesn’t matter, particle physics is isomorphic to reality so whatever.”
While it is possible that fleek is deducible from certain experimental results, there’s no a priori reason to suppose it is (since I just made it up!). Training a model in a way that somehow deduces physics from the human textual corpus might well fail to notice fleek and instead reconstruct human particle physics: an implicated universe that is not the real universe.
Let’s look at tic-tac-toe again.
Suppose there is a meta-game that involves playing thousands of games of tic-tac-toe and doing something distinctive on the game board at the conclusion of a round of the meta-game. If we trained a model on 10,000 games of tic-tac-toe, the meta-game would likely not be discernible. Sure, we could just train on more games, on enormous sequences of games, and eventually the meta-game would emerge. Let us in fact assume that.
The point here is that if we do not know about the meta-game, we don’t know how large the training set has to be.
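As a toy illustration of that point (the marker and the candidate round lengths are made up for the sketch): suppose the meta-game leaves some distinctive trace once per round, and we do not know the round length. Then we cannot say in advance whether 10,000 games contain a hundred examples of the trace, one, or none at all.

```python
import random

# ROUND_LENGTH is the thing we, by hypothesis, do not know.
ROUND_LENGTH = random.choice([100, 10_000, 1_000_000])

def corpus(n_games):
    """n_games of play; '#' stands in for the distinctive end-of-round flourish."""
    return ['game' + ('#' if (i + 1) % ROUND_LENGTH == 0 else '')
            for i in range(n_games)]

training_set = corpus(10_000)
print(sum(g.endswith('#') for g in training_set))   # 100, 1, or 0 -- no way to know which
```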
There is no way to know, offhand, whether any particular dataset is large enough to imply the real universe.
One more example. Let’s move away from tic-tac-toe.
In our example we have a system that generates random computer programs which are runnable, which accept a set of integers, and which upon exit return a single integer result.
Suppose further that our randomly generated programs are sufficiently rich for the next part to work.
We run this thing on a vast cloud of computers, each of which generates a random set of inputs and a random program to consume those inputs, runs that program, and upon its exit prints out the input data, the source code of the program, and the resulting integer.
The clever fellow will note that not all of these things will terminate within any specific timeframe, or indeed, ever (assuming the generated programs are rich enough, some will never terminate).
This is OK; we’re going to train our model on what the cloud produces, as it produces it.
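Here is a sketch of the cloud, with one large caveat: the setup above generates genuinely arbitrary programs, while this sketch uses a tiny made-up instruction set, and a step budget standing in for “we waited forever and nothing came out,” so that it actually runs. The shape is the same: a record appears only when a program exits.

```python
import random

STEP_BUDGET = 10_000    # stand-in for "never terminates"

def random_program():
    """A toy program: a couple of arithmetic updates, and maybe a risky loop."""
    prog = [('add', random.randint(0, 9)), ('mul', random.randint(1, 5))]
    if random.random() < 0.5:
        # loop until x lands in a residue class it may never reach
        prog.append(('loop_until_mod',
                     random.randint(2, 7),    # modulus
                     random.randint(0, 6),    # target residue (possibly unreachable)
                     random.randint(1, 3)))   # increment
    return prog

def run(prog, inputs):
    """Interpret the toy program; return its integer result, or None if it
    blows the step budget (our stand-in for a program that never exits)."""
    x, steps = sum(inputs), 0
    for op in prog:
        if op[0] == 'add':
            x += op[1]
        elif op[0] == 'mul':
            x *= op[1]
        else:                                 # 'loop_until_mod'
            _, m, r, d = op
            while x % m != r:
                x += d
                steps += 1
                if steps > STEP_BUDGET:
                    return None
    return x

# The cloud: each worker emits a training record only when its program exits.
for _ in range(5):
    prog, inputs = random_program(), [random.randint(0, 99) for _ in range(3)]
    result = run(prog, inputs)
    if result is not None:
        print({'inputs': inputs, 'source': prog, 'result': result})
```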
Note that I have constructed a system which attempts to train a learning model to solve the halting problem. Since it provably cannot, it definitely will not.
The implicated universe need not be computable at all, and in such cases, it is provably impossible for the training set to imply it.
Now, what is interesting about the LessWrong community is that they would certainly quibble with all of the above. There is a stripe of thought, explicated in this post (which is short and pretty easy to read), which argues, essentially, that:
you can train a language model to predict the next token correctly, no matter what the difficulty, if only you try hard enough.
therefore any problem can be overcome by a language model that has been trained hard enough.
I am certain that some of the LW community would claim my last example simply proves that GPT-N for sufficiently large N can solve the halting problem. Note also that “janus” (who is the author of the original post I am critiquing here) seems, in a comment, to think this insane argument is equivalent to their thesis, which suggests that the “implicated universe” business is in fact intended as magical, rather than tautological.
Janus and others seem to literally be arguing that with enough training data a learning model can, and in fact will, solve literally any problem.
This is obviously insane; the correct conclusion is that learning models cannot in fact be trained so hard that they will always get the next token correct. This is provable, and it’s not even hard to prove. It’s intuitively obvious, and a burly argument that backs the intuition is easy to build.
You do, however, have to approach it through analogies, through toy models. When you insist on thinking about the whole thing at once, you wind up essentially just saying things that feel right, things that are appealing. You can’t actually reason about the damned thing at all.
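Here, for what it’s worth, is one toy-model version of that burly argument, my sketch and not anything the original post offers: treat any trained model, GPT-N included, as a function from the tokens seen so far to a predicted next token, and diagonalize against it. Whatever the predictor does, there is a perfectly well-defined stream on which it is wrong at every single step, so no amount of training buys “always gets the next token correct.”

```python
def adversarial_stream(predict, length, vocab=('0', '1')):
    """Given any next-token predictor, build a stream it never predicts correctly:
    at each step, emit some token other than whatever the predictor guessed."""
    history = []
    for _ in range(length):
        guess = predict(history)
        actual = vocab[0] if guess != vocab[0] else vocab[1]
        history.append(actual)
    return ''.join(history)

# Works against any predictor you care to plug in, however hard it was trained.
def some_predictor(history):
    return '1' if history.count('1') > history.count('0') else '0'

print(adversarial_stream(some_predictor, 20))   # some_predictor scores 0/20 on this stream
```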
This is the essential pathology of LessWrong. They have a bunch of methods and techniques which produce immensely long essays that feel like arguments, but are not. They seem to find the idea that learning models can do magic to be appealing, and so they are “arguing” in ways that support this notion.
Unfortunately, they’re using a system of “reasoning” which is no such thing, and which has made them functionally very stupid, so their arguments are insane and wrong.