WHY we know that modern AI does NOT think in any sense of the word "think"
The current discussion of AI -- like practically all topics in Clown World -- is largely conducted at the knuckle-dragging level of "I'M FUR IT!" or "I'M AGINST IT!" In this note, I want to clarify what we mean when we say that state-of-the-art AI does not think. Not at all. Not even one iota of thought is going on inside modern (pre-training-based) AI. Clarifying this point is important because this isn't some generic "I'M AGINST IT" manifesto; it's a contentful critique meant to expose a total blind-spot in modern thinking about AI.
Let's start with the concept of logical depth. I'll link the video below for reference:
In the field of Algorithmic Information Theory (AIT), the "complexity" of an object is a measure of how random it is. The most complex objects in AIT are pure random strings, and defining exactly what counts as a pure-random string is one of the central problems of AIT. To define this complexity, we use a measure called the Kolmogorov complexity of a string, which I will denote K(x) -- the K-complexity of string x. ("String" here means a string of binary bits, as in a computer's memory.)
As Zenil notes, K(x) does not correspond to the ordinary plain-language use of the word "complex". We would say that a car's engine is complex, but we would not refer to the random scatter of pebbles on a riverbank as "complex" -- they are not complex, they are random. But in AIT, the engine is simple, because the bit-string encoding the engine's diagram is highly compressible, whereas the pebbles on the riverbank are complex, because a JPEG photo of them could not be compressed very much without blurring the picture to the point where the pebbles are no longer individually distinguishable.

So, how do we talk about the ordinary kind of complexity in AIT? In AIT, we call this property logical depth. The logical depth of string x is D(x,s) = min{ T(p) : |p| - K(x) <= s, U(p) = x, and p halts }. Translated into plain language: "The logical depth of string x at significance level s is the running time T(p) of the fastest program p that outputs x and halts, among all programs whose length |p| is no more than s bits larger than K(x)." This might seem odd at first, but the point is that we're combining both the length of the program p that outputs x and the number of steps it takes to run, T(p). We're using s as a sliding parameter to see how T(p) behaves as we constrain the length of p towards K(x). T(p) will be maximal when s=0, because then only the shortest possible programs are allowed, and the shortest description of x may take a very long time to unpack. But as we relax the constraint (allowing larger values of s, that is, longer programs), those longer programs can trade length for speed (in many possible ways), reducing T(p). Thus, a string with high logical depth is one for which, even as we relax the length bound (by increasing s), T(p) remains high -- x still requires a long time to compute.

The example that is often used is that of a textbook. While a textbook might be compressed down to a small number of bits (small K(x)), even as we allow longer programs there is still going to be a fairly large T(p), because the smallest compression takes a lot of computational steps to elaborate. The textbook's text is algorithmically "rich"; it is a very non-trivial computational object. It's not just complex, it actually has a lot of structure within it. By contrast, a pure-random object does not benefit from additional computation steps, because there is no "underlying structure" to exposit -- its shortest description is essentially "print this exact string", which unfolds in a single fast pass... it's simply a pure-random object.
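To make the textbook example a bit more concrete, here is a toy illustration of my own (not from the video): an elementary cellular automaton, Rule 30. The generating program is a few lines long, so its output has a low K(x), but as far as anyone knows there is no substantially faster way to get row n than to grind through all n steps of the evolution -- which is exactly the "short description, long elaboration" signature of a logically deep object. A pure-random string, by contrast, has a shortest description that is basically a verbatim copy, which unfolds in one fast pass.

```python
# Rule 30 elementary cellular automaton: a tiny program whose output is
# (plausibly) logically deep -- short description, but no known way to
# produce row n much faster than simulating all n steps.

RULE = 30  # lookup table for the 8 possible 3-cell neighborhoods

def step(cells: list[int]) -> list[int]:
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        neighborhood = (left << 2) | (center << 1) | right
        out.append((RULE >> neighborhood) & 1)  # read the rule's lookup bit
    return out

cells = [0] * 64
cells[32] = 1                 # single live cell in the middle
for _ in range(32):           # the "depth": 32 sequential steps of elaboration
    print("".join(".#"[c] for c in cells))
    cells = step(cells)
```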
Thus, objects like a car's engine or a textbook have high logical depth, because there is "a lot of explanation" entailed in understanding their structure. When compressed, those explanatory principles are very short (the scientific principles behind an engine are simple), but working out all of their implications in the final engine itself is non-trivial (it requires many steps of computation/explanation). By contrast, both trivial objects (like a repeating crystal structure) and random objects (like pebbles on a riverbank) have low logical depth, because there is not a lot of "explanation" involved in either one. Both are "shallow" objects. The crystal has low algorithmic complexity (its structure is algorithmically simple), and its "explanation" is just pure repetition of the same pattern everywhere. The pebble bank is "highly complex/random", but its "explanation" is just an exact quotation -- a JPEG photo of the pebbles cannot be compressed very much; it's just a pixel-by-pixel copy of whatever arrangement the pebbles happened to be in. Both crystals and pebbles on a riverbank are "logically shallow" objects -- one is "shallow and regular", the other is "shallow and irregular".
Even if all the technical details above don't make sense, I hope to convey at least the gist of what logical depth is. (It's not actually a difficult concept to understand; it just takes some patience to get used to the jargon of AIT.) The point is that pre-training-based AI is, by its very nature, logically shallow. This kind of shallow processing is what I call "reflexive thinking"; it corresponds to what is often called "fast thinking", as opposed to "slow thinking". Slow thinking is the kind of thinking you use when reasoning about objects or systems with high logical depth. A car engine is a highly non-trivial object. Reasoning about what might be going wrong in an engine, as a mechanic does, is a highly non-trivial thinking exercise. The mechanic needs to assess the symptoms, identify possible causes, and then begin ruling them out by testing the engine's behavior in various ways. In order to arrange the sequence of tests he is going to perform (in his own mind), he has to have some mental "picture" of the internals of the engine... he needs to know how to "simulate" the engine in his mind, as it were, and imagine how things could be going wrong internally, based on the observed symptoms. Once he has ruled out the other causes, he will be able to isolate the actual cause, or root cause, of the problem. Once it is diagnosed, he can fix it. This is the literal structure of thinking; this is what it means to ponder or "think through" something. Pre-training-based AI cannot think, on any construction of the word "think". It's just pure gut-hunch-level reasoning, nothing more -- the kind of thinking that somebody who has been around cars but never worked on them might have: "I heard that noise once, try this, maybe it will work." That's shallow/reflexive thinking, not deep thinking (thinking-as-such).
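To caricature that diagnostic loop in code (the fault model and test names below are invented purely for illustration -- this is a toy, not a real diagnostic system): hold a model of the system, list candidate causes consistent with the symptom, and eliminate them by test until the root cause is isolated.

```python
# Toy sketch of "slow" diagnostic thinking: ordered hypotheses, tests against
# reality, elimination until a root cause is isolated.

CANDIDATE_CAUSES = {
    "engine cranks but won't start": ["dead spark", "no fuel delivery", "clogged air intake"],
}

def run_test(cause, engine_state):
    # Each test is a question put to the real engine; here the "engine" is
    # just a dict standing in for reality.
    tests = {
        "dead spark":         lambda e: not e["spark_present"],
        "no fuel delivery":   lambda e: not e["fuel_pressure_ok"],
        "clogged air intake": lambda e: not e["airflow_ok"],
    }
    return tests[cause](engine_state)

def diagnose(symptom, engine_state):
    # Work through the candidates (ordered by the mechanic's priors),
    # eliminating each one the tests rule out.
    for cause in CANDIDATE_CAUSES.get(symptom, []):
        if run_test(cause, engine_state):
            return cause
    return None

engine = {"spark_present": True, "fuel_pressure_ok": False, "airflow_ok": True}
print(diagnose("engine cranks but won't start", engine))  # -> "no fuel delivery"
```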
This post was inspired by the following video, which treats the broader subject in detail:
Current AI systems do not employ any cognitive tools or, at best, they employ them (kind of) at a very superficial/shallow level. They are very sophomoric/savantish. This comes from the underlying architecture: they are built on a shortcut/hack (prediction-based training, plus some limited RL to refine desired properties of the outputs). There is no shortcut to thinking -- anyone or anything that is actually thinking will also be able to walk you through their chain of reasoning, step by step, without the frenetic "scrambled thinking" typical of CoT reasoning traces. Thinking might involve some scrambling under time-pressure, but scrambling is the enemy of calm, settled contemplation. It is the foe of slow thinking; it is the exact opposite of the essence of what thinking is. Thinking is what you do while preparing for a test (in the weeks ahead), not what you do while taking the test. If you only have 90 minutes on the clock, you had better have prepared (memorized) the thinking traces you were going to need ahead of time. It is that preparation that is the actual thinking. Most of what you're doing during the test is recall and application of thinking-techniques. Thinking-techniques are cognitive tools (see video), and we develop these cognitive tools in our own minds by thinking itself. Thinking-tools are tools of pure thought... we create them by thinking, and we use them FOR thinking. Thinking-tools are like algorithms: they help us reduce otherwise complex/scattered things into concise/coherent things. The product of thought that uses thinking-tools will have high logical depth, because you are using an elaborate set of steps (in your mind) to go from the tools to the final product. Thinking-tools are like the hammer and saw, and the result of elaborate thinking is the framed house. Pre-training-based AI tries to arrange all the boards in a catapult and fling them, all at once, into position as a framed house in a single swoop. For simple structures (like sand-castles) this can actually work. But as the structures you are trying to build become increasingly elaborate (high logical depth), this becomes not just hard but impossible.
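One way to gesture at the catapult metaphor in code -- my own toy contrast, nothing more: for a task whose answer genuinely depends on a long chain of sequential steps, you can either carry out the steps, or "recall" the answer from a table of previously seen cases. The table is instant wherever it happens to have coverage, helpless everywhere else, and it has to grow without bound as the task space grows.

```python
# Toy contrast (my illustration): step-by-step construction vs. one-shot recall
# for a task whose answer depends on a long chain of sequential steps.

def iterate(x: int, n: int) -> int:
    """The 'slow thinking' route: actually carry out the n steps."""
    for _ in range(n):
        x = (3 * x + 1) % 1_000_003
    return x

# The 'reflexive' route: a memorized table of (input -> answer) pairs.
memorized = {(7, 1000): iterate(7, 1000)}

print(iterate(7, 1000))           # built step by step
print(memorized.get((7, 1000)))   # recalled in one shot
print(memorized.get((8, 1000)))   # None: no shortcut without doing the steps
```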
This is what we mean when we say that "AI does not think" and that "pre-training-based AI cannot scale into AGI". It's not some generic "I'M AGINST IT!" mentality; it's a concrete, contentful criticism of what modern AI systems lack, and of what they will never be able to develop through sheer scaling...