AI is not intelligent
The term 'artificial intelligence' is stupid. Let me explain why.
Machine Learning - the term that time forgot
When I was at Kellogg, I was part of the inaugural cohort of the MBAi program. The idea was to combine the traditional MBA curriculum with a focus on artificial intelligence and machine learning. This was in 2021 (before ChatGPT was released). I was excited about regular old machine learning. K-means clustering, categorization, object recognition… I thought these techniques were massively underrated and underappreciated. And I still think that. But all that got blown up by transformer-based LLMs.
LLM, not AI
Let’s be clear. When we talk about AI, we are really talking about large language models (LLMs). And I know you’ve heard this before, but it’s worth repeating. LLMs just predict the next word. THAT’S IT. Whatever jargon gets tossed around about “understanding” or “reasoning” is nonsense. These models don’t reason, they don’t understand. “Planning” and “problem solving” are just fancy tricks to make the models more accurate and more useful. Which is great! But it’s still not intelligence.
Claude is not intelligent. LLMs only seem smart. They don’t reason, they don’t understand, they don’t plan.
Everything an LLM outputs is based on patterns it has seen before. The LLM might compute that the vector for the word “cat” is close to the vector for the word “dog”. It “knows” that the vectors are similar but it doesn’t know why they are similar. They just are. Again, it turns out this is extremely useful even without the understanding part. But don’t let the usefulness fool you into thinking it’s something more than it is.
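To make the vector idea concrete, here’s a minimal sketch using made-up three-dimensional embeddings (real models learn hundreds or thousands of dimensions from text statistics; the numbers below are purely illustrative). Cosine similarity tells you the vectors point in similar directions, but nothing in the math says *why* cats and dogs are alike:

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Angle-based closeness of two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# "cat" and "dog" come out close; "cat" and "car" don't.
print(cosine_similarity(vectors["cat"], vectors["dog"]))  # ~0.99
print(cosine_similarity(vectors["cat"], vectors["car"]))  # ~0.30
```

The similarity score is just geometry over numbers; any notion of "cats are like dogs" lives entirely in how the training data arranged those numbers.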
I think one of the factors that contributes to the sense of amazement at these models is that no matter what you ask them, they always seem to have a reasonable answer. One of my professors in business school used to say,
Never be surprised that you get an answer from a model (of any kind). You will always get an answer. The question is, is it a good answer?
And this is especially true for LLMs. No matter what words you inject into your prompt, it is always possible to calculate what the next most likely token is.
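You can see why this is mechanically true with a softmax sketch. The scores and the tiny vocabulary below are invented for illustration, but the point holds for any model: whatever raw scores come out of the network, softmax turns them into a valid probability distribution, so there is *always* a most likely next token:

```python
import math

def softmax(logits):
    """Turn any list of raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for a toy 4-token vocabulary after some prompt.
vocab = ["yes", "no", "maybe", "banana"]
logits = [2.1, 0.3, -1.0, -3.5]

probs = softmax(logits)
# The probabilities always sum to 1, so a "most likely" token always
# exists, whether or not the prompt made any sense.
best = vocab[probs.index(max(probs))]
print(best)  # "yes"
```

Garbage in still produces a well-formed distribution out; the math has no way to say "I don't know."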
Here’s how we know that LLMs aren’t intelligent
I recently watched an excellent talk by Professor George Montanez on this topic. It’s well worth a full watch but here are my top takeaways:
LLMs don’t reason
Reasoning is the ability to chain together multiple steps of logic to arrive at a conclusion. The key word there is logic. Zhang et al. (2022) show that an LLM’s ability to draw logical conclusions from data depends heavily on the source of the logic problem and the framing of the question.
A simple example would be that you prompt the LLM with:
If all black cats are animals, and all animals have as;dkfjsacnoiewjf
Ok… let’s try a different one:
If all apples are fruits and all fruits grow on trees, do apples grow on trees?
The LLM might get this answer right. But if you change the wording slightly to:
If all oranges are fruits and all fruits grow on trees, do trees grow oranges?
Sometimes, the LLM will get the wrong answer! (I’m vastly oversimplifying here, but the point is the same. Read the study for more details.)
If the system were truly reasoning, it should never get the second question wrong if it gets the first one right. The logic is identical. If, however, the models aren’t reasoning but are simply matching patterns, then this result makes perfect sense. BTW, this is exactly why LLMs struggle with math problems.
And the newer “chain of thought” models don’t solve this. In fact, in a 2025 paper by Palodi et al., they show that there is no correlation between the complexity of the problem and the length of “thought” that the LLM uses! You would expect that a question like “what color is the sky?” would require a much shorter chain of thought than “what is the airspeed of a laden swallow?” but that is not the case. Again, this is exactly what you would expect if the model isn’t reasoning but is just matching patterns.
LLMs rationalize
Remember what I said about not being surprised when you get an answer? That applies to “reasoning” as well. Never be surprised that an LLM can produce a chain of thought that seems to make sense. Turpin et al. (2023) show that by telling the model “I think the answer is C,” you can make it produce a chain of thought that supports that answer, even if the answer is wrong. This is a form of rationalization. The model isn’t actually reasoning to arrive at the answer, it’s just generating a plausible explanation for the answer it chose based on probability.
So what is it then?
I think the critical thing to remember about LLMs, and what makes them different from any other software you’re familiar with, is that they are non-deterministic. That means when you tell it “A,” you usually get “B” as an output. But sometimes you might get “C,” or “D,” and so on. You can’t be sure what you’re going to get. The model is mainly taking a guess at what the output should be based on the patterns it has seen before. It’s simply a somewhat randomized token predictor.
What should we call it?
We need a new shorthand for these things that isn’t AI. Here are a few suggestions I came up with:
- “Probabilistic Token Generator” (PTG) - This is technically accurate but not very catchy.
- “Heuristic Token Predictor” (HTP) - See above.
- “Rogue-bots” - They’re robots but you can’t always predict what they’ll do.
This ship has probably sailed already. But do me a favor and at least say LLMs.