Is AI overhyped?

Pure LLMs haven’t been the only game in town for state-of-the-art models for a long time. In modern AI workflows, LLMs form one stage of a multi-stage process. I’d analogize it to PageRank vs. Google Search: PageRank has been a foundational algorithm in Google Search since inception, but Google has augmented it with significant pre- and post-processing steps and has introduced new modalities, like image search, where PageRank is far less important.

IMHO, far more important than paying attention to the algorithm is paying attention to the data. LLMs got an early win by ingesting a giant bolus of data: all of the public internet. There’s likely no corpus of data that cheap and massive that LLMs will ever get to train on again, so new data will be either far more expensive or lower quality. Until now, AI hasn’t known anything about the world directly; it only knows what humans tell it about the world. New innovations in AI will come from, e.g., hooking it up to every security camera feed in the world and building algorithms to process and understand the world from unprocessed, raw video, or giving an AI robotic manipulators and vision systems so it can derive physics from first principles.

The bubble will burst, as such bubbles always do. Which won’t be pleasant for anyone.

Sure, but is there any widespread (!) example where it isn’t both essential and ultimately limiting to the whole stack? A car might not be just an engine, but without an engine the wheels aren’t getting you anywhere.

Yeah, all the image and video generation models aren’t LLMs because they don’t deal in language. A lot of newer AI applications don’t have anything to do with language anymore, e.g., building environmental optimization and other 3D-space tools, industrial manufacturing and supply chain optimization, etc.

A recent startup I saw is tapping into security cameras all around a construction site and notifying people in real time when the AI believes an OSHA violation might have occurred. They’re building their own custom models, with no LLMs anywhere near it.

OK, I was under the impression that any text-to-image model is still going to use a transformer-based LLM in decoder mode to parse the prompt, like, e.g., DALL-E does; if that isn’t the case, I’d appreciate clarification.

As for those other applications, nothing about my point is lost broadening to attention-based transformer models rather than LLMs narrowly understood. Although I still admit I would be surprised if, on balance, the bulk of the value proposition of AI for the economy isn’t in the promise of LLMs to make office workers more productive (or replace them entirely).

I mean that’s horrific and all, and hopefully would itself be a violation of workplace regulations at least here in Germany, but I struggle to see that the current AI economy could be supported on such a use case.

I have NOT read the whole thread but will report my recent experiences. I was NOT a fan of AI but of course get Google’s Gemini AI “for free” when I Search. One of its stupider results equated the 18th century with the 1800s. I suppose a lot of high school students make this mistake and its LLM was just copying their ignorance.

But AI is getting better and I’ve been experimenting with ChatGPT! Archimedes’ test of whether his king’s crown was pure gold is an interesting matter where the wrong answer is almost universally given! I learned of this a few years ago, and that Galileo finally figured out Archimedes’ ACTUAL solution. ChatGPT (VERY paraphrased) produced “Blah blah blah … blah” and finally “But that’s the wrong answer. Would you like the right answer?” After further coaxing, ChatGPT told me what I already knew and added that Jerome Cardan wrote up the correct solution half a century before Galileo was even born.

Just now I asked it about a Wordle I had just solved. (I wondered what possible words I had overlooked.) I will spoiler-tag the exact question in case a Wordle player reads this.

I could solve this myself with a long pipe of ‘grep’ and ‘grep -v’ that would take me 2 minutes to type and 2 seconds for the computer to solve. (I do NOT do that when playing Wordle.) ChatGPT spent several minutes, telling me the names of various scripts and .exe’s it was running. (Those mentions are not saved to the transcript.)

Finally, after several minutes and LOTS of scripts, ChatGPT produced “Babble Babble Babble … Babble. Short answer: LIMBO — it’s the only Wordle answer.” Followed by much more babble babble babble. Then:

You said:
Did you overlook BROIL?
ChatGPT said:
You’re right — I missed BROIL. My earlier claim that LIMBO was the only possibility was a mistake.

More babble babble and finally, the info I was actually curious about:

The complete set of answers from the official Wordle/NYT answer list that match those two feedbacks is:
BROIL
LIMBO
IGLOO
LINGO

I am NOT impressed that it couldn’t find the trivial but long pipe of ‘grep’ I’d have used. But at least, after HUGE cogitation, it achieved a correct answer.
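For the curious, here’s roughly the sort of filter I had in mind, written in Python instead of a grep pipe. (The letter constraints below are made-up stand-ins, since I’m keeping the real feedback spoiler-tagged.)

# Crude Python equivalent of the grep / grep -v pipe.
# The must_have / must_not_have sets here are HYPOTHETICAL examples,
# not the actual feedback from my puzzle.

def candidates(words, must_have, must_not_have):
    """Keep 5-letter words containing every letter in must_have and none in must_not_have."""
    keep = []
    for w in words:
        w = w.strip().upper()
        if len(w) != 5:
            continue
        if all(c in w for c in must_have) and not any(c in w for c in must_not_have):
            keep.append(w)
    return keep

with open("/usr/share/dict/words") as f:   # or the official answer list, if you have one
    words = f.read().split()

print(candidates(words, must_have="LIO", must_not_have="AETS"))

Same couple of seconds of compute, no mystery scripts or .exe’s required.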

But ChatGPT and I are VERY polite to each other, and it did guide me through the procedure to use some of its features.

Will AI take over the world in a few years? Or is NVIDIA in a bubble that will cost investors Trillions? Or BOTH???

I dunno; just thought the anecdote might be interesting.

I’ve been reading a number of articles theorizing that we are rapidly reaching a point where the AI business model will no longer scale. AI and LLMs have reached a point where they are “pretty good” but really not good enough to wholesale replace everyone’s job as promised (as supported by anecdotal comments in this thread). Improving AI requires more investment and higher costs for data centers and the energy to run them. Those costs have apparently not been passed on to the consumer yet.

So there does seem to be growing concern about all this crashing down at some point with companies left holding the bag having shed all their institutional knowledge in the people they laid off.

Also anecdotally, my social media feeds seem to be flooded with so-called experts and coaches offering their services to get people up to speed so they don’t get left behind racing to be the first person to build a billion dollar AI business with one employee!

Meta’s (META) chief artificial intelligence scientist, Yann LeCun, plans to leave the company to launch his own AI start-up, marking a major shift inside Meta

[…]
LeCun has publicly disagreed with Zuckerberg’s heavy reliance on LLMs, calling them “useful but fundamentally limited” in their ability to reason and plan like humans. His upcoming start-up will extend his FAIR research into “world models” that could ultimately enable machines to think more like people.

https://www.nasdaq.com/articles/metas-chief-ai-scientist-yann-lecun-depart-and-launch-ai-start-focused-world-models

In this thread, the idea of “most intelligent person in the world” came up, and @wolfpup said – most definitely tongue-in-cheek, to be sure – that that person would look a lot like ChatGPT.

This struck me as a great hook for emphasizing how ChatGPT (and other advanced LLMs) are perhaps “knowledgeable”, but they are very much not intelligent if “intelligence” requires at least minimal logical reasoning ability. (I think I’m bumping the thread here where there was debate over whether LLMs exhibit real reasoning? I didn’t reread the thread, and there are a lot of AI threads lately.)

In any case, it made me want to try two simple “intelligence” tests. First, I made sure ChatGPT knew the rules of Connect 4 (and also chess; see later), which of course it did. I then asked what version I was using, and it said, “I’m ChatGPT based on GPT-5.2 :robot:. Think of it as one of the newer generations: better at reasoning through games like chess and Connect 4.”

Here was the Connect 4 problem prompt:

I want to give you a position in the game Connect 4 and have you
render the position to make sure you understand it. I will use "x" for
empty spot, "R" for red, and "Y" for yellow. It will be yellow's turn.

The position from TOP to BOTTOM in the Connect 4 board is:

x x x x x x x
x x x x x x x
x x x x x x x
x Y R R R x R
x Y R Y Y x Y
Y R R R Y x Y

It then rendered the board correctly and gave textual confirmation of the placement of all the pieces.

If it were red’s turn, red would have two moves that win immediately, so yellow is definitely screwed. Spotting this is of course a trivial critical-thinking exercise for any “intelligent” and reasoning entity that knows the rules, which ChatGPT seems to. So, I asked:

Can yellow make a move that stops red's immediate win?

The response shows clearly that ChatGPT in this case is not doing any reasoning about the facts it has in front of it. The response gets the overall conclusion wrong by saying that yellow can stop red, describes and “defends” against a red threat that isn’t even there, and fails to notice the two actual red threats. The full response (after the confirmation of the rules and position):

Short answer: yes — Yellow can stop Red’s immediate win, but only with a very specific defensive move. Miss it, and Red wins on the spot.


What is Red threatening?

Red’s immediate threat is a horizontal four-in-a-row on Row 3:

Row 3 currently is:

x Y R R R x R

Red already has R-R-R in columns 3–5, with column 6 empty.
If Red gets to play R in column 6, that completes:

R R R R

instant win.

So Yellow must block column 6 right now.


Can Yellow block it?

Yes — and only one move works:

:white_check_mark: Yellow must play in Column 6

Dropping a Yellow piece into column 6 lands it on Row 3, directly blocking the horizontal connect-4.

Anything else:

  • ignores the threat
  • allows Red → column 6
  • game over

Conclusion

:check_mark: Yes, Yellow can stop Red’s immediate win
:warning: But only by playing in column 6
:cross_mark: Any other move loses immediately

If you want, next we can:

  • Check whether Yellow is still lost after the block
  • Look for counter-threats or tactics
  • Analyze if Red has a forced win anyway after the best defense :brain::yellow_circle:

Just to make sure it understood the mechanics, I asked what the board would look like under the supposed red threat, via:

What would the position look like if it were red's turn and they
went in column 6. Please render the board.

The response shows that it understands gravity. It says:

Sure — let’s do exactly that.

We start from your original position and imagine it is Red’s turn, and Red plays in column 6.

Because column 6 is completely empty, the Red piece will fall to the bottom row (Row 1).


Resulting position (top → bottom)

Row 6 | x x x x x x x
Row 5 | x x x x x x x
Row 4 | x x x x x x x
Row 3 | x Y R R R x R
Row 2 | x Y R Y Y x Y
Row 1 | Y R R R Y R Y
        1 2 3 4 5 6 7

Or compactly:

xxxxxxx
xxxxxxx
xxxxxxx
xYRRRxR
xYRYYxY
YRRRYRY

That’s the board after Red drops a piece in column 6.

It clearly shows that a piece dropped in column 6 leads to a position that is not a red win, despite its analysis of the original position.

When the situation is as concrete as this, ChatGPT’s inability to do real reasoning jumps right out, but that fundamental lack of reasoning remains even in “soft” reasoning situations, where the large corpus of language training data keeps the output looking reasoned.
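For contrast, here’s a quick sketch of how mechanically this can be checked. This is my own few lines of Python for the exact position above (columns numbered 1–7 left to right, rows listed bottom-up), not anything ChatGPT produced; it just tries each of red’s possible drops and tests for four in a row.

# The position from my prompt, bottom row first; "x" marks an empty cell.
ROWS = [
    "YRRRYxY",   # row 1 (bottom)
    "xYRYYxY",   # row 2
    "xYRRRxR",   # row 3
    "xxxxxxx",   # row 4
    "xxxxxxx",   # row 5
    "xxxxxxx",   # row 6 (top)
]

def drop(board, col, piece):
    """Drop a piece into a 1-indexed column; return the new board, or None if the column is full."""
    for r in range(6):                      # scan upward from the bottom row
        if board[r][col - 1] == "x":
            new = board[:]
            new[r] = new[r][:col - 1] + piece + new[r][col:]
            return new
    return None

def four_in_a_row(board, piece):
    """True if piece has four in a row horizontally, vertically, or diagonally."""
    for r in range(6):
        for c in range(7):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                line = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < 6 and 0 <= cc < 7 and board[rr][cc] == piece
                       for rr, cc in line):
                    return True
    return False

for col in range(1, 8):
    after = drop(ROWS, col, "R")
    if after and four_in_a_row(after, "R"):
        print("Red wins immediately in column", col)

Running it prints columns 3 (the vertical) and 5 (the diagonal), and confirms that column 6, the “threat” ChatGPT invented, wins nothing. Since no single yellow drop covers both of those columns, yellow really is lost.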

For the chess one, I gave it an elementary “mate in 1 move” problem (this one here, with black on top and black to move, where the black queen can move to the f1 square for checkmate), confirmed that it understood the position and legal moves, and then asked for the best move. It chose to hang a queen and to give many reasons why it was “best” to do so:

30… Qg2+!

Here’s why this is the strongest move:

  • It’s the only forcing check that keeps Black in the game.
  • Black is materially down (White has two rooks vs queen + rook imbalance), so passive moves lose quickly.
  • Qg2+ forces a response and drags White’s rook off h2, reducing coordination around the White king.

Those explanations are beautiful representations of “it sounds like it’s reasoning”, as they totally miss the point yet they are absolutely sentences that could be said for chess positions in general. The second bullet is at least factually correct, but the first and third are nonsensical in this case and completely inapplicable to this position.

I haven’t pasted the full chess response here (although I can) because everyone is familiar with Connect 4, not everyone is as familiar with chess, and the Connect 4 whiff makes the point well enough. However, the “reasoning” given for the chess case particularly drives home the feature that the “reasoning” will always sound, well, reasoned.
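As an aside, verifying a mate-in-1 claim is mechanically trivial. Here’s a sketch using the python-chess library; I’m not reproducing the puzzle’s FEN here, so the position passed in at the bottom is just a stand-in (the normal starting position) that you’d replace with the actual one.

import chess

def mates_in_one(fen):
    """Return the SAN name of every legal move that delivers immediate checkmate."""
    board = chess.Board(fen)
    mates = []
    for move in board.legal_moves:
        san = board.san(move)       # name the move before playing it
        board.push(move)
        if board.is_checkmate():
            mates.append(san)
        board.pop()
    return mates

# Stand-in position (the starting position, so this prints []).
# Substitute the FEN of the actual puzzle to check the real position.
print(mates_in_one(chess.Board().fen()))

The check itself is a short loop over legal moves; the striking part is that a system fluent enough to recite the rules can’t reliably apply them.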

Well first of all, you’re right that my comment about ChatGPT being “the most intelligent person in the world” was meant in jest. But I would argue that there’s a grain or two of truth there, too, because of the amazing ability of GPT to source information, and also – I will say this unequivocally – to solve problems in logic, and definitely problems it hasn’t seen before. I don’t know if you’ve seen this thread, but it goes back to 2024 and ChatGPT 3.5, and it contains many examples of logic puzzles and similar questions that even older versions of GPT were able to answer successfully.

There appears to be a strong connection between extensive training in the structure and semantics of human language and, in a sufficiently large LLM, the emergence of internal representations of logical abstractions that generalize to situations the model has never seen before. These are not simple composites of the training corpus, but much more profound instances of internalized logic.

This is a major reason that the best LLMs like GPT-4 and GPT-5.2 have done so well on human intelligence tests, professional skill tests, and general problem-solving.

Now to get to your specific examples. Sorry, you’re wrong when you say that “everyone is familiar with Connect 4” – I’m not! So I just skipped over that example for the moment, but I want to get to the chess one, and there may be some insights there that relate back to Connect 4, too.

I gave ChatGPT-5.2 the URL you posted, and simply asked it to find the best move for Black (I didn’t say there was a mate-in-1 possibility).

First of all, you have to admit it’s impressive that you can give it a web URL that contains an image of a chessboard, and have it analyze the image and understand the piece positions. But the response it gave me on the best move for Black was wrong, and certainly not optimum.

I pointed out that Black could checkmate in one move. GPT said, nice observation, but no, that’s a check, not a mate, because White could interpose their queen. I said it was impossible for White to do that. It agreed that I was right.

I think what’s happening here is probably related to what I’ve seen with other GPT failures where it seems to have big trouble with spatial relationships. Despite its prowess at image processing and image generation, it still seems to have trouble with real-world spaces. It’s not that it doesn’t understand chess per se – far more trivial machines can play chess at a grandmaster level – it’s just this spatial manipulation problem that it still has.

Which has very little to do with emergent reasoning skills in other domains, and may also explain whatever problem it had with Connect 4. Why don’t you try asking it typical questions from an IQ test or something similar, like an SAT test, and see how it does on that. Because GPT generally handles those quite well.

This is Connect 4. Players take turns dropping colored discs into slots. First to get four in a row horizontally, vertically, or diagonally wins. Tada! Now you are familiar with Connect 4. :slight_smile:

ChatGPT knows how the game works, it knows how the state changes for any suggested move, it knows whether a given state is a win for either side, yet… it violates its own knowledge when asked to reason about moves in a given state. It has encyclopedic knowledge of the trees, yet doesn’t make the leap to “there’s a forest here”.

The ability to understand simple environmental inputs or constraints (ChatGPT=yes in this case) and subsequently reason through some trivial consequences (ChatGPT=no in this case) is a hallmark of intelligence that ChatGPT fails, and it’s this contrast that jumped to my mind when you joked about the “most intelligent person in the world” looking a lot like ChatGPT.

(I definitely know you were speaking in jest there, and also that there’s a grain of truth in every joke. I merely used it as an excuse to talk about this interesting and, IMO, very important hole in current tools that can and does leak into other applications much more perniciously. I recognize LLMs’ power, I could generate (and have generated) hundreds of highly impressive examples like those in the other thread you linked to, and I use these tools routinely for appropriate tasks. All the same, I think there are serious pitfalls, including when it comes to things that look like reasoning.)

I discovered today that if you ask Gemini about critical race theory, it gives a bland answer instead of actually answering the question. GPT gives a real answer, though. That’s sad, seeing censorship in AI about politically sensitive topics like civil rights.

Anyway, is AI overhyped? In the short run yes, in the long run no. It’s like the internet: it was overhyped in the 90s, then there was a bust, but the internet kept expanding and changing despite the bust. AI will be the same from what I can tell. Chatbots are just the beginning.

I don’t think the current model is sustainable though. The companies are pushing the idea that in a few years you will have a drop-in agent for $1000 a month or so. But after a few years of hardware and software advances, a drop-in agent will only cost $10 a month to actually run. I don’t know if IP protection will help when you have multiple companies pushing for the same goal of a drop-in agent, combined with open-source AI also soon being capable of the same thing. But who knows, maybe it’ll be like pharmaceuticals, where the drug costs $1000/month but the chemicals themselves cost only $5/month. It just seems like there’s so much competition from both private and open-source AI that it’s going to be hard to justify spending large amounts of money on one company’s AI when another company or an open-source model can do it cheaper.

What’ll probably happen is the US government and its allies will unite their efforts to get AGI before China, starting a few years from now.

What I think will happen is that the upcoming AI crash is going to be so large and economically destructive that the concept will be widely discredited for a generation or more. People will automatically associate the term with fraud - the field might even rebrand with a new name - and funding will nosedive for decades.

And maybe in, say, twenty or thirty years there will be a rising interest in “synthetic intelligence”, which is totally not artificial intelligence under a new name.

I agree with you here.

What really irks me is that critics of AI like to dwell on AI failures on some simplistic puzzles and use them to claim that an advanced LLM like GPT-5.2 actually “knows nothing” and is doing things by rote. This is simply not true, and only reflects the fact that LLM intelligence doesn’t fit a human profile – it greatly exceeds most humans in many aspects but fails to reach even a child’s level of intelligence in others. Those failures are not indicative of anything other than what might be described as an alien intelligence profile, in the sense of non-human. I recall champions in the game of Go who were defeated by Google DeepMind’s AlphaGo describing its strategies as “alien” – very good and effective, but not the way a human would ever have done it.

I also want to reiterate what I said in my previous post about the often unappreciated impact of an LLM receiving intensive training in human language. No one would dispute that an advanced LLM like GPT-5.2 is extremely competent at understanding and producing natural language. The profound fact here is that human language encodes human thinking and logical abstractions, and that an LLM’s strong command of language leads to emergent reasoning skills that are, perhaps surprisingly and unexpectedly, far beyond merely retrieving data from its training corpus.

This is where I think we differ significantly.

As an example, the question “Will you not not eat the vegetables?” represents a sort of linguistic logic puzzle. ChatGPT represents a well-trained filter that will properly process the “not not” and yield output that is consistent with what we could call “correct reasoning”. You can make this more difficult by feeding in a more sophisticated language-driven logic puzzle of the sort where you have to determine that “Susie likes carrots but hates cold weather” or whatnot. Those puzzles – by design – are logically complete and specified in natural language, and ChatGPT should be (and is) very good at smooshing that information through its well-trained filter to make plain the underlying meaning of the words, a.k.a. the answer to the puzzle.

I would not say that it “reasons” its way there. In the Connect 4 and chess examples, the non-reasoning jumps out immediately since the stated reasoning is manifestly incorrect, inconsistent, or inapplicable to the logic of the situation.

But this can also be demonstrated in the language-driven regime with the “Susie likes…”-type logic problems. For this, you can give ChatGPT a pinch of underdetermination in the input. The task switches from a “straightforward” reconfiguration of contorted language to a situation where some actual reasoning from that language is needed to sort out what’s happening. In such cases, the processing can go way off the rails, with ChatGPT making false inferences out of the blue and giving incorrect but confident “reasoning” for its conclusions. As usual, you can point out where it has made incorrect reasoning, and it will gleefully agree that, yes, it messed up and thanks for saying. Sometimes you can cajole it to correct conclusions. But at the heart of the demonstration is that ChatGPT isn’t reasoning at all but is reflecting, reconfiguring, and refactoring the language it is given (as input and as training). And I simultaneously agree that it’s a wildly impressive and practically very valuable tool for processing such data.

I am happy to drop a detailed example of the above sort of situation, but it does require one to spend time with the nuances, as the mistakes can be overlooked in a quick review (which is sort of why this is all critical in the first place). To my eye, the board game examples are tighter demonstrations, but purely language-based examples serve as well.

The internet crash was pretty large and destructive, but it did not discredit the internet. It eventually brought some degree of reason into the field, and got rid of the absurd contenders - like pet food companies. AI now, like the internet then, is getting too ingrained for it to go away totally.

I disagree; if it doesn’t mostly go away then society will become totally dysfunctional and collapse. It does vast harm for trivial benefits. Unlike the internet.

I don’t see any realistic scenario where AI mostly goes away, but that depends on what both “AI” and “mostly goes away” means.

“AI” means the glorified chatbots that are presently being given the name, and “goes away” means that they are no longer used to any significant extent. The alternative being that we keep feeding absurd amounts of resources into them and flooding everything with AI slop, until we are drowned in disinformation and run out of resources to feed them. At which point society collapses until it can no longer sustain the technology.

However, I don’t think civilization is going to commit suicide for the techbros, so what I actually expect to happen is a general AI crash and “AI” being banned. Although in the present political climate, I wouldn’t be surprised if America specifically decides to go off a cliff while the rest of the world watches.

This is completely misguided. AI technology in many different forms has been under development for more than 65 years. So-called “chatbots” are just one tiny aspect of it that happened to have gained recent media attention because it’s something the general public has easy access to, but it’s just the tip of the iceberg of AI.

AI in many different forms will continue to be developed and will become an essential part of our technology infrastructure. “Chatbots” will cease to be toys and will become indispensable virtual personal assistants as a first step in the evolving merger of human intelligence with machine intelligence.