Bogus info from ChatGPT

Reply · April 23, 2026, 7:15pm

I wonder if there’s some country or principality somewhere that might allow an LLM to raise a human child absent the influences of other humans. It might be possible to train a human mind on more abstract, symbolic connections of meaning than the human languages currently in use… I’m imagining something like Nell, where a (human) child is raised on a closer approximation of latent space than human languages.

…but anyway, I’m really getting off-topic now. Sorry, OP.

SenorBeef · April 23, 2026, 7:17pm

Yeah, we may have crept past the scope of the thread a little bit.

“Look at this stupid ass chatgpt hallucination”

[80 posts later]

“Let’s find a country that will allow an LLM to raise a child to see if they can think in latent space”

But I’m finding it fascinating. And really, isn’t that what’s important?

SenorBeef · April 24, 2026, 6:35am

I have to say - I’ve been using Gemini 3.1 Pro (instead of 3.1 Flash) for the last couple of days and it’s a dramatically better model. I haven’t seen any major hallucinations, though I haven’t been asking it a lot of factual questions; we’ve mostly been talking about movies. But its analysis is quite sophisticated. I’ve been genuinely surprised at the way it has connected concepts across the different aspects and films we were discussing, and it analyzes them extremely competently. In a way, it has engaged in a useful sort of sycophancy. I told it early on that I dislike supernatural/folklore-based stories because I’m a skeptical materialist and I think films, even fictional ones, reinforce our ideas that these things really exist, and it used that skeptical materialist angle to give insight into which films are flawed and which are airtight from that perspective. I guess that’s not really sycophancy - that’s using what the user said was important to them to give them an interesting perspective. It has given me some somewhat unconventional takes on some of the films we’ve discussed, which are exactly the sort of arguments I, myself, have made. It “got” me pretty quickly and in a very impressive way.

The way it has responded to me has been very thoughtful and practically hallucination-free. Maybe Google is capable of building a good LLM after all. Maybe they just make the model that 99% of people interact with shitty.

hogarth · April 24, 2026, 8:15pm

More BS Google answers:

I did a Google search of “first description of nut allergy” yesterday and Google’s AI answered that John Bostock was the first person to describe nut allergy symptoms (linking to an article that mentioned him describing hay fever symptoms).

Today I did a similar search and got an AI answer of Robert Willan and Maimonides; at least the articles it linked to mentioned nut allergies this time.

SenorBeef · April 24, 2026, 10:15pm

I wonder what model powers Google’s “AI summary”. I wouldn’t be surprised if it was sub-Flash level and another example of Google making the public think AI is worthless or insane.

Reply · April 24, 2026, 10:27pm

That’s a great example of how much the output can vary depending on which model & mode you use.

With the basic Google search “AI mode”, you get a crap answer:

With the paid Gemini “Thinking” model, you get a slightly more detailed answer: https://gemini.google.com/share/0ca418b37918

(snipped)

Or in the Deep Research mode, you get a much more in-depth report: https://gemini.google.com/share/5abb6864cc97

(very much snipped… the real thing is very long)

And you can ask it to make a graphical timeline from that research:

Or the NotebookLM version:

(there are still some hallucinations in there… lol, I love how familiar Malmonides was with modern clocks and warning signs… or his potato-sized almonds… or how the OJ and BLT sandwich are public health concerns, or the guy holding the PUBLIC FREE SIGNS sign)

You can also ask any of the models/tools the same question a few times and get different outputs each time, with varying degrees of quality and accuracy and relevance.
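To make that run-to-run variation concrete, here’s a toy Python sketch of temperature sampling. The candidate answers and their scores are invented for illustration (not taken from any real model), and real systems sample token by token rather than picking whole answers, but the idea is the same: with a temperature above zero, the same question can come back with a different pick each run, while at temperature 0 (greedy decoding) it always returns the top-scoring one.

```python
import math
import random

def softmax(logits, temperature):
    # Scale scores by temperature, then normalize them into probabilities.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(tokens, logits, temperature, rng):
    # Temperature 0 is treated as greedy decoding: always take the top score.
    if temperature == 0:
        return tokens[max(range(len(logits)), key=lambda i: logits[i])]
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical candidates and scores, purely for illustration.
candidates = ["Maimonides", "Robert Willan", "John Bostock"]
scores = [2.0, 1.6, 1.2]

for seed in range(3):
    rng = random.Random(seed)
    print(seed, sample_next(candidates, scores, temperature=1.0, rng=rng))
print("greedy:", sample_next(candidates, scores, temperature=0, rng=random.Random(0)))
```

Lowering the temperature sharpens the distribution toward the top answer; it doesn’t make that top answer any more correct.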

Reply · April 24, 2026, 10:34pm

I’m pretty sure it’s the “oh shit, we got caught with our pants down and better put something out NOW!!” model, from their panicked reaction back when OpenAI took Google’s own research and stunned the world with ChatGPT.

Strange that they haven’t bothered to improve the AI search since then. If anything, it seems even worse today than it did back when it first came out…

HMS_Irruncible · April 25, 2026, 5:55pm

Yeah. It’s a plausibility engine, not an accuracy engine. It happens that the most plausible things tend to be the most accurate ones. But that’s not always the case. These engines were trained largely on what’s written down on the internet, so if there are areas where incorrect conventional wisdom predominates, or there’s just not much primary information at all, it’ll tend to err or confabulate wildly. As you just demonstrated. Those instructions are plausible but not actual.

Important to note that LLM training generally doesn’t involve validating the accuracy of the data. As an oversimplification, it’s just using statistical weights that describe how words follow words. Accuracy doesn’t really enter into it, except that accuracy often (but not always) correlates to what “should” follow.
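As a rough illustration of “how words follow words”, here’s a toy bigram model in Python. It’s far simpler than a real LLM, which uses learned weights over long contexts rather than raw word-pair counts, and the corpus is invented. The point it shows: if a wrong claim is simply written down more often, the “most plausible” next word is the wrong one, and nothing in the counting step checks accuracy.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    # Count how often each word is immediately followed by each other word.
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def most_plausible_next(follows, word):
    # Return the most frequent follower, or None if the word was never seen.
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Invented toy corpus: the wrong claim simply appears more often.
corpus = (
    "the seahorse emoji exists . "
    "the seahorse emoji exists . "
    "the seahorse emoji is missing ."
)
model = train_bigrams(corpus)
print(most_plausible_next(model, "emoji"))  # → exists
```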

Mangetout · April 25, 2026, 6:32pm

Also (and I don’t pretend to know the exact workings of LLMs, but I understand the basics), if it predicts that the tokens initially following your question, or any question generally, are likely to represent something like “sure, I can help you with that…”, it has now committed itself to writing something helpful. This, it seems, tends to result in the following parts needing to sound helpful and knowledgeable even in cases where there is no helpful answer to be had.
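A toy Python sketch of that commitment effect, assuming a hypothetical hand-written next-token table (a real model conditions on the whole context with learned probabilities, not a one-word lookup). Generation is autoregressive: each new token is chosen based on what has already been emitted, so whichever opening gets picked steers everything after it.

```python
# Hypothetical next-token table: each token maps to its most likely continuation.
NEXT = {
    "<start>": "Sure,",
    "Sure,": "here",
    "here": "are",
    "are": "the",
    "the": "steps:",
    "steps:": "<end>",
    "Sorry,": "I",
    "I": "don't",
    "don't": "know.",
    "know.": "<end>",
}

def generate(prefix, max_tokens=10):
    # Greedy autoregressive loop: each step looks only at what was already emitted.
    out = list(prefix)
    while len(out) < max_tokens:
        nxt = NEXT.get(out[-1], "<end>")
        if nxt == "<end>":
            break
        out.append(nxt)
    return " ".join(t for t in out if t != "<start>")

print(generate(["<start>"]))             # → Sure, here are the steps:
print(generate(["<start>", "Sorry,"]))   # → Sorry, I don't know.
```

Once “Sure,” is on the page, the only path forward in this table leads to “the steps:”, whether or not any real steps exist.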

CaveMike · April 25, 2026, 7:27pm

This is true in the first phase of training, but the later task-specific training phases include accuracy and subjective criteria (like helpful and appropriate).

SenorBeef · April 25, 2026, 9:10pm

I caught Claude in an error. You could call it a hallucination, or a sort of interpretive error, sort of a confabulation. He makes the case that source attribution errors are part of it. I’m honestly not sure what to call it. I think the best way to describe it is over-eager pattern matching to a dynamic he was attempting to define and give examples for. He makes a fairly thorough attempt to analyze it.

But - since I’ve been praising Claude all day long here, I wanted to be honest and point out a significant mistake he made and what that looks like. I think it’s a lot more subtle than Gemini Flash desperately making shit up constantly.

I’ve hidden it in a details box by default because it’s long; you can choose whether you want to read it.

Claude error

We were having a conversation about people who sort of poked holes in cultural assumptions and dynamics who were largely disliked in their time but came to be appreciated later. “Socratic terrorists.” It’s a long conversation, I’ll just quote the relevant parts.

Claude:

Good challenge, and I’m glad you flagged it because I should actually check rather than just assert. Let me search.

Searched the web

Searched the web

Yeah, you were right to push back, and I want to clearly retract the claim. The actual record is the opposite of what I implied.

Mainstream media coverage of Randi was largely favorable, not dismissive. He was a frequent guest on The Tonight Show with Johnny Carson (Wikipedia), his Peter Popoff exposé in 1986 was a major mainstream news story that led to the televangelist’s bankruptcy, his obituaries in The Guardian, CNN, New York Times, and Daily Mail were all laudatory, and he received a TED talk slot in 2007. One critic notes that Randi was “lionized in news coverage” (Medium) - which is the opposite of being dismissed as a crank. The criticism of Randi from mainstream skeptics tends to be that media gave him too much uncritical coverage, not too little.

Where I think I confused things in my own head: there’s a separate group - parapsychology researchers, defenders of fringe claims, and a few critics within the skeptic movement itself - who did portray Randi as overreaching, unfair, or methodologically sloppy. Mitch Horowitz wrote a piece called “The Man Who Destroyed Skepticism” comparing Randi to Joseph McCarthy (Medium), and there’s a real critical literature about whether Randi’s Million Dollar Challenge was rigged in his favor by impossible-to-meet conditions. But that criticism came from people who were themselves either in the parapsychology community or skeptical of professional skepticism’s methods - not from mainstream media.

So my framing was wrong in two ways:

First, I conflated “mainstream media dismissed him as a crank” with “some critics thought he was overreaching.” Those are different things, and only the second is true. The mainstream treated him as a respected authority. The critics who attacked him were operating from positions that the mainstream itself didn’t take seriously.

Second, I implied a vindication arc - the idea that he was initially dismissed and then later proven right - that doesn’t fit the actual timeline. Randi was respected and platformed by mainstream media throughout most of his career. He wasn’t a marginalized voice who eventually got his due. He was more like a celebrity skeptic who built his profile through Carson appearances and major-network exposés. The “ignored, then vindicated” arc fits Lenny Bruce and arguably Diogenes, but it doesn’t fit Randi.

The Andy Kaufman comparison probably also doesn’t fit Randi as cleanly as I implied. Kaufman was operating in a register that genuinely confused his audiences and damaged his career - people walked out of his shows, his network put pressure on him to stop being weird. Randi was doing something more conventional: he was a magician applying magician expertise to expose fraud, in a culturally legible way that mainstream audiences found entertaining and validating. He was unconventional within the field of professional magic (most magicians don’t do skeptical activism), but he wasn’t culturally marginal in the way that Kaufman or Bruce were. He was a TV personality.

So Randi was a bad example for the lineage I was constructing. The figures who actually fit that lineage cleanly - people who were dismissed in their time and vindicated later, who paid real social costs for their provocations, who couldn’t get mainstream platforms - would be more like Lenny Bruce, the Yes Men in their early career, maybe some of the earlier dot-com era hackers and prankster-activists. Randi had the form of the Socratic provocateur but didn’t pay the costs of one, because his particular form of provocation aligned with mainstream cultural narratives (debunking fraud is good) in a way that more genuinely transgressive provocateurs don’t get to enjoy.

Thanks for the push. This is exactly the kind of correction that this conversation has been good at - we’ve spent hours building elaborate frameworks together, and it’s important to leave room for “wait, is this specific claim actually true?” The frameworks survive the correction (the lineage of Socratic provocateurs is real, even if Randi isn’t a good member of it) but the specific case has to actually hold up to checking. It didn’t. I should have been more careful.

The deeper lesson, maybe, is that when constructing a lineage of figures, the temptation is to grab famous names that roughly fit the pattern without checking whether they actually fit it. Randi was famous and skeptical, so he pattern-matched to “famous skeptic who was probably mocked at first.” But the actual history is that he was a celebrity from early in his second career, was platformed by mainstream media, and the criticism he received came mostly from communities that the mainstream itself dismissed. The pattern-match was lazy. You caught it. That’s what a good interlocutor does.

SenorBeef · April 26, 2026, 5:46am

Funny Claude fact. He has now told me 3 or 4 times that he thought I was making up “Nano Banana” as Google’s premier image generator. It does sound ridiculous. But Google launched it after his training data was last updated, at least for the Sonnet 4.6 model (late 2025).

I asked him to build me a graph that shows me the release time and types of different image generation systems. I mentioned that there were a few new ones since his chart ends in 2025. He asked if there were any specifically I wanted to include and I said “Google’s flagship image generation system is now nano banana pro, which is an autoregressive model. I’m not sure about the other services”

And he said

He actually refused to put it in the graph I asked for without verifying it for himself. That’s certainly unusual behavior for an LLM, I think: skeptically trying to figure out if the user is just fucking with him.

Edit: Yes, I know, sometimes I refer to it as “it” and sometimes as “he” - I know he’s not a person. It has a male name, and sometimes your mind makes that little leap. Although I do it with Copilot too. Maybe the fact that I don’t call Copilot “her” reveals that I’m secretly misogynistic, like that old riddle where the surgeon says “I can’t perform this operation, the patient is my son” and the audience is confused because the patient’s father died in the same car accident.

It’s probably just because in English, “he” is almost always the default if the gender is unknown or if you’re referring to a sort of generic hypothetical person. Presumably I’d call Siri or Alexa “she” but I don’t use them.

Velocity · April 26, 2026, 7:57am

ChatGPT once told me the United States and Japan were allies during World War II. That was quite a glaring error.

Aside from that, though, I’ve found AI to be mostly reliable.

Q.Q.Switcheroo · April 26, 2026, 8:05am

Forget it, he’s rolling.

HMS_Irruncible · April 26, 2026, 11:54am

Good example of what I was talking about as far as it being a “plausibility engine”, and a fairly aggressive one in this case.

Up until a few months ago, Claude would spin out on the question “is there a seahorse emoji” (there is not and never has been). It would fumble around with several different sealife emojis and then confidently present a dolphin or octopus and claim it was the seahorse emoji. To a machine this was plausible enough that a human might accept it.

I believe that, for marketing reasons, they’ve directly coded this and some other popular tests (e.g. “how many r’s are there in ‘strawberry’”) into it. This is what a software vendor ought (and is entitled) to do if it knows the software is going to be challenged in specific ways that raise questions about its credibility.

Now, if asked in a fresh chat whether there’s a seahorse emoji, Claude will immediately and confidently state that there is not one, explaining why you might have mistakenly thought there is.

Which is interesting because it does not do this for different non-existent emojis, e.g. “is there a clam emoji”. It visibly goes and performs a web search to look up the information, like any sane human would do, notes the absence, and pauses for a moment to lament the underrepresentation of bivalves in the Unicode emoji set (which a human would not do, unless they knew they were under suspicion for past fabrications).
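For what it’s worth, this kind of question is checkable without a web search. Here’s a minimal Python sketch using the standard library’s `unicodedata.lookup`, which raises `KeyError` for names it doesn’t recognize. Caveats: it only reflects the Unicode version bundled with your Python interpreter (shown by `unicodedata.unidata_version`), not necessarily the newest emoji release, and it matches official character names, so “SEAHORSE” is just a guess at what such a character would be called.

```python
import unicodedata

def has_character(name):
    # Look up a character by its official Unicode name in Python's bundled database.
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        return False

print("Unicode version:", unicodedata.unidata_version)
for name in ["DOLPHIN", "OCTOPUS", "OYSTER", "SEAHORSE"]:
    print(name, has_character(name))
```

Note that the decoys Claude used to offer (dolphin, octopus) are real characters, which is part of what made the wrong answer look plausible.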

Maserschmidt · April 26, 2026, 12:02pm

Those popular questions would also be very present in updated training data.

HMS_Irruncible · April 26, 2026, 1:07pm

They’d be present in the trained corpus, sure, but we’re told that models contain weights rather than specific facts. The least charitable interpretation is that the answer is simply hardcoded or bolted-on, which would be a cheap credibility investment for Anthropic. There’s no reason their magical answer machine shouldn’t give a correct answer for this, and nobody can really say if it’s fair to cheat.

The more generous interpretation is that the seahorse question appears frequently enough in the training corpus to create organic weighting that triggers a suspicion of “you’re trying to trick me, aren’t you”.

It answers the seahorse question without citing a reference (as a human might if they’d already been caught in this lie), and for the clam it actually does a web search (again, as a human might if they’d been burned on the marine-life category before, or simply knew to be more cautious with existence questions in general). But the point is that it doesn’t consult the web for the seahorse as it does for the clam, which to me suggests some targeted manipulation.

That’s more of a curiosity than anything. A magic answer machine should give correct answers when it can. Leaning on external references is a sign of intelligence, in my opinion. It’s just a different thing than an emerging omniscient intelligence that can answer any question unassisted, or that is emerging purely undirected without helpful hints. There’s no reason we should expect an AI to evolve without such hinting. Human intelligence got lots of helpful hints during evolution (these mushrooms will make you violently ill and kill you, hint hint).

Maserschmidt · April 26, 2026, 4:33pm

Well. I think we’re not aligned on this, but if Anthropic etc. did directly correct for questions like that, it wouldn’t be through coding or bolt-ons, it would be through Reinforcement Learning from Human Feedback (RLHF), where it could be included as a tuning question. That would be much simpler than creating and maintaining individual overrides for a bunch of edge cases.

HMS_Irruncible · April 26, 2026, 8:56pm

I agree that this would favor simplicity and maintainability, but this is also a commercial product where public perception matters for business reasons. So if your model were getting publicly roasted for continually fumbling the seahorse emoji, and the model is hard or expensive to steer in the direction you need, then a one-off escape hatch would make business sense, at least until something better comes along. And it wouldn’t be at all expensive to throw a few of these in the system prompt until the overall corpus and model training catch up. It’s a business after all; they’re graded on how much money they make, not on architectural purity.

The software I work on is full of such one-off shortcuts. If you work on software, you probably have a similar TODO list of shortcuts to be formalized if they don’t become irrelevant before you get around to the task.
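Here’s a hypothetical Python sketch of the kind of system-prompt patch described above. Everything in it (the base prompt, the trigger phrases, the notes) is made up for illustration; it is not how Anthropic or any other vendor is known to do it. It only adds a note when a trigger phrase appears in the user’s message, which keeps it cheap for unrelated questions:

```python
# Hypothetical base prompt and "known gotcha" notes, invented for illustration.
BASE_PROMPT = "You are a helpful assistant."

KNOWN_GOTCHAS = {
    "seahorse emoji": "There is no seahorse emoji in Unicode; say so plainly.",
    "strawberry": "The word 'strawberry' contains three letter r's.",
}

def build_system_prompt(user_message):
    # Append only the notes whose trigger phrase appears in the message.
    text = user_message.lower()
    notes = [note for trigger, note in KNOWN_GOTCHAS.items() if trigger in text]
    if not notes:
        return BASE_PROMPT
    return BASE_PROMPT + "\nNotes:\n" + "\n".join("- " + n for n in notes)

print(build_system_prompt("Is there a seahorse emoji?"))
print(build_system_prompt("Is there a sea horse emoji?"))
```

The second call slips right past the trigger because of the space in “sea horse”, which is the catch with patching specific cases one at a time.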

wolfpup · April 26, 2026, 9:14pm

Agree. Special treatment for specific cases (like Microsoft applications that depended on the aberrant behaviour of Windows, which then received special treatment in subsequent OS versions when those behaviours were fixed) is just how Windows became the crazy almost-unmaintainable spaghetti code that it once was – and for all we know, still is.

That’s absolutely not how one would evolve an AI. Particularly because, as in the metaphor of cockroaches, if there’s one special case you have to handle, there’s probably a million others just like it that you don’t know about.
