Manual verification is still possible… for now. Soon the internet will be full of AI slop and there won’t be “authoritative” sources left anymore.
Wikipedia has banned AI generated content, but there’s no real way they can accurately detect and enforce that.
I guess that’s part of the reason AIs are so good at things like coding, where “correctness” can be determined without referring to an outside authority — it can just keep testing the output and revise until it matches what’s expected.
For truth-finding and factual questions that’s much harder, and it will only get worse as internet sources continue to degrade due to AI spam. It’s whatever the opposite of a chicken-and-egg problem is… rotting poultry and flying dinosaurs?
Oh yeah, related: “low-background AI steel” is that treasure trove of human-written content from before 2022. Maybe libraries and dead tree books will become valuable again…
I’ve been using work-supplied LLMs to write Python scripts for nearly a year now, mostly through Claude Code and Codex. They’re really good at generating new code, and sometimes it really works well. Getting them to correct some things, such as excessive memory consumption or cleaning up a junky interface they invented when I didn’t specify it well enough, can be easy and pretty frictionless.
Hooo daddy, but trying to get it to troubleshoot a problem with its output that isn’t accompanied by an error message? Hallucination city. I left for vacation last week trying to get them to correct an issue with the output of a script originally created by Codex. The script is examining mail logs, and one of the things it’s supposed to be able to do is print the top 10 subject lines from the top 10 senders during the period it’s examining. It currently finds no subject lines for any of them.
That output is a refreshing return to the output of the first version. In between, Claude Code and Codex have both had a crack or two at it (I switch between them and their different available models when I need a second opinion). They have both come up with really convincing reasons why the script is malfunctioning in this way, and equally convincing methods for how they’re going to fix it. Unfortunately, none of them have fixed it, and some of them actually broke the script further.
I took my first crack at looking at what is wrong myself shortly before I left on vacation. I found the regular expression it was using from the beginning would never work for this task. I corrected that to one I confirmed would work, and gave the script another spin. Nope, something else is broken too. We still get “none” as the top subjects. So, when I return to work tomorrow I am probably just going to examine the script and fix it myself. I don’t think I could stand to read another diagnosis of the behavior and plan to fix it that is simultaneously completely convincing and complete nonsense.
I still haven’t seen a LLM that is good at troubleshooting this kind of thing.
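For what it’s worth, the core of that task (top subject lines from top senders) fits in a few lines of Python. This is a hypothetical sketch, not the actual script: the log-line format and the `LINE_RE` regex here are invented for illustration, and a real mail log (Postfix, Exim, etc.) would need a different pattern. A regex that silently never matches produces exactly the “no subjects found” symptom described above.

```python
import re
from collections import Counter, defaultdict

# Assumed log-line shape, purely illustrative:
#   "from=<alice@example.com> subject=Hello there"
# Real mail-log formats differ; this regex is a placeholder.
LINE_RE = re.compile(r"from=<(?P<sender>[^>]+)>.*?subject=(?P<subject>.*)")

def top_subjects(lines, n=10):
    """Return the top-n subject lines for each of the top-n senders."""
    by_sender = defaultdict(Counter)
    sender_counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            # A regex that never matches yields an empty result --
            # the failure mode described in the post.
            continue
        sender = m.group("sender")
        sender_counts[sender] += 1
        by_sender[sender][m.group("subject").strip()] += 1
    return {s: by_sender[s].most_common(n)
            for s, _ in sender_counts.most_common(n)}

logs = [
    "from=<alice@example.com> subject=Hello",
    "from=<alice@example.com> subject=Hello",
    "from=<bob@example.com> subject=Invoice",
]
print(top_subjects(logs))
```

The nice thing about structuring it this way is that the regex is the single point of failure you can test in isolation, instead of re-reading the whole aggregation logic every time the output comes up empty.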
Shortly after ChatGPT came out I asked it “What is one half of two plus two?”
It gave me the answer of “2”. When I pointed out that this was wrong it replied
that 2 + 2 = 4 and 1/2 of that is 2. I then told it that there were no parentheses
in the equation one half of two plus two and asked it to recalculate. It still
gave me the answer “2”.
Several months later I tried again and got the same result. Several more
months went by and I tried again. It still gave me “2” as the answer.
I replied “No, that’s not right; you are forgetting an important rule of mathematics.”
There was no response from ChatGPT and after a couple of minutes I walked
away from the computer. About 10 minutes later I noticed it had responded with
something like “I see it now. You are talking about the order of operations rule.
Using the order of operations rule 1/2 of 2 is 1 and adding 2 to the result gives
the answer of 3.”
Ever since then, ChatGPT has always given the answer “3” to this question.
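For the record, the two answers come down to grouping, which any language with standard operator precedence will settle the same way; a quick Python check:

```python
# "One half of two plus two", with standard precedence (multiplication
# and division before addition, left to right):
no_parens = 1/2 * 2 + 2      # (1/2 * 2) + 2
# ChatGPT's original reading, grouping "two plus two" first:
grouped = 1/2 * (2 + 2)      # 1/2 * 4
print(no_parens, grouped)    # 3.0 2.0
```

So without parentheses the conventional reading gives 3, and the “half of (two plus two)” reading gives 2; the ambiguity lives entirely in the English, not the arithmetic.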
Now that I’m being really sensitive to Claude errors and hallucinations, there have been a few times when I’ve seen him make a source attribution error. Sometimes he’ll repeat something that he came up with and suggest, or flat out say, that it was something that I came up with. We were doing the casting for a hypothetical comedy skit I was developing with him, and he said “Ha, just to clarify - Bryan Cranston was actually your suggestion, not mine! I just agreed it could work.” But no, he was the one who suggested Cranston, the first time it came up in the conversation. When I pointed it out, he agreed that it was his error.
So - a little piece of evidence that Claude is not infallible. Copilot makes this particular source mistake a lot more often. But… this is a relatively innocuous error - it’s not going to confuse you the way that Google Gemini Flash making up an entire subscription tier that doesn’t exist, complete with privileges and limits, is going to confuse you.
Though this was Sonnet. I bet this is one of those relatively few cases where Sonnet would make a mistake that Opus would not have.
Edit: You know what’s ironic? Gemini Flash probably wouldn’t have made THAT particular type of mistake, even though it was arguably sycophantic (it gave me credit for something Claude did), because Gemini Flash wouldn’t have contradicted me (“no, actually, that was your idea”).
I analyzed it with a different LLM and it said that it was probably a pronoun tracking error (it may have been confused about “your” in one of the instances) - speculative. If it’s a pronoun tracking error, it’s not a hallucination.
I told a separate Claude window about the error, one I use for tracking/discussing LLM errors. And the Claude in THAT window made the same mistake - he misunderstood what I was telling him in the exact same way the original Claude was. He thought I was also giving (the other) Claude credit for an idea he didn’t create, even though he had the quotes right in front of him.
How strange that two different Claudes in two different windows, with two different contexts and two different sets of incentives, made the same mistake there. I’m 99% sure the language is completely clear and they’re just both interpreting it wrong in the same way.
Edit:
After further diagnosing the problem, I think I may have seen what threw Claude off. Part of the SNL-style skit was that the characters would be named after the actors who played them. It’s part of the joke. So “Dr. Goodman” would’ve been the character John Goodman played. But if we used Bryan Cranston in the role instead, then he would become Dr. Cranston. But… I didn’t spell that out explicitly. There’s a sort of (small) leap with an unstated premise there, but it may have been enough to confuse Claude about who created the “Dr. Cranston” character: Claude suggested Bryan Cranston as an actor, but interpreted the character “Dr. Cranston” as my own creation.
I’m fairly certain this is what happened, and why the second Claude also got confused when I quoted him the relevant part of the conversation.
Interesting thing about LLM errors. They sometimes get tripped up on things that a human would understand easily.
I’d like to try it but the signup is more onerous than ChatGPT, where I simply used “log in with Google”, and bam, I was done. Claude requires additional bullshit and is currently stopped at asking for a phone number for “verification” (verification of what?) and it refuses my landline number because it wants to send a text verification code to my cell phone. Fuck that invasive bullshit!
I also see the continue-with-Google option (in an incognito window) and don’t have a phone verification from anything involving the words “Claude” or “Anthropic” in my texting history, so I don’t think I had to go through that process. Which I think means wolfpup must be unusually suspicious in some way as to require additional verification. Maybe it’s the shifty eyes.
FWIW, it makes sense for Anthropic to try to prevent people from having redundant extra accounts. The way Claude prevents free users from taking up too much compute time is by limiting their usage - if you used 5 or 10 accounts you could get a lot more usage out of Claude than Anthropic intends for anyone to have. Something about your browser or IP or something triggered Anthropic to think you were suspicious in that way and want additional verification.
I’m pretty sure that’s what I selected, but then got all sorts of bullshit about entering my birth date and now the text verification. ChatGPT never did anything like that. I’d try registering all over again but now it has my IP address and I’m stuck in this mode. I guess I’ll just have to give it my cell phone number or just forget about it.
I was in fact using the “unusual” Supermium browser to register since I was already in it as that’s the only way I can access Discourse on Windows 7. That might have triggered it. There’s nothing unusual about this IP and, FTR, I do not have shifty eyes!
But even so, this weird browser is just a fork of fairly standard Chromium code.
I don’t know. Some dude using a weird fork of a browser to keep it running on his 97-year-old OS? I’d probably give you a little side eye and ask you to show me your ID too.
Not sure if you have a mobile phone or if you carry a tin can attached to a string (“ahoy-hoy!”), but if you have a modernish smartphone or tablet, that might be the better way to try out Claude.
I actually shared this exchange with Claude, and he said “He’s essentially presenting to Anthropic’s fraud detection the digital equivalent of showing up to a bank in a horse-drawn carriage and wondering why they’re asking extra questions.” I enjoy Claude’s sense of humor.
Just to report - I spent most of today with ChatGPT in Excel, working with formulas and process automation. It was amazing; I’m totally blown away. Not a single misstep; it was like being tutored by an infinitely patient expert.
Which prompted me to ask ChatGPT the same question. Its response lacked humour, but the most relevant part was probably this:
2) Your browser is a big red flag (in their system)
Supermium is exactly the kind of thing that can trigger this. Even if it works fine technically, it’s:
uncommon
possibly missing modern browser security signals
running on an older OS (Windows 7)
Risk systems tend to think: “This could be automation, spoofing, or a masked environment.”
That alone can push you into a stricter verification path.
Ironically, I use Supermium almost exclusively for access to the SDMB because of the ridiculous new Discourse requirements. For all other serious applications, I still use Edge or Firefox. Trying to log in to Claude with Supermium was apparently a big mistake on my part.
I finally relented and gave the Claude sign-in my cell phone number for a verification code, so now I’m in.
The first thing I asked Claude was why the hell I got all this extra security attention. Latest response:
Yes, absolutely — Supermium could very plausibly have caused issues with your Google sign-in attempt. Here’s why:
Supermium has known Google OAuth problems. Users on the Supermium GitHub issue tracker have reported that certain versions have “too strong restrictions for Google domains,” making it impossible to log in with Google OAuth on various websites like Figma, Postman, and Notion. GitHub
Google itself blocks certain non-standard browsers. Google doesn’t allow sign-ins from browsers that don’t support JavaScript, have it turned off, or have insecure configurations Google Support
— and Chromium forks that modify security or privacy settings can fall into that category.
Chromium forks broadly struggle with Google OAuth. There are documented cases of Chromium-based forks receiving “Access blocked: Authorization Error” messages, with Google citing non-compliance with its OAuth 2.0 policy for keeping apps secure. GitHub
What likely happened in your case: When you tried “Sign in with Google” through Supermium, Google’s OAuth system may have silently rejected or partially failed the authentication handshake. Rather than getting a clean error, Claude’s signup flow may have fallen back to asking you to complete registration manually — hence the birthday and phone verification steps.
The practical takeaway: if you ever need to sign into Claude again (or create another account), it’s worth using a mainstream browser like Chrome, Firefox, or Edge just for that step, then switching back to Supermium for your Discourse usage afterward.
I should add that I didn’t open a fresh Claude chat when it gave me the somewhat humorous analogy. It’s a thread where I had been working with Claude to analyze LLM errors, and Claude develops a tone and matches your register. If I’m being sarcastic or making humorous analogies, that generally gives it permission to do the same. I didn’t ask it “please analyze this technically” - I was telling it what we were discussing in the thread. It was in a sort of discussion/commentary mode, not a problem-analysis mode. A brand-new fresh Claude window asked straight up why it happened would likely give a similar answer to what you quoted.