One experience: when I was using ChatGPT’s video mode, it kept insisting that my cat was actually a rabbit. It wouldn’t believe me until I turned the camera to show a side view.
Aside from that, though, it seems like the models and harnesses (if using “agents”) keep getting better, so experiences can change quite a bit depending on which specific model you’re using (i.e. not just Claude or ChatGPT, but Opus/Haiku/Sonnet vs GPT/GPT-mini, and their version numbers).
I’ve had access to a Claude Max account at work for a few months, and using Opus 4.6 through 4.8 with Thinking enabled and Extra High effort has been producing very solid results, both for my work (web programming) and for general-interest questions. It’s good enough that I just recently signed up for another Claude Max account for personal use. I had ChatGPT before that, until they caved to the US gov’s killer-robot demands.
I now use Claude multiple times throughout the day for various work things, hobbies (comparing 3D printers, filaments, adhesives, scanning options, CAD software, etc.), and personal chores (it built an SDMB long-thread fetcher, summarizer, and highlighter). It helps me learn Italian in ways that even my flesh-and-blood teacher can’t, with a more thorough understanding of etymology, cognates, and regional idioms. It can turn complex workflows into simple flowcharts. It helped tremendously with meal planning: last night it came up with an amazing recipe for a potluck, scaled to the number of participants, with a specific grocery list and simple-to-follow cooking instructions without the spam, much better than the sources it researched.
After cooking, I was in such a rush that I almost blurted out loud, “Claude, will you please help me clean up the dishes now?”… before remembering it’s not yet embodied.
On the hallucinations front, well, they haven’t really been a problem for me for several months now, with thinking plus high/extra-high effort, and especially when I explicitly ask it to do a web search and provide citations. However, I’m afraid this “golden age” won’t last long… I’ve already seen some AI-generated citations (like from Musk’s Grokipedia) being used as sources, which means we’re well and truly into the Dead Internet Theory era now: AI just regurgitating other AI content. It reminds me of early Google, before SEO killed it. Now GEO (generative engine optimization) is a thing, and the LLMs themselves are getting spammed by LLMs.
On the “harnesses” front, Claude Code has been incredible — the single biggest workflow improvement I’ve seen in 30+ years of doing my work (programming). It’s very different from the chatbot experience, and tremendously powerful. But on the other hand, Claude Cowork (its awkward personal assistant / automation system) is kinda jank and useless so far. Claude-in-Chrome works but is excruciatingly slow. Google’s Antigravity is pretty bad. I have not tried Codex in a while. Never touched OpenClaw, but I hear many of the open models now (like GLM) are excellent and approaching frontier-model capabilities at much lower cost.
Anyway, TLDR:
There’s a lot that can impact the quality of the results, and the brand name (Claude vs ChatGPT vs Gemini) is probably the least important part of that. The specific model, effort level, and agentic harness can make much more of a difference, depending on what you’re trying to do.
Some basic vocabulary, in case anything isn’t clear:
- Brand name (ChatGPT, Claude, Gemini) = Marketing label for each company’s family of models
- Model + version (Opus 4.8, GPT-5.4-mini) = The output of a specific training run, basically a set of weights (plus some supplementary post-training) that determine the token predictions that come out for a given prompt. This is the thing that actually processes your prompts and produces answers
- “Thinking” = A specially trained model can output internal “chain of thought” tokens that emulate a person’s chain of thought, basically a “How would I solve a problem like this” set of instructions that help guide the rest of the tokens. When you ask it “how many n’s are in ‘banana’”, a regular model might look in its weights and spit out anything between 1 and 3. A thinking model might instead go:
<start thinking tokens>
The word is "banana". Spell it out: b - a - n - a - n - a.
Walk through it, tracking n's:
b → not n, count 0
a → not n, count 0
n → n, count 1
a → not n, count 1
n → n, count 2
a → not n, count 2
End of word. Count = 2.
</end thinking tokens>
It has 2 n’s. Harder problems might call for a tool call instead (like running a bit of code), since counting is one of the things LLMs often have trouble with.
- Effort level = How long/hard it is allowed to think for before answering
- Chatbot = The interface that the general public uses to access a specific model
- Tool call = The ability for a model, or the software controlling the model, to call another app to do some specialized task, whether it’s web research or reading a PDF
- Agent = A model directed to perform some abstract, multi-step task using repeated tool calls instead of simply answering a prompt. Usually involves reasoning and debating itself along the way, and possibly calling one or more sub-agents to help break down complex work and double-check itself
- Harness = Specialized control software, usually distinct from the chatbot interface, that manages the complex workloads of one or more agents: their internal loops, memories, persistence, and lifecycles. It’s kinda like an air traffic controller for your agents. The harness also lets agents gain specialized capabilities via plugins, skills, MCP servers, etc., kind of like how the people and programs in The Matrix can just jack in and download a program to suddenly learn how to fly a helicopter.
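For anyone curious what the tool call / agent / harness terms look like in practice, here’s a toy sketch in Python. It’s heavily simplified and built on assumptions: the “model” is a hard-coded stub standing in for a real LLM, and `count_letter`, `stub_model`, and `run_agent` are made-up names for illustration, not any vendor’s API. The tool walks the word letter by letter just like the thinking trace above, but deterministically, which is why handing counting off to a tool tends to be more reliable than letting the model guess.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


def count_letter(word: str, letter: str) -> int:
    """Hypothetical tool: walk the word one character at a time, like the trace."""
    count = 0
    for ch in word.lower():
        if ch == letter.lower():
            count += 1
    return count


# Tools the toy harness exposes to the model (names made up for illustration).
TOOLS: Dict[str, Callable[..., int]] = {"count_letter": count_letter}


@dataclass
class ToolCall:
    name: str
    args: Dict[str, str]


@dataclass
class FinalAnswer:
    text: str


def stub_model(history: List[str]) -> Union[ToolCall, FinalAnswer]:
    """Stand-in for a real LLM (an assumption, not a real API).

    It ignores the prompt's wording: if a tool result is already in the
    history it answers with it, otherwise it asks the harness to count
    the n's in "banana".
    """
    for entry in reversed(history):
        if entry.startswith("tool_result:"):
            n = entry.split(":", 1)[1]
            return FinalAnswer(f"It has {n} n's.")
    return ToolCall(name="count_letter", args={"word": "banana", "letter": "n"})


def run_agent(prompt: str, max_steps: int = 5) -> str:
    """Toy harness loop: ask the model, run any requested tool, feed the result back."""
    history = [f"user:{prompt}"]
    for _ in range(max_steps):
        step = stub_model(history)
        if isinstance(step, FinalAnswer):
            return step.text
        result = TOOLS[step.name](**step.args)
        history.append(f"tool_result:{result}")
    return "Gave up after too many steps."


print(run_agent("How many n's are in 'banana'?"))  # prints: It has 2 n's.
```

Real harnesses like Claude Code pile a lot on top of this (actual model calls, many tools, memory, sub-agents), but the core is roughly this loop: ask the model, run whatever tool it requests, feed the result back, and repeat until it answers.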
So taken together, you will see a huge ocean of difference between the default Google AI result (or a free-plan ChatGPT response that answers immediately) and a long-running agentic workflow that spends several hours on in-depth research, tool calls, and bazillions of thinking tokens before coming up with a response.
Up to a point, output quality generally correlates with how long a model is allowed to think, and that thinking costs tokens, and thus energy, and ultimately dollars. Archimedes once famously said, “Give me a harness and enough tokens and I will spell strawberry for you.”
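For a rough sense of why “think longer” means “costs more”, here’s some back-of-the-envelope arithmetic in Python. The per-token prices are made-up placeholders, not any provider’s real rates, and the sketch assumes (as I understand the big providers do) that thinking tokens are billed as output tokens. `response_cost` is a hypothetical helper, just for illustration.

```python
# Placeholder prices: NOT real rates from any provider; check their pricing pages.
HYPOTHETICAL_INPUT_PRICE_PER_MTOK = 2.00    # dollars per million input tokens
HYPOTHETICAL_OUTPUT_PRICE_PER_MTOK = 10.00  # dollars per million output tokens


def response_cost(input_tokens: int, thinking_tokens: int, answer_tokens: int) -> float:
    """Estimate dollars for one request, assuming thinking tokens bill as output."""
    output_tokens = thinking_tokens + answer_tokens
    return (input_tokens * HYPOTHETICAL_INPUT_PRICE_PER_MTOK
            + output_tokens * HYPOTHETICAL_OUTPUT_PRICE_PER_MTOK) / 1_000_000


# Same prompt, answered instantly vs. after a long think.
quick = response_cost(input_tokens=1_000, thinking_tokens=0, answer_tokens=300)
deep = response_cost(input_tokens=1_000, thinking_tokens=50_000, answer_tokens=2_000)
print(f"quick answer: ${quick:.4f}, long think: ${deep:.4f}")
```

With these made-up numbers the long think costs about 100x the quick answer for the same prompt, and an agentic job running for hours can burn through far more tokens than that.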