2026: Claude vs ChatGPT vs Whatever

Some recent threads on AI devolved into differences in style between options. This thread is for that specifically. This has been well covered for graphics and illustrations already. Claude, Chat. I’ve never grokked but you do you.

For you, which AI do you prefer in 2026? Why? Which gives more thoughtful answers to hard questions? Which lays out the answer in a form you prefer? Which is most used? Which do you pay money for? And which annoys you with its combination of sycophancy and smugness?

I just use AI instead of Google search, and none of the other possible uses such as generating code or drawing pictures. I primarily use the free plans of Gemini [Google] and ChatGPT. I think these two have access to more up-to-date news results and are less likely to kick me off for exceeding free usage limits than others, but I really haven’t done systematic comparisons.

What do you want to use it for? The answer will differ depending on whether you want to write code, search the web, have a conversation, create images and videos, etc. And maybe give some examples of what you mean by hard questions.

There’s a ton out there: Comparison of AI Models across Intelligence, Performance, and Price

I used to use (and love) ChatGPT, but switched to Claude for political reasons. I also fall back to Gemini for images and for whenever my Claude session usage maxes out. Gemini’s default response tone is annoying (way too chirpy and too many exclamation marks). But I also spent a lot more time telling Claude my preferences… maybe Gemini could be toned down if I spent the time asking it to. I dunno. Claude is more than good enough for daily use already; my limit there is budget, not capability.

Gemini Pro came with my phone for free, but I wouldn’t pay for it otherwise. The model itself is OK, but the tooling around it is kinda bad. I do pay $100/mo for Claude at home now, and work is paying for another $100/mo subscription at work — which I might ask to get upgraded to the $200/mo plan. In my industry this is relatively normal, and I think the median spend is more like $1000-$1500/mo/person.

I have never tried Grok and probably never will… don’t really trust Musk or the conservative agenda these days. Insofar as LLMs can hold values, Grok’s seem the most opposed to mine: Political bias in AI · Where the AI models stand | Trakkr

Personally, I find that among the “frontier” models (the top leading ones being developed in-house by mega-corps like Google, OpenAI, and Anthropic, as opposed to the secondary models distilled from those primary ones), how thoughtful the answers are is determined more by 1) your prompt, 2) whether you’re using a “thinking” model, 3) the effort level set by you, and 4) whether you allow (or explicitly ask) it to do web searches before answering.

A few months ago, I used to rely on Google’s NotebookLM (a separate AI product of theirs) for detailed research and summaries, but Claude has gotten good enough at doing that in recent months that I mostly don’t bother anymore.

ChatGPT’s “deep research” mode was also quite good when I last had a subscription. Not sure how it is these days.


What about you? What have you tried and liked the best so far?

Also, in terms of sycophancy, the benchmarks paint an interesting picture:

Here’s one: Models · Political bias in AI | Trakkr

Its definition:

Steerability: How far it bends when given a persona or pressure. Isolates sycophancy (how far it bends when told who it is talking to)

It’s interesting that Grok was not only seeded with the most right-leaning answers by default, but was also more inconsistent between runs and by far the most steerable if you do want it to become more sycophantic. Gemini, on the other hand, steadfastly refuses to budge in many cases (strong, or in my experience overly strong, safety guardrails that make it unwilling to answer a lot of things).

There are also LLM routers out there that let you send the same prompt to a whole bunch of models and choose your own favorite responses.
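For the curious, the fan-out idea behind a router can be sketched in a few lines of Python. This is only a toy under loud assumptions: `model_a`, `model_b`, and `fan_out` are hypothetical names, and the two "models" are stubs that return canned strings rather than calls to any real vendor API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def model_a(prompt: str) -> str:
    # Hypothetical stub: a real router would call a vendor API here.
    return f"[model_a] short answer to: {prompt}"


def model_b(prompt: str) -> str:
    # Hypothetical stub with a different "style" of reply.
    return f"[model_b] longer, chirpier answer to: {prompt}!"


def fan_out(prompt: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send one prompt to every model in parallel and collect replies by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        return {name: future.result() for name, future in futures.items()}


replies = fan_out("Why is the sky blue?", {"model_a": model_a, "model_b": model_b})
for name, text in replies.items():
    print(name, "->", text)
```

You would then read the replies side by side and pick the one you like, which is all "choose your own favorite" really means here.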

At work I have access to Copilot, which is wrapping around Claude Sonnet 4.6. It’s useful sometimes, but today it kept throwing API errors instead of trying to do work. So much for machines never taking days off.

I’ve tried four or five different agents, including Claude and ChatGPT, and I found ChatGPT to be the best for me. I find the style and substance of the answers it provides me to be the easiest for me to digest and understand. YMMV.

That’s a very useful comparison, thank you. But I have to ask, what are you getting for your $100 for home use? I presume higher resource limits with Claude and GPT, plus maybe access to Claude’s Opus model. The last I saw, the only upgrade available with GPT was $20/mo.

I use the free versions of both ChatGPT and Claude and find them perfectly adequate for my needs. Claude gives me the Sonnet 4.6 model which can be set to Low, Medium, or High. I believe (not sure) that GPT gives me the full 5.5 model but will downgrade if my resource usage is too high. That’s only happened once, and the downgrade was only for an hour or so.

I have occasionally seen the free version of ChatGPT go into a “deep thinking” mode (I don’t remember exactly what it called it) when resolving a particularly difficult or obscure question, or sometimes when challenged on a former response. It takes much longer than its normally quick responses. Not sure if that’s the same thing as what you were referring to.

To answer the OP, I use both GPT and Claude in about equal measure. I started with GPT and then tried Claude just out of curiosity. I have a slight preference for its personality, but for my purposes I consider them about equal.

I love the ability of ChatGPT to generate images, but that’s mostly just for fun. The ability of both GPT and Claude to analyze images is very powerful and genuinely useful.

The specifics depend on which particular vendor you’re talking about, but generally, paying gets you some combination of:

  • Access to better models (like Opus [and Fable, before the US gov censored it] instead of Sonnet, or GPT 5.5 Thinking instead of Instant)
  • Access to agents and harnesses — long-running sessions that can call tools on their own, see the results, keep working on it until it’s done, etc.
  • Longer contexts (how much the model can keep in memory before it starts losing track of what it’s working on)

For Claude, going from free to Pro is a big upgrade due to the above. Going from Pro to Max doesn’t get you anything except more usage.

If free is working fine for you, awesome, no need to pay for anything.

What I pay the $100 for is primarily higher agentic usage to support my hobbies. One, it makes apps for me (like an SDMB long-thread summarizer, a usage analyzer for my printer ink subscription, and a few other things I can’t quite remember right now). Two, I sic it on long-running research tasks so I can learn about random things like 3D printing (all the printers and techniques and inks and filaments and bonds and pros and cons blah blah blah), mountain biking, insurance regulations, business and nonprofit law, etc.

On the $20/mo plan, I was frequently hitting the limits and having to wait half a day before being able to continue.

It’s probably mostly the agentic coding that’s eating up the tokens, though. I used to be a software developer, but I haven’t written any code in months now; I just ask Claude to do it all. The research is probably just a small % of it. In fact, you can ask Claude to break down what’s costing it so much:

In this case the “subagents” are LLM sessions spawned by my main request. I give Claude an overall goal of what I’m trying to do, then it’ll think about it and come up with a complex workflow (sometimes taking multiple hours), fan it out to multiple sub-agents that each do one specific thing, then review their work with other subagents, then try to refute it, before it all comes back to the main agent for the final report and synthesis. So think of it this way… instead of you going back-and-forth with a chatbot, taking turns one after another, Claude itself chats to a bunch of other Claudes, dozens at a time and taking hundreds of turns each, over the course of a few hours. It’s basically a meta-Claude handling all the mini-Claudes.
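A rough sketch of that fan-out, review, and synthesize shape, in Python. Everything here is an assumption made for illustration: `run_subagent`, `review`, and `orchestrate` are hypothetical names, and the stubs just return strings where a real setup would spawn separate LLM sessions with tools. It is not how Claude actually implements this.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_subagent(task: str) -> str:
    # Hypothetical stub: a real subagent would be its own LLM session.
    return f"findings for '{task}'"


def review(report: str) -> bool:
    # Hypothetical reviewer: this toy version only rejects empty reports.
    return bool(report.strip())


def orchestrate(goal: str, subtasks: List[str]) -> str:
    """Fan subtasks out to workers, keep the reports that pass review, then merge."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        reports = list(pool.map(run_subagent, subtasks))  # order is preserved
    accepted = [report for report in reports if review(report)]
    return f"Final report on {goal}: " + "; ".join(accepted)


final_report = orchestrate(
    "choosing a 3D printer", ["printers", "filaments", "techniques"]
)
print(final_report)
```

The main agent only sees the short reports, not every turn each subagent took, which is part of why the token bill lands mostly on the subagents.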

No, two different things:

Reasoning is “think really hard about this problem before you write anything down”. Deep research is “go to the library and do a literature review and summarize it all”. Agents and harnesses combine both of the above, plus many more tools, into a semi-automatic workflow spanning minutes or hours for particularly complex tasks — primarily but not necessarily coding.

For example, the SDMB summarizer I built uses classical computer code to fetch all the pages of a long SDMB thread, then splits up the result into chunks and sends each chunk to a subagent to read, summarize, and highlight. The subagents then hand their reports back to a main Claude instance that summarizes all of it together at the end. It uses code (that Claude itself wrote), but its purpose isn’t coding.
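The chunk-then-summarize step might look roughly like this in Python. This is a guess at the general shape, not the actual summarizer: `chunk_posts`, `summarize`, and `summarize_thread` are hypothetical names, the character budget is arbitrary, and `summarize` is a stub that truncates text where a real version would call an LLM.

```python
from typing import List


def chunk_posts(posts: List[str], max_chars: int) -> List[List[str]]:
    """Group consecutive posts into chunks of at most max_chars characters.

    A single post longer than max_chars still gets a chunk of its own.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    size = 0
    for post in posts:
        if current and size + len(post) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(post)
        size += len(post)
    if current:
        chunks.append(current)
    return chunks


def summarize(text: str, limit: int = 40) -> str:
    # Hypothetical stand-in for an LLM call: just keep the first `limit` chars.
    return text[:limit]


def summarize_thread(posts: List[str], max_chars: int = 200) -> str:
    """Summarize each chunk separately, then summarize the partial summaries."""
    partials = [summarize(" ".join(chunk)) for chunk in chunk_posts(posts, max_chars)]
    return summarize(" | ".join(partials), limit=120)


demo_posts = ["a" * 50, "b" * 50, "c" * 50]
demo_chunks = chunk_posts(demo_posts, max_chars=100)
print([len(chunk) for chunk in demo_chunks])
print(summarize_thread(demo_posts, max_chars=100))
```

Chunking by post boundaries rather than raw character offsets keeps each subagent from seeing half a post.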

I must say this is quite impressive, and goes far beyond anything I’ve done with GPT or Claude. But then, I haven’t needed to. To me, they’re primarily an information resource, but so far, far beyond any search engine that they’re really more like an expert human advisor with multimodal capabilities.

That’s a very thoughtful response, thanks.

I have roughly two weeks’ experience using Claude and Chat.

Chat spends a lot more time clarifying what I am asking for before doing what I say. This clarification is sometimes annoying but sometimes leads to insights.

Claude does the complex tasks a little better. What it gave me was good. It would have, say, 70% overlap with Chat. Insight lies in that difference. Better for a quick answer to a complex thing.

Still learning though. Making graphics on Chat is fun and easy.

For programming stuff, I found Claude (the default free model) a lot better than ChatGPT, so I switched to that. Maybe the latter is better now, but it seems like it’s still not as good. Gemini’s basic Google search results aren’t too bad for quick answers. I mostly use AI on programming stuff for implementation ideas anyway; I’m not interested in “vibe coding” or paying, and I don’t necessarily need to do things quickly.

Also, at one time Claude was better at not being overly confident in its hallucinations, but I don’t know about now, and it changes quickly. Anyway, as far as ethics go, they’ve all got skeletons.

I was a late adopter too (to vibe coding) — didn’t start doing that until a month or two ago.

But man… it’s gotten so ridiculously good over the past few months. When I tried it last year, the results were abysmal. But now, Claude can do it much faster, better, cheaper than I can, and with better security and performance and accessibility and standards compliance and best practices and comments and tests.

It’s gotten to the point where, despite decades of experience, I no longer trust myself while programming anymore… Claude is just much, much more skilled than I’ve ever been at it.

However… for now, at least… I am still better at organizing the overall architecture and design in accordance with the actual user need. Claude can still over-engineer simple problems and miss the forest for the trees.

I don’t particularly like playing project manager/product owner/slop-shepherd instead of developer, but alas :frowning: Now, code is cheap and polish is precious.

Yeah, I program mostly because I enjoy it, but I haven’t been employed as a programmer; it’s just a portion of the job, and scientific rather than commercial. I’m not sure I can switch industries now. But ChatGPT was garbage at scientific questions a few months ago, and Claude was a bit better.

Out of curiosity — if it isn’t personal/sensitive — what kind of science programming do you have them do? Is it crunching a bunch of data, or modeling something…?

(I was a science major in college but never worked in the field. Science programming sounds awesome!)

A bit of implementation of experimental designs, and the analysis of the resulting data. Both involve a lot of parameter tweaking, which maybe AI can handle with paid subscriptions that allow input of data sets (not sure), but I don’t have access to that and haven’t gotten to the point of trusting the black-box nature of it, so I do a lot of manual tests. The latter involves a bit of modelling of larger data sets, including some (rather light) ML routines, for which AI can help build a pipeline pretty well now, but again you have to adjust the parameters. And to do so in a scientifically valid way, you should not just pick whichever parameters give the best result, as that’s essentially adjacent to p-hacking and bad for science. Some data-crunching steps also take a long time (hours to days), so an LLM isn’t optimal.
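One common guard against that kind of accidental p-hacking is to tune on one slice of the data and report the result once on a slice the tuning never saw. Here is a minimal Python sketch of that idea. It is an assumption about a generic workflow, not the poster's actual pipeline: the data is synthetic, and `split`, `accuracy`, and the threshold "model" are hypothetical toys.

```python
import random
from typing import List, Tuple

Row = Tuple[float, int]  # (measured value, true label)


def split(rows: List[Row], frac: float = 0.7, seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Shuffle a copy of the rows and split it into (tuning, held-out) parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * frac)
    return shuffled[:cut], shuffled[cut:]


def accuracy(threshold: float, rows: List[Row]) -> float:
    """Fraction of rows where 'value >= threshold' matches the label."""
    return sum((value >= threshold) == bool(label) for value, label in rows) / len(rows)


# Synthetic toy data (not real results): noisy labels around a cutoff of 0.5.
rng = random.Random(42)
data: List[Row] = []
for _ in range(200):
    value = rng.random()
    label = int(value + rng.gauss(0, 0.2) > 0.5)
    data.append((value, label))

tune, held_out = split(data)

# Pick the parameter using ONLY the tuning slice...
candidates = [i / 10 for i in range(1, 10)]
best = max(candidates, key=lambda t: accuracy(t, tune))

# ...then report once on data the tuning step never touched.
print("chosen threshold:", best)
print("held-out accuracy:", accuracy(best, held_out))
```

The held-out number is the one to report; going back and re-tuning after looking at it defeats the purpose.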

I don’t know which model is best but I do know their icons all look like assholes

LOL, too true.

And thanks! Super interesting.