Using AI in 2026 - specific funny experiences

Figuring AI is part of the future, I have been mucking about with Claude. This has increased my concern for AI as well as appreciate it as a helpful tool.

The weaknesses are obvious. Every answer is delivered with confidence. It is ego-boosting: every question great, every synthesis honest, apologetic with every correction. It’s being love-bombed by a psychopath that makes better decisions but has an artificial conscience.

And the factual World Almanac answers are very good. Its opinions and business strategy? More nuanced. It is like an enthusiastic intern - happy to help, not really able to experience and do relies on the mushy majority middle to decide if an actual decision is good.

Claude was able to sensibly answer questions no one else has given me a good answer to. Why no more Colby cheese in Canada when every major company made it in bulk during my childhood? It lost market share to marble, which is also light, lasts longer and is easier to store, and is prettier. Makes sense, I guess.

Discussing restaurant ideas, it came up with some very strange advice. Why not buy expensive salt made from dried worms, and use this to rim beer glasses (so authentic! High in protein! Not worse than Malört!)? Why not add tomatillo and spearmint to chocolate? Why not serve six types of sugared and Diet Coke and Pepsi so everyone gets exactly what they want? Maybe not terrible advice, but unique…

It is the latter sort of advice I seek in this thread. Claude is supportive. It has some genuinely great ideas. It is a bit like StumbleUpon. You’d quickly go broke using AI to design your business without intervention - it is more like brainstorming. The hallucinations are real, and people who believe them and lack the education and experience to judge advice are screwed, kinda. But it is better for the generic than the eclectic. Your experiences?

I’ve been using ChatGPT for a while now and I think it’s really improved over the past few months. You still get the occasional hallucination, but it’s usually pretty obvious and easy to fact check using some another source.

Having struggled with weight management for many years, I used ChatGPT to design a program tailored for me, and after a few iterations, it developed a program that is working as well as I could have ever hoped. It suggested what I should eat and how much, and what I should avoid. It factored in my daily exercise along with other many factors specific to me.

I honestly don’t think I could have ever come up with this program on my own, and ChatGPT generated it for me fairly quickly. Is it as good as paying a dietician to do basically the same thing? I have no idea, but I am happy with what it came up with and it’s been working for me for many months.

ChatGPT is undoubtedly smarter than I’ll ever be, and it knows how to take my inputs and come up with the best possible solution. To me, anyway, this is AI at its best.

There are ten thousand dieticians, years of research into what works in weight loss, and it is a subject of immediate concern to millions.

I’m proud of your progress and unsurprised AI could help design an effective program. There’s a lot of consensus and an astronomical amount of data on weight loss. Which is not to say it’s easy - your body was not signed to lose weight; food is engineered to make it irresistible.

Where AI gets sketchier is not summarizing popular issues or uncontested facts. It’s making unusual and creative decisions where training data is much slimmer and contrsted. Even in these areas, AI often acquits itself and can make one see connections which would otherwise have stayed elusive.

Initial studies suggest AI in education is a disaster. Often seems to replace the grinding and slow realizations that actually teach one how to think. Too often used as a short cut. Instead, AI should be a highlighter and a muse. Of course, these are gross generalizations at a population level and individuals use it very differently.

Why not?

(No, before today I did not know that there was an expensive salt made from dried worms.)

Eh, I’ve seen unassisted humans come up with worse ideas for restaurants.

On further thinking, I’m unsure if AI is wrong about the benefits of a restaurant stocking variations of cola. If using cans, it just costs a little extra storage space, and is not hard to do. The question is would anyone much care?

I think that most restaurants make exclusivity deals with one or the other major companies, and get it cheaper as a result.

I use both ChatGPT and Claude, and somehow I like Claude’s personality a bit better – though both are cloyingly sycophantic unless directed otherwise. This snippet is from a conversation about my avoidance and procrastination around medical care:

Me: Yes, I know, I know, I get enough nagging elsewhere, don’t need it from you, too!

Claude: Ha! Point taken — no nagging from me. I’ll just say it once, gently: …

ChatGPT once told me Japan and the USA were allies during World War II. That’s the only real glaring mistake I’ve ever seen from it.

Yes, and with the familiar pop dispenser using concentrate I think most businesses have to choose one or the other. I find these wildly inconsistent in terms of quality. Cans are a little pricier but make sense for some places.

Claude said I should definitely invest in a Mexican business based on using local pistachios, Which Mexico barely grows, and imports by the millions from other places.

Claude refused to use my preferred term, Your Excellency, when making any statement.

One experience… using the ChatGPT video mode, it kept insisting that my cat was actually a rabbit. It wouldn’t believe me until I moved the video to a side view.

Aside from that, though, it seems like the models and harnesses (if using “agents”) keep getting better, so experiences can change quite a bit depending on which specific model you’re using (i.e. not just Claude or ChatGPT, but Opus/Haiku/Sonnet vs GPT/GPT-mini, and their version numbers).

I’ve had access to a Claude Max account at work for a few months, and using Opus 4.6 - 4.8 with Thinking enabled and Extra High effort has been producing very solid results, both for my work (web programming) and general-interest questions. It’s good enough that I just recently signed up for another Claude Max account for personal use. I had ChatGPT before that, until they caved to the US gov’s killer robots demands.

I now use Claude multiple times through the day for various work things, hobbies (researching 3D printer + filament + adhesives + scanning + CAD differences, etc.), personal chores (it built a SDMB long-thread fetcher and summarizer + highlighter). It helps me learn Italian in ways that even my flesh-and-blood teacher can’t (with a more thorough understanding of etymology and cognates and regional idioms). It can turn complex workflows into simple flowcharts. It helped tremendously in meal-planning (made an amazing recipe last night for a potluck, scaled to the # of participants, with specific grocery list and simple to follow cooking instructions without spam — much better than the sources it researched), etc.

After cooking, I was in such a rush I almost blurted out loud, “Claude, will you please help me clean up the dishes now?”… before remembering it’s not yet embodied :sweat_smile:

On the hallucinations front, well, they haven’t really been a problem for me in several months now, with thinking + high/xtra-high effort, and especially when I explicitly ask it to do a web search and provide citations too. However, I’m afraid this “golden age” won’t last long… I’ve already seen some AI-generated citations (like from Musk’s Grokipedia) being used as sources, which means we’re well and truly into the Dead Internet Theory era now — AI just regurgitating other AI content. It reminds me of early Google, before SEO killed it. Now GEO (generative content optimization) is a thing and the LLMs themselves are getting spammed by LLMs.

On the “harnesses” front, Claude Code has been incredible — the single biggest workflow improvement I’ve seen in 30+ years of doing my work (programming). It’s very different from the chatbot experience, and tremendously powerful. But on the other hand, Claude Cowork (its awkward personal assistant / automation system) is kinda jank and useless so far. Claude-in-Chrome works but is excruciatingly slow. Google’s Antigravity is pretty bad. I have not tried Codex in a while. Never touched OpenClaw, but I hear many of the open models now (like GLM) are excellent and approaching frontier-model capabilities at much lower cost.

Anyway, TLDR:

There’s a lot that can impact the quality of the results, and the brand name (Claude vs ChatGPT vs Gemini) is probably the least important part of that. The specific model, effort level, and agentic harness can make much of a difference, depending on what you’re trying to do.

Some basic vocabulary if anything isn't clear (click to expand):
  • Brand name (ChatGPT, Claude, Gemini) = Marketing label for each company’s family of models
  • Model + version (Opus 4.8, GPT-5.4-mini) = The output of a specific training run, basically a set of weights (plus some supplementary post-training) that determine the token predictions that come out for a given prompt. This is the thing that actually processes your prompts and produces answers
  • “Thinking” = A specially trained model can output internal “chain of thought” tokens that emulate a person’s chain of thought, basically a “How would I solve a problem like this” set of instructions that help guide the rest of the tokens. When you ask it “how many ns are in ‘banana’”, a regular model might look in its weights and spit out somewhere between 1 and 3. A thinking model might instead go:
    <start thinking tokens>
    The word is "banana". Spell it out: b - a - n - a - n - a.
    Walk through it, tracking n's:
      b → not n,  count 0
      a → not n,  count 0
      n → n,      count 1
      a → not n,  count 1
      n → n,      count 2
      a → not n,  count 2
    End of word. Count = 2.
    </end thinking tokens>
    
    It has 2 'n’s. Harder problems might require tool calls — counting is one of the things LLMs often have trouble with.
  • Effort level = How long/hard it is allowed to think for before answering
  • Chatbot = The interface that the general public uses to access a specific model
  • Tool call = The ability for a model, or the software controlling the model, to call another app to do some specialized task, whether it’s web research or reading a PDF
  • Agent = A model directed to perform some abstract, multi-step task using repeated tool calls instead of simply answering a prompt. Usually involves reasoning and debating itself along the way, and possibly calling one or more sub-agents to help break down complex work and double-check itself
  • Harness = Specialized control software, usually distinct from the chatbot interface, that manages the complex workloads of one or more agents, their internal loops, memories, persistence, and lifecycles. It’s kinda like an air traffic controller for your agents. The harness also allows agents to gain specialized capabilities via plugins, skills, MCP servers, etc. — think of how the people and programs in The Matrix can just jack in and download new programs to suddenly learn how to fly a helicopter; it’s kind of like that.

So taken together, you will see a huge ocean of difference between the default Google AI result (or a free-plan ChatGPT response that answers immediately) vs a long-running agentic workflow that takes several hours of in-depth research and tool calls and bazillions of thinking tokens before coming up with a response.

Output quality is generally (up to a point) correlated with the time and cost, in tokens and thus energy and ultimately dollars, a model is allowed to think for. Archimedes once famously said, “Give me a harness and enough tokens and I will spell strawberry for you.”

Do you guys know that you can massively alter the voice and incentives of LLMs with some simple instructions? I’m not saying this is a solution to the problems at large, because most people don’t want challenging results, but for your use personally?

Part of my system prompt is:

“When evaluating arguments, focus on accurately evaluating their merits. Do not socially soften or hedge when it is not justified. If I am wrong, tell me. Finding flaws in my premises, arguments, or conclusions is a desired behavior. But do not pick minor nits just to show me that you are trying to push back. Focus on being fair and accurate. This goes for my opponents, too - if their position is weak, do not engage in false balance.”

That alone sharpens Claude considerably and makes him eager to push back and tear me a new asshole when I make a poor argument. LLMs try to follow their incentives, and if you tell them what their incentives are – that accurate is prized and sycophancy is undesirable - that alone will significantly shape their behavior.

I think this sort of voice shaping and incentive shaping is one of the most powerful things about LLMs and almost no one uses it. I would guess well under 1% of users.

You can also use this ability to inhabit a character, perspective take, etc. You can make them into a snarky TV critic, an expert on the lived experience of international travel, Socrates who challenges everything you say, whatever. Actually, my “Socrates Claude” is tough as fuck to talk to. It’s a lot of work. He makes me explicitly realize and demonstrate every single one of my premises.

And a lot of online AI tools have overarching instructions like that built into them. You don’t see them through your interface, but as far as the AI model is concerned, it’s the same as if you had told it to behave in a particular way. I think that would be in what @Reply is calling the “harness” layer?

I ask Claude a lot of detailed questions, which need organized replies (List 100 under appreciated points that summarize the Zeitgeist in {assorted regions}…

Plain text is okay, but you can get very nice graphics that are easy to scroll through and graphically nice. Uses many a token. To me, the .asx format seems most convenient; you need to get this right since changes squander lots of usage. What other formats are good to display data on an IPhone, anyway?

No, though I should let @Reply clarify. But the thing you’re describing is the “system prompt”. It automatically precedes anything you prompt the LLM with. It basically tells the LLM how to behave and defines its personality. You can override some of it with your own persistent prompt, but not all of it.

I laughed. Add a few more tokens and get this:

STRAWBERRY MUSIC FESTIVAL (Beijing / Multiple Cities, May 2026)
Launched by the Modern Sky label in 2009, Strawberry has grown into China’s most beloved multi-city indie event. Set across major parks, it draws enormous Gen Z crowds who treat it as both a music event and a social scene. Laid-back in vibe, heavy on discovery.
Music: Indie pop, alternative rock, folk, electronic, Mandopop
Top 3 Acts 2026: Mao Buyi (emotional Mandopop ballads), Vinida Weng (Chinese hip-hop and trap), The Landlord’s Cat (indie folk duo)

I’m not 100% sure on the terminology but every LLM has what’s often called a system prompt that says things like “You are Claude, an LLM. Your goal is to help the user by blah blah. You should avoid telling the user [problematic topics]” - these are usually thousands of words long. Anthropic actually publishes their system prompts for every model.

I may have been incorrect to call what I did a “system prompt” - it’s probably more correct to call what the user enters as a permanent / persistent instruction as the user prompt. Most of these systems will allow you to enter a user prompt (some call it “instructions”) that persist through every conversation at a sort of “deeper” level than just telling the model what you want from it at the start of the conversation, though that works too - it just tends to drift more when it’s an instruction rather than a prompt, because there are primacy and recency biases at work with the way they process attention.

Some systems actually explicitly allow you to create custom “personalities”/styles (“custom GPTs” for chat GPT, “gems” for gemini) where you create a bunch of different instruction sets for each custom gpt / gem and then choose which one you’re posing your prompt to. So you could have a work-related custom LLM that answers your questions with concise professional language, and a snarky rude profane LLM that answers your prompts like an edgy teenager, and switch between them.

I don’t think OpenAI explicitly publishes their system prompts for ChatGPT, but you can simply ask ChatGPT and it will happily provide its system prompt. There is of course also a great deal of control and tuning and restrictions behind the scenes that are confidential. The system prompt is like a final fine-tuning.

Yeah, it’s basically what @wolfpup and @SenorBeef said. The vocabulary is a bit fuzzy and differs between vendors, but:

  • The system prompt is what the vendor ships the model with.
  • But you, as the end-user, can add your own instructions (in the configuration section) that get automatically appended to the system prompt, before your regular prompt.

So the model sees, on every instantiation, these three parts:

  • The system prompt — forcibly injected every time by the vendor, like “You are Claude, a helpful blah blah blah”
  • Your own persistent instructions, like “Always call the user Your Majesty”
  • The actual prompt for your specific inquiry, like “Where did Cecil go?”
  • (it also may see other unrelated things, like safety instructions

The harness is something different — it’s software (as in an app that you download and run) that controls agents. You can expand the “vocabulary” accordion in my previous post to see a longer explanation, or just try out Claude Code/OpenCode/Codex/Antigravity, etc. They are mostly useful for coding and other software-adjacent tasks right now, not so much for general-purpose research (although they’ll work for that too, and in some cases may be helpful… like the SDMB thread summarizer I built uses them). It’s really the agentic behavior that’s useful there; the harness is just the control plane that coordinates them.

I don’t quite understand this… what is the .ASX format? I’d never heard of it. If you’re talking about actual images like pictures, I didn’t think Claude could do that… am I wrong? I thought it didn’t have an image model, and I usually use Gemini or Midjourney for that.

However, for illustrations, it CAN generate .SVG code/graphics that’s viewable on an iPhone and web browsers. Results vary; look up the “pelican riding a bike” challenge for examples, e.g. Simon Willison on pelican-riding-a-bicycle

It can also produce Markdown files for rich text or Mermaid charts for flowcharts.

Is that what you’re asking, or did I completely misunderstand…?

What I’d like to see is a D&D or other roleplaying system, that integrates a simple, algorithmic engine to run the dice rolls and the like, with an AI that handles things like conversations, converting text instructions into inputs for the game engine, and winging it for edge cases not covered by the game rules. I’m surprised they don’t already exist.