Bogus info from ChatGPT

@Maserschmidt is correct. The way you fix these blind spots is to add the examples to your fine-tuning training. No need to bolt on code or prompts. They are constantly training and updating model versions. As bug reports come in, they add the examples to the training.

Anthropic publically posts their system prompts in the name of transparency and there aren’t specific fixes like that. There are general directives (avoid creating NSFW content, etc) but no “here’s how many Rs are in strawberry”

But the models will learn about the famous examples of AI failures simply because they’re highly documented in the literature and training data, so it will know how to recognize that specific situation even if it doesn’t generalize that far to others.

System prompts and RLHF are two different things. The first is something fed to the model during inference before each answer; the second is the tuning that occurs before a model is even implemented.

But I also have zero doubt, per my original answer, that the intersection of model attention and the vectors for seahorse and emoji is more related than other emojis or other animals.

John Oliver did a story on AI chatbots this past weekend (YouTube link) that teems with examples of egregious info from various chatbots. Check it out, OP, you will not be disappointed. Possibly depressed and afraid, though.

The video is about 30 minutes and well worth your time if you have concerns about AI chatbots and the damage they have caused, are causing, and are capable of causing, and the extremely tepid response from the AI tech community to these problems.

Oliver sums up at the end:

All it seems they’ve really done is hand us a bunch of bots that are pedophiles, suicide enablers, and the occasional cartoon fox who just wants to watch the world burn.

Some day there’ll be public service announcements like the classic “I learned it from watching you” one. The internet had nastiness before we were teaching our AI from it. We need to teach them better.

I learned it by watching you! - Wikipedia!

I saw that show yesterday. I posted a response to it here in the link below. Bottom line: I appreciate Oliver’s investigative journalism, and in this case, while none of what he says is wrong, it appears so one-sided that it might almost be called “sensasionalist”. It’s not like his piece on the Sackler family and Oxycontin where there’s no mitigating circumstance to pure evil. The case for and against AI is far more nuanced and entirely centered around how the evolution of this very powerful technology is managed and regulated.

I have noticed that AI in general gives fairly good info if you ask it for something that is commonly known or has a lot of sources, but tends to very confidently give completely bogus answers if something is a lot less common. For example, if you ask it about Abraham Lincoln it will give you all kinds of good info, but ask it for a scripting function to do a particular thing in an older game’s mod tools and it will often give you the wrong function name or will make up something that is entirely wrong.

However, I did have Google’s AI give me a completely crap answer about a particular actress the other day. I wanted to know how many episodes she was in of a show I’m watching (because I don’t particularly care for her character’s plot line and wanted to know how much longer I would have to suffer through it). Google’s AI said she was in one episode of Season 2, which is technically correct, as it gave the first episode she was in, but misses that she is in several other episodes as well. But then it said that she featured prominently in a particular episode of Season 4 of the same show, which is a good trick for a show that only has 2 seasons so far.

I wouldn’t expect Google’s bullshit generator to fail on something that is that easy to look up.

One of Google’s non-AI links took me to IMDB where I was able to look up the real info.

I saw this somewhere else, but if you google ‘is the swimming pool on the Titanic still full?’, Google’s AI summary says:

No, the first-class swimming pool of the Titanic is not still filled with water. While often repeated as a myth, the pool likely drained during the sinking, and the massive water pressure at 12,500 feet, coupled with damage to the ship’s structure, makes it impossible for the pool to hold water.

Oliver’s general style is “sensationalist” - he’s not a shrinking violet and he goes hard after his targets. There’s no difference between his style on the Sackler family story and this story. His segments are well-researched and have a strong point of view.

This particular piece was on the dangers of AI chatbots (a.k.a the topic of this thread). It wasn’t a piece on the benefits of AI chatbots or about the trade-offs of using AI chatbots. He’s not obligated to give equal time to the benefits - that’s not the story he was reporting and there’s plenty of that AI hagiography out there in the world and on this board without him tackling it. Similarly, he was not obligated to spend a ton of time on the benefits of opioids in pain management - that’s not the story he was reporting.

Pretty interesting.

Meanwhile the Gemini app, even on “fast”, will “think” for half a second and then answer:

Technically, yes. The swimming pool on the RMS Titanic remains filled with water, though it is now seawater rather than the heated salt water originally pumped in from the Atlantic.

It really is like @SenorBeef said… the free default AI summary is producing crap results and giving a bad name to all the other LLMs, including Google’s own. It is really so sad that this is the default “face” of AI now for many people.

Edit: Though I suppose it still didn’t catch the similarity between “seawater” and “heated water pumped in from the Atlantic”

I actually think the “Titanic pool filled with water” is a result of a sort of linguistic ambiguity that no English speaker would think about. “Filled” meaning “containing” by whatever circumstance and “filled” meaning “full of things that someone deliberately put into them”

I’m fairly sure some language would have different words for these two concepts, and that native speakers of those other languages, since they think differently about the concepts, might not think it’s so ridiculous to distinguish between the two.

A rain basin is filled by rain and you fill your gasoline tank in a slightly different meaning of the word. I think in Russian and German you’d use two separate words for these concepts. Probably other languages too.

Those meanings are not worth distinguishing in this context by humans, but I wonder if the AI is tripping up by saying… the Titanic first class pools were filled with water, deliberately - water of a certain type, a certain salt percentage, maintained at a certain temperature, from a certain source - by whoever maintained the pool on the titanic. And now that filling - the original water - is long gone, replaced by ocean water.

I know that sounds intuitively to a human like a very silly mistake, but this is exactly where our intuition fails us, and sometimes AI thinks about something differently than we do. And in a way, it’s not even wrong. You can validly make a case that there are multiple interpretations of “filled” baked into the word and it’s just using an unintuitive one and humans just naturally patch over the difference so it seems bizarre when AI doesn’t.

That isn’t to say that this isn’t a behavior worth correcting, but it’s more understandable and less ridiculous than it might look at first glance.

Edit: Ha. I asked Claude to evaluate my theory and he said no, it’s probably simpler than that, that “It’s more likely that the model is pattern-matching on a cluster of associations — “Titanic pool,” “filled with water,” “still” — and pulling from training data that treats the Titanic as a wreck where nothing functions anymore.” So maybe I’m wrong, and I’m putting too much thought into this. Or maybe I’m right, and Claude is wrong about how this particular failure works. LLMs are so mysterious it’s genuinely impossible to know for sure. Brave new world.

Actually I just conducted a turbo deep dive with Claude on the issue of how language and culture shapes LLMs in the 20 minutes since I made that post, if anyone is interested. It’s certainly an interesting concept.

That’s a fascinating conversation and I encourage others to read it. I think Claude was providing many genuine insights.

I’ve explored the “language shapes thought, therefore an LLM’s training data predominant language shapes thought” in greater depth before but I somewhat recapped it there.

An interesting analogy from the other conversation - there are native aboriginals in Australia - I forget their name - that do not give relative directions like “left of” or “behind” – they always know their cardinal directions and always say that the tree is north of the rock, or I’m standing a few feet to your south. If you trained an image generator in their language (if you had enough training data), it would have no idea what to do if you tried to tell it “the tree is to the left of the rock” – it would be expecting “the tree is NORTH of the rock” and I guess you’d have to have a system to tell it where north is.

It’s a good example of how language shapes thought, and how that applies to both LLMs and humans.

Last week I spotted a thread in a foraging sub on Reddit, asking for ID on a plant that I instantly recognised as Hemlock Water-Dropwort (which is one of the most deadly poisonous plants in the UK). By the time I got to the thread, there was already a post there saying ‘Google Lens says this is Alexanders’ (Alexanders is an edible plant). I replied to properly identify the plant and warn that it was deadly poisonous.
While I had been typing that reply, the same person had added their own reply saying ‘or Angelica’ (another edible plant).
I replied to say I was sure it wasn’t Angelica and that I was very sure it was Hemlock Water Dropwort; before very long, someone else had joined in to say their app had identified it as Cow Parsley (another edible plant).

Three lethally-incorrect answers in the space of about 5 minutes.

Sounds like AI is helping us clean up the gene pool a bit.

I was discussing episodes of TV shows with Gemini pro last night. It does a surprisingly great analysis of whatever part I want to talk about it - it often makes the same points I was going to make before I do it, and these aren’t obvious points that everyone would make.

And what’s interesting is that I got an important detail wrong in an episode from memory. It addressed the rest of my prompt and then said “I have to give you a factual correction, what you said in this part did not happen in the episode”

I honestly thought it did and that gemini pro was making a mistake, so I said “are you sure? because I thought I remembered X and Y, and that’s why Z happened” and it said no, you’re incorrect, Y didn’t happen, and U and V are why Z happened"

So… I decided to fact check it. Went back and watched the relevant part of the episode. It was right, my memory was wrong.

Gemini Pro is in a completely different sort of sycophancy and epistemic tier than Gemini Flash / Fast. Gemini Flash would’ve 100% confirmed my story, and then even did some false and made up world building around me asserting that happened. It would’ve spun the story to make what I said seem true and examined the implications.

I actually think it’s important that we define at least two different things that we generally just call “sycophancy” – there’s the flattery, the pleasantness, the softening any pushback to the user. Saying “what a great question” or “oh you’re so smart to notice that!” - that’s a sort of flattery sycophancy that doesn’t actually necessarily mean the AI is generating anything false, it’s more like an interaction style.

And then there’s the epistemic sycophancy which is more like "never contradict the user, do not tell them when they’re wrong, in fact, make up a world that confirms the story they want to tell. If the user tells you they want to subscribe to X service but the cost is too high, make up a detailed and completely false tier of X service that costs half as much and say it’s a “lite” tier. When the user asks you about details, just… make up something plausible.

The latter is what happened to me with Gemini flash when I was trying to pick what google AI subscription to use. It heard that I wanted a privacy-oriented subscription tier not too much more expensive than AI pro and it invented one. And detailed it. And then made up bullshit to explain why it did that when I told it the subscription it described never existed.

These things aren’t unrelated and they’re hard to separate (there’s some overlap that involves both) but I think they are conceptually separable and it’s important that we do that. You could isolate one without the other. I could tell one of the LLM systems “be an asshole to me. never flatter me. in fact, tell me what you think I said is stupid” so that it never engaged in the sort of mostly harmless flattery but it still might eagerly confirm for me some wrong fact.

Conversely, I could probably tell it “Flatter me. Tell everything I say is brilliant. But never make any factual claims that you cannot verify through independent searches” and it would probably honor that, too - now it’s flattery with zero hallucinated facts.

Yeah, I’ve been thinking about that, the loose use of ‘sycophancy’. I think what LLMs do more often fall under what’s called ‘facework’, the conversational cues used to either protect yourself or others during conversations, or to reaffirm a conversation. It often comes as flattery, which I think why it’s easy to confuse with sycophancy.

With the chat apps that allow it, I’ll usually add a persistent prompt (custom instructions) setting like this:

Be concise and direct. Do not flatter or be sycophantic. Never praise the question or question asker. Do not prompt for follow-ups unless technically necessary. Always double-check factual questions with web searches and provide cites.

Really helps cuts down on the sycophancy, but I’m not sure how much it helps with the hallucinations. If I could force it to do a web search every single time instead of answering from its own training, I would, but that doesn’t seem like something that can be adjusted by way of instructions.

The tone of my outputs are so neutral now that it’s often off-putting to read someone else’s worshipful copy-and-paste. By default, they’re tuned to be drooling yes-men. I could do without any more “what a terrific insight” or “you’re absolutely right” for the rest of my life, thank you very much.

One time I set it to ultra critical and super harsh. That was fun, and good emotional training to get not get defensive. Reminded me of my dad :joy:

User prompts or simply telling the AI what you want in terms of tone, incentives, accuracy, etc. is a MASSIVE, MASSIVE tool shaping the quality of the output and hardly anyone uses it. It’s like everyone is half-assed using the training wheels children’s bicycle version when they could be tearing up a construction yard in a rad dirt bike if they put in some effort.