Today, I stumbled on an interesting paper claiming that ‘Hallucination is Inevitable’, i.e. that it’s not possible to completely eradicate the fabrication of false ‘facts’ in LLMs. Now, this is only on the arXiv so far, so hasn’t passed peer review, but their basic argument is surprisingly simple—essentially, they restrict themselves to a setting of formalizable ‘ground truth’ functions, then consider all possible LLM outputs, and employ a diagonalization argument to show that there are always ground truths that the LLM can’t perfectly match, i.e. where it produces false outputs.
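To get a feel for how such an argument goes, here’s a minimal toy version in Python. This is just my own finite sketch of a Cantor-style diagonalization, not the paper’s actual construction, and the names (`Model`, `diagonal_ground_truth`) are mine: treat each candidate ‘LLM’ as a function from prompts to answers, and build a ground truth that disagrees with the i-th model on the i-th prompt.

```python
# Toy diagonalization (my sketch, not the paper's formal construction):
# enumerate candidate "LLMs" as prompt -> answer functions and construct a
# ground-truth table that the i-th model gets wrong on the i-th prompt, so
# every model in the enumeration differs from the ground truth somewhere.
from typing import Callable, Dict, List

Model = Callable[[str], str]

def diagonal_ground_truth(models: List[Model], prompts: List[str]) -> Dict[str, str]:
    """Ground truth that the i-th model answers incorrectly on the i-th prompt."""
    truth: Dict[str, str] = {}
    for model, prompt in zip(models, prompts):
        # Any answer different from the model's own answer will do.
        truth[prompt] = "NOT " + model(prompt)
    return truth

# Three toy "models" and one probe prompt per model.
models: List[Model] = [lambda p: "42", lambda p: p.upper(), lambda p: "no idea"]
prompts = ["q0", "q1", "q2"]

truth = diagonal_ground_truth(models, prompts)
for model, prompt in zip(models, prompts):
    assert model(prompt) != truth[prompt]  # each model is wrong somewhere
```

The real proof has to work over an infinite, computably enumerable family of models rather than a finite list, but the disagreement-by-construction trick is the same.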
If this is right, then there remains the question of practical relevance. The authors draw some pretty far-reaching conclusions, e.g. that “without human control, LLMs cannot be used automatically in any safety-critical decision-making”, which would for instance forestall the project of using LLMs to make decisions in self-driving cars. But I’m not sure that’s actually warranted by their result: what they establish is that for any LLM, there exists some ground truth on which it hallucinates, but that alone doesn’t give any indication of the frequency of hallucinations—a car that hallucinates once every thousand years would still be vastly safer than anything on the road today. I wonder whether it’s actually possible, say via some technique that uses a formalized version of Berry’s paradox, to get a more quantitative result.
Otherwise, we might be in a situation similar to the one with Rice’s theorem: essentially, no non-trivial question about what a given piece of code will do can be decided algorithmically. Hence, debugging (in the sense of mechanically deciding what a program does) is, strictly speaking, impossible; nevertheless, many people do it every day. So one might wonder whether we’re just going to get used to LLM hallucinations in the same way, provided we can keep them infrequent enough to make them irrelevant for all practical purposes.
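For the record, the standard way to see Rice’s theorem at work is the textbook reduction from the halting problem. Here’s a hedged Python sketch of it, standard material rather than anything from the paper; `decides_prints_hello` is a hypothetical oracle that cannot actually exist.

```python
# Classic reduction behind Rice's theorem: if we could decide the semantic
# property "this program eventually prints 'hello'", we could decide the
# halting problem, which is impossible; hence no such decider exists.

def decides_prints_hello(source: str) -> bool:
    """Hypothetical oracle: True iff the program in `source` prints 'hello'.
    By the reduction below, it cannot exist."""
    raise NotImplementedError("no such decider can exist")

def halts(program_source: str) -> bool:
    # Wrap the target program: run it, then print 'hello'. Assuming the target
    # produces no output of its own (a full proof would suppress its output),
    # the wrapper prints 'hello' if and only if the target halts, so an oracle
    # for the printing property would decide halting: contradiction.
    wrapper = f"exec({program_source!r})\nprint('hello')"
    return decides_prints_hello(wrapper)
```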
More interesting, perhaps, is the question of what this says about the difference between humans and LLMs. Do humans hallucinate (in the LLM sense)? For an LLM, the difference between a hallucination and a genuine item of knowledge is utterly opaque: it will ‘believe’ the one just as fervently as the other. Humans, obviously, are also often mistaken, and may be unaware of it—although we can also often attach a kind of epistemic confidence to our utterances, as when the phone-a-friend on Who Wants to Be a Millionaire? claims to be 80% certain of their answer. Is that an advantage we have over LLMs, or is a mistaken childhood memory, or some other confabulation, just the same thing as an LLM hallucination?