The next page in the book of AI evolution is here, powered by GPT-3.5, and I am very, nay, extremely impressed.

That’s beside the point. In an eternal universe, or a branching multiverse, or any other sort where every possibility eventually comes to pass, there will be instances of the behavioral test administered to this sort of machine in which its answers match perfectly, and the machine will, by behavioral criteria, consequently be judged to understand. But more importantly, by simple knowledge of its internal workings, we can immediately conclude that it doesn’t. Hence, the question admits of an answer by non-behavioral considerations, which is true, and an answer by behavioral ones, which is (in some instances, at least) false. Consequently, behavioral considerations don’t faithfully track the question of understanding in a machine.

Oh? So now whether a machine understands is dependent on the capabilities of the interlocutor? Who decides what minimum criteria they must fulfill?

This isn’t just idle sophistry. Suppose that what has been contended here—that we’re possibly of the same kind, when it comes to understanding, as ChatGPT—is actually true. Then, the behavioral test would tell us nothing at all about whether ChatGPT understands. You may have, at some point, unleashed two chatbots on one another, for the amusement of having them descend into sheer conversational anarchy. Or you may have tuned into ‘Nothing, Forever’ before it was taken offline.

From our perspective, it’s immediately obvious that the partners in such a conversation aren’t making sense. But the chatbots themselves will blather on, blissfully unaware that anything’s amiss; so either of those chatbots, administering a behavioral test for ‘understanding’ to the other, would cheerfully conclude that yes, the other system must in fact understand, to match it conversationally so well. They can’t leave their own perspective, and it’s just their own limited capacities that make it appear that way.
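
To make that concrete, here’s a toy Python sketch (entirely hypothetical; no real chatbot or API is involved) of two canned-response bots examining one another: each counts a reply as evidence of understanding precisely when it is something it could have said itself, so the verdict never escapes the judge’s own repertoire.

```python
# Toy illustration (purely hypothetical, not any real chatbot): two
# canned-response bots converse, and each "tests" the other by checking
# whether the reply lies within its own limited repertoire.
import random

CANNED = {
    "hello": ["hi there!", "hello to you too"],
    "how are you": ["fine, thanks", "can't complain"],
}
FALLBACK = ["interesting!", "tell me more", "I see"]

def reply(message):
    """Return a canned reply if a keyword matches, otherwise a filler phrase."""
    for keyword, options in CANNED.items():
        if keyword in message.lower():
            return random.choice(options)
    return random.choice(FALLBACK)

def passes_test(message):
    """A bot's 'behavioral test' of its partner: anything it could have
    produced itself counts as evidence of understanding."""
    repertoire = {r for opts in CANNED.values() for r in opts} | set(FALLBACK)
    return message in repertoire

msg = "hello"
for turn in range(6):
    speaker = "A" if turn % 2 == 0 else "B"
    msg = reply(msg)
    print(f"{speaker}: {msg}   (judged to understand: {passes_test(msg)})")
```

Both bots certify each other on every turn, for exactly the reason above: the test is bounded by the capacities of the tester.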

However, if we humans are now in the same boat, then how do we know whether our tests for understanding aren’t just as buggered? If we can’t exclude the hypothesis that, ultimately, we’re just ChatGPT, there’s no way to claim that we could faithfully assess the understanding within a chat engine by means of a behavioral test. But if we can exclude it, by identifying a difference in understanding between us and such an engine despite matching behavior, then we’ve likewise concluded that a behavioral test is insufficient! So in either case, the behavioral test tells us nothing.

Even if that paper is right, that’s not correct. All that’s argued for there is that you ought to attribute the same mental capacities to a lookup table that you’d attribute to a full-scale AI. So if that’s right, it just amounts to a reductio ad absurdum of computationalism: since it’s easy to see that the lookup table has no mental states, the ‘compressed’ version must lack them too.

But the paper falls short of its intended goal. For consider what the lookup table program yields: an automaton capable of holding a conversation for some limited time (say, half an hour). The compressed version of the program, then, would likewise be capable of only that. So there’s the answer to the argument: of course we shouldn’t attribute any mental states to a program capable of producing only half an hour of conversation, since with any truly intelligent being (barring unfortunate happenstance) we could always extend a conversation beyond any given limit. That capacity is lacking in the ‘compressed’ lookup table; hence, there is no reason to attribute mental states to it.
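
To see how little machinery is involved, here’s a minimal Python sketch of such a lookup-table conversant, with made-up toy entries (the construction in the paper would tabulate every possible conversation up to the time limit; that is only gestured at here). The defining feature is that past a fixed depth there is simply nothing left to look up.

```python
# Minimal sketch of a lookup-table conversant (hypothetical toy entries):
# replies are indexed by the entire conversation so far, and the table has
# no entries beyond a fixed depth, the analogue of the half-hour limit.
TABLE = {
    ("Hello!",): "Hi, nice to meet you.",
    ("Hello!", "Hi, nice to meet you.", "What do you make of AI?"):
        "It raises hard questions about understanding.",
    # ... the full argument assumes every possible exchange up to the limit ...
}
MAX_DEPTH = 5  # stands in for 'half an hour of conversation'

def lookup_reply(history):
    """Return the tabulated reply for exactly this history, if one exists."""
    if len(history) > MAX_DEPTH:
        return None  # past the table's horizon there is simply nothing
    return TABLE.get(tuple(history))

print(lookup_reply(["Hello!"]))      # within the limit: a sensible reply
print(lookup_reply(["Hello!"] * 7))  # beyond it: the automaton falls silent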

It’s the same problem as with a ‘behavioral’ test for randomness generation: a computer, perhaps via a great big lookup table of precomputed bits, could produce output indistinguishable from randomness over any given finite stretch. But a source of real randomness can keep producing it indefinitely. Hence, there is no reason to attribute randomness-generating capacities to a system producing only a finite amount of apparent randomness. Likewise, there is no reason to attribute understanding to a system capable of producing only finite stretches of ‘understanding’-behavior.
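
A rough sketch of that contrast, under the assumption that the table is just a finite store of precomputed bytes being replayed: within the table’s length the two sources are behaviorally identical, and only the demand for more, indefinitely, tells them apart.

```python
# Toy contrast: a finite table of precomputed bytes versus a source that can
# keep going. Up to the table's length, the two are behaviorally identical.
import os

STORED = os.urandom(1024)  # the 'great big lookup table', precomputed once
_pos = 0

def table_byte():
    """Replay the stored bytes; beyond the table, the pretence simply ends."""
    global _pos
    if _pos >= len(STORED):
        raise RuntimeError("table exhausted: no more 'randomness' to give")
    b = STORED[_pos]
    _pos += 1
    return b

def source_byte():
    """A genuine entropy source answers for as long as you care to ask."""
    return os.urandom(1)[0]
```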

Any other conclusion would be thoroughly bizarre: akin to arguing that, if you just pile up enough rocks, they might spontaneously levitate and recite Coleridge’s Kubla Khan. Emergent behavior has its roots in the elements it emerges from: the flight patterns of a swarm follow from simple rules every bird enacts; the wetness of water, from the properties of hydrogen bonds. But a single key-value pair in a lookup table clearly has no fraction of understanding whatsoever. And the concatenation of such pairs does not lead to any interaction between them; they can just be regarded as so many individual pairs, without any communication, without any grounds on which to build higher logical structures. To expect mental states to emerge from this strikes me as akin to belief in magic.