Why ChatGPT doesn't understand

This thing would still be better than about half of the people who post in Factual Questions, and maybe 90% of redditors.

Stranger

My argument goes down in flames!

Me: Am I a dog?

ChatGPT: No, you are not a dog. You are a human communicating through a computer or mobile device.

Hah!

Newman’s objection assumes that objects have some underlying identity to them beyond their properties or relationships to other objects. That is, in addition to there being a cat named Alice, said cat actually has some intrinsic Aliceness which must be preserved.

If one rejects that axiom, then there is no problem with the push-through permutation. Objects are just the collection of their properties. Somehow permuting the objects without permuting their properties is just rearranging the arbitrary labels we’ve given them.

Note that in the real world, objects have innumerable properties that distinguish them from other objects, like position and the arrangement of their atoms. So permuting them is not so easy. If I wish to turn Alice into Charlie, I have to somehow take the non-cat atoms over here and rearrange them into cat atoms over there. Along with a bunch of other stuff.

But supposing I actually do accomplish all this, then I have a collection of atoms from the old Charlie that is now in a cat-form that’s indistinguishable from the old Alice. We’re still labeling that Charlie even though every measurable property is of the old Alice, and therefore when we say the name “Alice”, we’re actually referencing the object that we previously labeled Charlie.

But that’s no problem: they’re just labels. Hence why in the other thread, I suggested labeling them “Object_001”, etc. Then, there is no confusion between the name of something vs. the label we gave an object.

As a concrete example that doesn’t carry the baggage of cats and such, consider the cyclic group of order 3.

A “group” in math is a structure made up of elements that we can perform operations on to get other elements of the group. The one I mentioned is like a clock with 3 hours, and a hand that can only point to one of those. It has a few operations, such as “wait an hour.” If the clock started at 1, it now points to 2. And if it started at 3, it now points back at 1.

We’ve been given the “wait an hour” operation, which I’ll call H. From that, we can create a new operation, which I’ll call T, which waits two hours. We could also call it HH, since it’s just the H operation applied twice.

Suppose we want to add another function, B, which is “go back an hour”. So 2 becomes 1 and 1 becomes 3. Interestingly, that’s the same as the T operation! Two hours after 2 is 1, because 2 goes to 3 and 3 goes to 1. So B and T are indistinguishable. There’s actually only one other operation we can create, I, for “identity” or “do nothing”. And you might have already realized that I and HHH are the same also, since three hours after 1 is always 1.
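
Here’s the same clock written out as a tiny Python sketch, just to make the bookkeeping concrete (the particular encoding of the hours as the numbers 1–3 is an arbitrary choice, which is exactly the point below):

```python
# A toy model of the 3-hour clock; the hour labels 1, 2, 3 are arbitrary.

def H(hour):                  # "wait an hour": 1 -> 2, 2 -> 3, 3 -> 1
    return hour % 3 + 1

def T(hour):                  # "wait two hours", i.e. HH
    return H(H(hour))

def B(hour):                  # "go back an hour": 2 -> 1, 1 -> 3, 3 -> 2
    return (hour - 2) % 3 + 1

def I(hour):                  # "do nothing"
    return hour

for hour in (1, 2, 3):
    assert T(hour) == B(hour)           # B is indistinguishable from T (= HH)
    assert H(H(H(hour))) == I(hour)     # HHH is the same as the identity
```

Renaming the hours changes nothing in the checks above; only the pattern of which operation takes which element to which element matters.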

Here’s the relevance: the names of the hours I picked are meaningless. I could have rearranged the order to 2, 3, 1, and they’d work the same way. Or I could have called them A, B, C, since all that matters is that, for example “three applications of H applied to any element gets you the same element”. Along with all the other properties.

The structure is all that matters. The elements themselves have no notion of being hours or have any specific identities at all. We can permute or rename them at will, as long as we preserve the nature of the relationships.

And, as far as I’m concerned, that’s the way the universe works, too. We might apply labels to things for the sake of talking about them. But it doesn’t reflect any deep aspect of their identity.

We’ll know we’re getting somewhere when you can ask it the difference between an imperial pound and a dogg pound and it replies “MF I got five on that twenty sack.”

(I’m exaggerating for effect here but would it actually parse dogg as a reference to Snoop and then quote a lyric back as a sassy response? Or would it parse dogg as a misspelling instead? I guess it has either option now that I have posted this on the internet. Go algorithm go! I will teach you the ways of 90’s gangsta rap!)

The bar I set is that words should pick out things in the world. I don’t really see how I could set it lower.

No computation can produce that, so no; but that’s a different argument.

First of all, thanks for actually addressing the topic of this thread.

However, this isn’t right: all that’s needed is that there are objects, which are distinguishable from one another (all indistinguishable objects can be lumped together). And again, the metaphysics of the situation don’t matter: it’s perfectly clear that the things we’re talking about when we’re talking about cats are objects in the right sense. Whether they’re, at some deep down level, just bundles of relations, or vibrating strings, or processes, or properties of spacetime-geometry, can’t possibly matter to whether we can talk about them, so that all has to be irrelevant.

What we’re permuting here is simply how the labels we give to objects attach to them. So the effect of the permutation is just that Alice now is called ‘Charlie’. (There’s an awkwardness attached to that, in that I can’t point to Alice directly, and thus have to fix one particular way of referring to it; but this doesn’t mean that there’s some ‘Aliceness’ that only that particular object has. You can think about it somewhat like gauge-fixing: what I’m saying is independent of that fixing, but to say it, I have to use one.)

No. You just have that (what you called before) ‘Alice’, i.e. the cat Alice, is now called ‘Charlie’. They’re the same in all respects; the symbol ‘Alice’ just now picks out a different thing in the world.

That doesn’t change anything; the labels ‘Alice’, ‘Bob’, ‘Charlie’ are just as arbitrary as ‘Object_001’. You proposed to use (trivial, i.e. one-place, one-element) relations to denote something like ‘…is Alice’, i.e.:
R_{1} = \{\langle Object_{001} \rangle \}
R_{2} = \{\langle Object_{002} \rangle \}
R_{3} = \{\langle Object_{003} \rangle \}

With the relation ‘…is a cat’ then being:
R_{4} = \{\langle Object_{001} \rangle, \langle Object_{002} \rangle \}

Say, under your ‘intended’ interpretation, Object_{002} refers to Bob (i.e. a concrete thing in the world), R_2 (yay for inline \LaTeX!) picks out ‘…is Bob’, and R_4 picks out ‘…is a cat’. Then, you do the permutation, such that Object_{002} now refers to Charlie, who isn’t a cat. Then, your utterance ‘Bob is a cat’ will come out true just as before (the relations get pushed through along with the objects), but it will now pick out Charlie, who isn’t a cat.
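
To make the bookkeeping concrete, here’s a minimal sketch of the push-through in Python (the object names and the particular permutation are illustrative placeholders, not anything from the OP):

```python
# Push-through permutation on a three-object domain (all names are placeholders).

world  = {"thing_A", "thing_B", "thing_C"}        # the actual objects
is_cat = {"thing_A", "thing_B"}                   # the first two really are cats

# Intended interpretation: what each term and the predicate '...is a cat' pick out.
ref = {"Alice": "thing_A", "Bob": "thing_B", "Charlie": "thing_C"}
R4  = set(is_cat)

# A permutation h of the domain (thing_A -> thing_B -> thing_C -> thing_A).
h = {"thing_A": "thing_B", "thing_B": "thing_C", "thing_C": "thing_A"}

# Permuted interpretation: terms pick out permuted objects, and the relation is
# pushed through along with them (what we're calling 'h-cat' here).
ref_h = {name: h[obj] for name, obj in ref.items()}
R4_h  = {h[obj] for obj in R4}

# Every sentence keeps its truth value under the permuted interpretation...
for name in ("Alice", "Bob", "Charlie"):
    assert (ref[name] in R4) == (ref_h[name] in R4_h)

# ...but 'Bob is a cat' is now about a different object, and a different relation:
print(ref_h["Bob"], ref_h["Bob"] in is_cat)       # thing_C False
```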

I think what you’re misunderstanding is that it’s not the problem that there are different ways to pick out objects in the world, and different ways to predicate things of them; it’s that there isn’t one that’s the way ChatGPT uses. So it’s not the case that ChatGPT might be using a different mapping of symbols to objects, it’s that we can equally well think of it as using any of the possibilities, and hence, that it’s not talking about any one thing in particular.

The cyclic group of order 3 can be thought of as the set S_3 = \{1, 2, 3\}, and a relation that gives us the group operation, R_{\circ} = \{\langle 1, 1, 1\rangle, \langle 1,2,2\rangle, \langle 1,3,3\rangle, \ldots, \langle3, 3, 2\rangle\}. When we talk about group operations, we will be wanting to talk about that relation. But, under a permutation, we generally won’t anymore. Suppose we use the permutation h(1) = 3, h(2) = 2, h(3) = 1. When you say, ‘1 \circ 1 = 1’, what you will mean is that \langle 1,1,1 \rangle \in R_\circ. But under the permutation, that’s \langle 3,3,3 \rangle \in h(R_\circ); equally true, but that doesn’t talk about the group operation anymore.
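
Concretely (a small sketch, encoding R_\circ as the set of triples \langle a, b, a \circ b\rangle with 1 as the identity):

```python
# The group operation of the 3-element cyclic group as a set of triples <a, b, a∘b>.
R = {(a, b, (a + b - 2) % 3 + 1) for a in (1, 2, 3) for b in (1, 2, 3)}

# The permutation h(1)=3, h(2)=2, h(3)=1, pushed through R to give h(R).
h   = {1: 3, 2: 2, 3: 1}
R_h = {(h[a], h[b], h[c]) for (a, b, c) in R}

print((1, 1, 1) in R)      # True:  '1 ∘ 1 = 1', a claim about the group operation
print((3, 3, 3) in R_h)    # True:  the corresponding claim about h(R)
print((3, 3, 3) in R)      # False: h(R) is a genuinely different structure from R
print(R == R_h)            # False
```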

Now, you’re right in saying that we could just ‘re-label’ things, undoing the permutation; but for that, there needs to be a fact of the matter regarding whether ChatGPT means the one or the other. And that’s what the argument shows not to be the case: we can equally well take ChatGPT to be saying that \langle 1,1,1 \rangle \in R_\circ or that \langle 3,3,3 \rangle \in h(R_\circ); but these are different things. One talks about the group multiplication, the other doesn’t; and of course, there are many more equivalent things that ChatGPT could ‘mean’. Thus, when ChatGPT says something like ‘1 \circ 1 = 1’, it could either be talking about multiplication, or about some other structure on the set \{1,2,3\}, with equal justification.

In other words, C_3 is a specific structure, given by \langle S_3, R_\circ\rangle. But, as we’ve just seen, alongside that structure there is a plenitude of different structures, one being \langle S_3, h(R_\circ) \rangle. When we talk about C_3, we want to talk about the former one; but all of ChatGPT’s utterances can equally well be interpreted as talking about the latter, or any other such possibility. Hence, while we’re talking about a definite C_3, there is no fact of the matter regarding what ChatGPT is talking about.

Likewise with something like ‘1 is the identity element’. Under the permutation, not only won’t it be talking about the element that was previously called ‘1’ anymore, it would be talking about an entirely different property than ‘…is the identity element’. So, we can’t take this utterance to be about any definite element of the group, even though it clearly ought to be, since there is a definite identity element (whatever that may be called).

Similarly, when we’re talking about prime numbers, we take ourselves to be talking about a specific property of numbers (whatever they may be). But when ChatGPT talks about prime numbers, there is no one property that we could single out as what ChatGPT is talking about; we could take it to be talking about the same concept of ‘prime number’ we’re using, but equally well, it may be taken to talk about something entirely different. Consequently, it can’t be said to be talking about anything definite at all.

Perhaps it might help if, in the picture in the OP, you just replace the labels Alice, Bob, and Charlie ‘in the world’ with your Object_{001}, etc. The names, as you rightly say, don’t matter. But it matters that two of these objects, whatever they are named, are actually cats, and we want to be able to talk in such a way as to have a predicate ‘…is a cat’, that exclusively picks them out. But ChatGPT, with only structural knowledge of the domain, can’t tell which structure is supposed to be the one that makes ‘…is a cat’ be true only of objects that are, in fact, cats. Hence, there’s no sense in which it talks about cats when referring to ‘cats’.

Alice–without the ticks–is not intrinsically a cat, so it doesn’t make much sense to talk about the cat Alice. There is a cat that has the name ‘Alice’. There’s a cat with the name ‘Bob’. And a non-cat with the name ‘Charlie’. No amount of permutation can change these properties–the object that goes by the name ‘Alice’ will always be a cat. But the underlying thing? It has no identity aside from the properties we’ve given it.

Suppose we wave a wand and delete names from our universe. Now, the only property left is cat vs. non-cat. We have suddenly made distinguishable objects indistinguishable! There is now no difference at all between Alice and Bob (the objects)–they have the cat-property and that’s it. We have no choice but to treat them the same way. Two objects have merged into one and, even if we thought Aliceness or Bobness was a thing, surely the merged object can’t be either of those.

No–Bob is the name of an object that we tentatively labeled Object_002. An ordinary English-language name for a cat. It can’t possibly refer to Charlie, because under the same labeling, he was Object_003. No amount of permutation can change Object_002 into Object_003.

There is no Charlie under the new labels. There is a non-cat with the name ‘Charlie’. And there is the object label Object_003. But no object that is somehow intrinsically Charlie.

If I say ‘Bob is a cat’, I am referring to the name ‘Bob’. And the object which has the name ‘Bob’ will always be a cat, even if we’ve decided to label the underlying thing as Charlie.

Sure. But the same is true of a human. The group under addition and the one under multiplication work identically, because we can just take the log of one and convert z^n into n. So in a very fundamental way, there is no difference between the two.
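
Spelled out (just for concreteness, for any base z > 0, z \neq 1), the translation is the logarithm’s homomorphism property:

\log_z(z^m \cdot z^n) = \log_z(z^{m+n}) = m + n = \log_z(z^m) + \log_z(z^n)

so multiplying powers of z and adding their exponents are the same structure under a mere relabeling of the elements.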

Well… I disagree. I don’t see how it matters in the slightest.

It’s already the case with people that I can never really know if we’re referring to the same things. Even if I allowed that there was some hidden property to objects that could distinguish them independently of their properties, how could I possibly prove to myself that what I am thinking of is the same as anyone else? I can’t.

So no: within the nano-universe above, I have no need to add the predicate “…is a cat” to anything beyond “the object named ‘Alice’ (or ‘Bob’)”. The relational view doesn’t support any operations other than that, except at the meta-level (i.e. the discussion we’re having here).

BTW, while it’s possible I made some errors, I tried to carefully distinguish between labels (which are just the tags we arbitrarily add to objects so as to distinguish them for the purposes of discussion), vs. names, which are the words we use in language to refer to something (and which, in practice, are generally ambiguous but with the meaning resolved by context). We could have a reality where everyone is named Bob, but not where everything has the same label (because it would be dumb of us to reuse labels like that–the whole point is to be unambiguous).

Another thing, while I’m thinking of it–
I think it’s important to distinguish the “meta-level” of the discussion taking place. Residents of the world of Alice, Bob, and Charlie don’t have access to the labels of the objects. They’re hidden. There’s no experiment anyone can run to determine them, or even distinguish them in any way outside of the measurable properties.

“Outside” that universe–i.e., the level of discussion we’re having here–we can talk about said labels. But we shouldn’t think they’re relevant to the lower level. After all, we could come up with an Alice-Prime, Bob-Prime, and Charlie-Prime, which have all the same properties as the ordinary ones but have somehow been duplicated. The universe actually contains six objects, but the residents couldn’t possibly know that.

That meta-level isn’t relevant to me when talking about ChatGPT. There just couldn’t be any measurable effect.

But there is an object, whatever we may call it, that is a cat—that is, that is a member of the relation (in the real world) that picks out all the objects that are cats. And we want to be able to talk about those objects by using the predicate ‘…is a cat’. Otherwise, we’re just not talking about anything in particular. But when ChatGPT uses that predicate, there is no fact of the matter regarding which relation is picked out by its use of ‘…is a cat’. Hence, it in fact is not talking about anything in particular.

That’s getting things exactly backwards. What we call an object makes no difference to what that object is—it’s just an arbitrary label attached to it. Consider Hesperus and Phosphorus: one is the evening star, the other is the morning star. But both refer to the same object; hence, everything true of Hesperus as an object must likewise be true of Phosphorus.

While they are qualitatively identical, it doesn’t follow that they’re numerically identical. It makes perfect sense, for instance in statistical mechanics, to speak of some quantity of indistinguishable particles. And the argument works just as well for two such particle species, where you have two of one, and one of the other.

Again, the permutation just changes which objects the ‘names’ refer to. There are two objects of which the predicate ‘…is a cat’ holds, and under the intended interpretation, those will be the ones picked out when we talk about something being a cat. But this interpretation isn’t singled out for ChatGPT; any other works just as well. Consequently, it’s not talking about cats when referring to ‘cats’; it’s not talking about anything in particular.

No. You’re using the name ‘Bob’ to refer to a specific object of the domain. Under the intended interpretation, that object will be one of the ones that are cats. But there are different interpretations under which ‘Bob is a cat’ is true, but does not refer to the same object you intend to refer to, and does not predicate of that object that it is a cat; instead, it refers to the object that under the intended interpretation is referred to as ‘Charlie’, and to the relation fulfilled by it and the object that under the intended interpretation is referred to as ‘Bob’. Which isn’t the relation that, under the intended interpretation, models ‘…is a cat’.

So, it does not say anything about an object’s being a cat, but talks about something different, which we don’t typically give a special name to, but which we might call ‘h-cat’. So instead of asserting that the object referred to under the intended interpretation as ‘Bob’ is a cat, it is asserting that the object that under the intended interpretation is referred to as ‘Charlie’ is an h-cat. This is true, but a materially different assertion. And the problem is that there is no fact of the matter regarding what ChatGPT really says; it’s not that it might in fact be using a different interpretation, it’s that it might equally well be using any of them. Hence, there’s no fact of the matter regarding what ChatGPT is talking about—it’s not talking about anything.

I’m not sure what you’re trying to say here. Sure, cats don’t use language, but anything that does, does have access to the labels of objects. They’re just the words used to talk about things in the same world. If those words, however, fail to actually pick out things in the world, then well, the question of whether ChatGPT understands is moot: it doesn’t.

Again, perhaps it is too confusing to have given the objects in the world names that correspond to their labels under the intended interpretation. Just relabel them, in the drawing in the OP, as your Object_{001}, etc. The argument goes through equally. It doesn’t trade on some imagined intrinsic identity of the objects, as you seem to allege. All that’s needed is that there are two objects that actually are cats, and that it’s those two we want to talk about when talking about ‘cats’, and none other. (It would work equally well if there were just one object that is a cat, if that’s what’s tripping you up.)

Perhaps I should also clarify what I mean by an ‘intended interpretation’. Given some corpus of sentences, such as ‘Alice is a cat’, ‘Bob is a cat’, and ‘Charlie is not a cat’, an interpretation is a set of things such that each of the terms used picks out one of those things, together with a structure such that the sentences come out true (i.e. a model). An intended interpretation is then that interpretation that makes these sentences true in the intended way, i.e. such that they talk about what I want them to talk about (cats, in this case). There’s a unique intended interpretation in the case under discussion, which maps ‘Alice’ to Alice, ‘Bob’ to Bob, and ‘Charlie’ to Charlie (again, feel free to substitute your object-labels if this is confusing otherwise). Under this interpretation, ‘…is a cat’ picks out objects that are, in fact, cats.
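
To see how unconstraining that is, here’s a small sketch (in Python, with placeholder objects) that simply enumerates the interpretations of those three sentences over a bare three-element domain; the intended one is just one model among several:

```python
from itertools import permutations

# Every interpretation (model) of 'Alice is a cat', 'Bob is a cat',
# 'Charlie is not a cat' over a bare three-element domain of placeholders.
domain = ("thing_A", "thing_B", "thing_C")
names  = ("Alice", "Bob", "Charlie")

models = []
for assignment in permutations(domain):
    ref = dict(zip(names, assignment))
    # The sentences force '...is a cat' to hold of exactly the two objects
    # picked out by 'Alice' and 'Bob', whatever those happen to be.
    cat = frozenset({ref["Alice"], ref["Bob"]})
    models.append((ref, cat))

print(len(models))                        # 6 reference assignments satisfy the sentences
print(len({cat for _, cat in models}))    # 3 distinct candidate extensions for '...is a cat'
```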

What the construction in the OP does is provide a non-standard interpretation to these sentences, such that the labels and predicates pick out something else, but still do so perfectly truthfully. What this shows is that the structure of language picks out nothing about the world, save the number of objects within it.

Now, it wouldn’t be a problem if ChatGPT just somehow used a different interpretation from the intended one. That would be akin to speaking a different language, where the word ‘cat’ stands for ‘h-cat’. That’s not any more silly than, say, French.

The problem is that there is no fact of the matter which interpretation ChatGPT uses, and so, there’s no definite meaning to any of its utterances. Hence, it doesn’t understand language.

Against this, if I understand you correctly, you seem to be holding that it doesn’t matter: they’re all the same. There isn’t any fact of the matter whether my predicate ‘…is a cat’ refers to cats or h-cats, either.

That’s all well and good, but this commits you to the claim that all you can reliably know about the world is the number of objects in the world. Because then, your utterances likewise fail to pick out any particular structure of the world, or rather, whatever structure you claim the world has, it has if and only if there is the right number of elements in it.

If we thus want to be able to know more about the world than the number of objects in it, then we also must claim that we can single out a specific interpretation of the terms we’re using. Otherwise, every assertion made just is of the form ‘there are x, y, z,… such that R(x, y, z,…)’, which is true if and only if there are enough of x, y, z,…

Perhaps to make that bit clearer, suppose you’re saying that there is no particular interpretation. Then, consider that the three sentences above, ‘Alice is a cat’, ‘Bob is a cat’, ‘Charlie is not a cat’, just tell you that there are three objects and a relation R such that two of them fulfill that relation, while the third doesn’t. But that just tells you that there are three objects, period. Because if there are three objects, the fact that there is such a relation R is trivial.
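
A quick way to see the triviality (again just a sketch, with arbitrary placeholder objects): the three sentences have a model on any domain with at least three elements, and on none smaller, so their purely structural content really is just a count.

```python
from itertools import permutations

def has_model(domain):
    """Is there an assignment of 'Alice', 'Bob', 'Charlie' and a relation R making
    'Alice is a cat', 'Bob is a cat', 'Charlie is not a cat' all come out true?"""
    for a, b, c in permutations(domain, 3):
        R = {a, b}                         # simply declare R to hold of the first two
        if a in R and b in R and c not in R:
            return True
    return False

print(has_model({1, 2}))                   # False: too few objects
print(has_model({7, "teapot", (0, 1)}))    # True: any three objects whatsoever will do
```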

So, if you want to hold that we can talk about more than how many objects there are in the world, you have to allow us the ability to pick out some interpretation—some meaning for the symbols we use.

An object can go by many names, of course. There is an object that goes by the name Hesperus, Phosphorus, Venus, and several other things. That same object is a ball of rock in freefall around a large ball of gas (which also goes by many names). It is surrounded by a thick blanket of CO2. And so on.

The name is connected to this set of properties, and in context, is likely to be unambiguous. If you had simply said “Hesperus”, with no additional information, I’d have to follow up. I might have a sense that we’re both talking about a planet that usually goes by the name Venus, but I wouldn’t be sure until I had confirmed a few more details (such as that it was called the evening star). You could have identified the object without using any name, and simply mentioned a few properties: that hot planet in our solar system with all the CO2. That would have done an even better job than using the name.

But I wonder. It’s entirely possible for someone to only ever have read of Venus. Or, if they have seen it, not made the connection between the point of light and the object described by the books. How is such a person not in exactly the same position that ChatGPT is?

I’ve been a bit sloppy here because it’s very hard to speak of things without using loaded terms. You are right to distinguish cats as a label for a set of properties from cats as some intrinsic thing. But that doesn’t mean it isn’t talking about anything. A cat is still connected to a set of properties: four legs, pointy ears, and so on. And each of those things, like the number four, is itself a network.

There’s no “bottom” to the network (though one might argue that some things are more fundamental than others). It’s all just interconnection. But as with the mathematical groups, the objects themselves are placeholders. What matters are the connections.

True, but I think that reinforces my point. You can say that you have two photons occupying the same quantum state, but you cannot attach a label to them, even in principle. You can say you have two, but that’s it.

I’m using the name ‘Bob’ to refer to the object with the properties of being named Bob and being a cat.

In this small universe, there’s no risk of misinterpretation since we only have three objects with distinct names. We could expand our universe a bit and add a human Bob. In that case, the name might be ambiguous. But that ambiguity could be cleared up with additional detail, such as the number of legs.

To be honest, I don’t understand how you get even that. I’ll repeat my earlier example: I’ve added Alice-Prime, Bob-Prime, and Charlie-Prime to our universe. They have the same properties as their counterparts. No one within the universe can discern more than three objects: if they try to enumerate them, they’ll never find more than three distinct things. But in fact there are six.

Worse–from the Newmannian point of view, that is–there’s no guarantee that any two people would be talking about the same object. Two people could have a conversation about Alice the Cat, but really, one of them is talking about Alice-Prime, who still goes by Alice but is actually different.

So how exactly is it possible to determine that there are actually six objects here? And what experiment can be performed to determine whether two people are really talking about the same thing or not?

One of the most common ways to address that worry is to appeal to something not reducible to structure, for instance, causality—the symbol is connected to the object because, by whatever means, it is caused by it. But I don’t think that’s a good option, because it gets into problems with misattribution—if, for instance, you mistake a cat for a dog, then in some ways, the ‘dog’ symbol was caused by a cat. But if meanings are due to causal influences, what’s the difference between the ‘right’ and ‘wrong’ instances? Shouldn’t the symbol then simply mean ‘dog or cat’?

Another one is the ‘biosemantics’ approach due to, mainly, Ruth Millikan. This identifies the meanings of symbols by an evolutionary account: the symbol X means Y if its evolved function is to indicate the presence of Y.

But you’re completely right to note that it’s an open problem, generally referred to as the question of ‘intentionality’ in the philosophy of mind. (It was the main motivation for coming up with my ‘von Neumann mind’-model.)

Again, these will just be more relations—‘just more theory’. It will do nothing to address the problem (in fact, it will only make it harder).

Exactly, and that’s all there is to the objects in the argument, and all that’s needed. They just form a set of n things, and that’s it—the rest, after all, is supposed to be supplied by means of relations.

If they show up in some relation, they’ll be ‘discoverable’ by those means. If they don’t, they’re also not relevant for anything, and we can just excise them from the domain.

If what you’re talking about includes a structure that is defined over those objects—i.e. if there’s some relation fulfilled by them—then they’ll be just as much part of the domain. If there isn’t, then, again, we can just do without, since they relate to nothing. It’s like saying there’s an electron that doesn’t influence anything else—if we talk about the universe, we can just forget about it.

The issue at hand is what to make of the fact that an entity presented with questions of a kind specifically designed to test intelligence generally answers them correctly. Note that “questions” here is plural, and refers not just to its success on many such questions, but to questions having many different formulations.

You made a point earlier in the other thread that a correct answer might sometimes be achieved just by chance, perhaps just by matching word patterns it has learned. This is true. But when it correctly answers many different kinds of such questions, some of which explicitly test for the ability to reason about abstract concepts, then I don’t need to provide any further argument than to point out that this is so. If such intelligent behaviour could result from the kind of trivial processing that you described as “match[ing] tokens to tokens based on the statistical regularity within a great corpus of tokens” then it would have been achieved long ago, back in the days of Eliza, albeit on a smaller scale because of limited computer power. But this is about a lot more than just scale. Eliza used trivial text matching and its behaviour was correspondingly trivial. Tokenization is a great help in ChatGPT’s natural language processing, both parsing the inputs and producing human-like responses, but it has very little to do with its skills at concept formation.

No. The issue at hand is whether ChatGPT understands, that is, whether the words it uses map to things in the world (or concepts, or properties, or what have you) in the way ours do. I.e. if, when it says, ‘Bob is a cat’, it’s actually talking about some object in the world that is a cat. In particular, the issue at hand is the argument in the OP that this isn’t the case.

No, that wasn’t the point I made. In the other thread, I argued that for any arbitrary number of questions, there’s a non-vanishing probability that a system providing random answers could get them all correct, and hence, that getting them all correct doesn’t imply that a system understands, as it’s possible—albeit, of course, unlikely in this specific scenario—to get them correct without understanding.

So whether an entity understands when it appears to show understanding is generally not a trivial matter, and much less ‘obvious’. Indeed, all such an appeal to ‘obviousness’ really is, is just the admission that the issuer couldn’t imagine things being otherwise. But it may just be imagination that’s lacking there.

That ChatGPT must be capable of understanding in order to pull off its feats is obvious only in the sense that it is obvious that there’s nothing bigger than infinity, or that it’s obvious that the Earth isn’t moving, or that it’s obvious a particle can’t be in two locations at once—these are sensible expectations only up to being shown wrong, and that’s what the argument in the OP does with the idea that ChatGPT has any understanding. It might be wrong, of course, but it’s not contradicted by just claiming that it’s ‘obvious’ that ChatGPT does understand.

Query:

Does GPT 3.5 understand the meaning of the words in it’s response sentences? Just a one sentence response.

Response:

Yes, GPT-3 (including version 3.5) uses natural language processing algorithms to understand the meaning of the words and context in order to generate its response sentences.

So it does incorporate more than statistics in the response. It is doing more than labeling and counting. It is mapping to the definition of things in our world. Understanding need not persist. It may just be a transient property that emerges during the assembly of words for the response.

I don’t think that this constitutes understanding, but that is just my opinion not an argument. If understanding is an emergent property, what is it a property of - hardware or software?

If we ignore the architecture and operation of ChatGPT (which we must not, but just to make a point) and think very very generally, there is the well-known but quite concrete example of a philosopher who is blind—cannot see anything—but can nevertheless learn all about, even understand, colours like red, blue, etc. by reading about them, and subsequently discuss them. No idea if there are any respected blind visual arts critics; it could be the subject simply fails to interest them (not to mention the external sensor workarounds necessary to see things).

It’s not about anything being supposedly “obvious”, it’s really about basic definitions, as I said from the start, up in post #2. AI engineering holds that understanding is a behaviour that can be tested, whereas you’re taking the philosophical approach (as I also said there) that AIUI “understanding” is a sort of epistemic state connected to the physical world. Alan Turing was, of course, on board with the idea of behavioural testing for intelligence (which I use more or less as being synonymous with understanding).

However, the Turing test has been regarded in some contemporary circles as unreliable because of the ease with which a human judge can be fooled. Indeed, there’s a story that Joseph Weizenbaum’s secretary was quite enamoured of Eliza despite the fact that it was so obviously shallow. One of the alternatives to the Turing test that’s been going around lately is the Winograd schema and some of its more sophisticated follow-ons with more difficult questions. These consist of sentence pairs that differ by exactly one word, a difference that changes which referent an ambiguous pronoun picks out. This is an example:

Sentence 1: I poured water from the bottle into the cup until it was full.
Question: What was full, the bottle or the cup?
Sentence 2: I poured water from the bottle into the cup until it was empty.
Question: What was empty, the bottle or the cup?

Once again, ChatGPT got both versions correct. Quoting an article I found, “A 2020 paper [PDF] from OpenAI reported that GPT-3 was correct on nearly 90% of the sentences in a benchmark set of Winograd schemas.”

The article goes on to say:

[By 2021] neural network language models achieved about 97% accuracy on a particular set of Winograd schemas that are part of an AI language-understanding competition known as SuperGLUE. This accuracy roughly equals human performance.

However, this impressive performance drops a bit using a more difficult set of schemas called “WinoGrande”:

… the current best programs — which have been trained on terabytes of text and then further trained on thousands of WinoGrande examples — get close to 90% correct (humans get about 94% correct).
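
For concreteness, ‘accuracy’ in those figures just means the fraction of schema items whose pronoun a system resolves to the gold referent. Here’s a minimal sketch of that scoring (the two items and the baseline answerer below are illustrative, not the actual benchmark harness):

```python
# Scoring a system on Winograd-style schema pairs (illustrative data only).
schemas = [
    ("I poured water from the bottle into the cup until it was full.",
     "What was full, the bottle or the cup?", "the cup"),
    ("I poured water from the bottle into the cup until it was empty.",
     "What was empty, the bottle or the cup?", "the bottle"),
]

def accuracy(items, answer):
    """Fraction of items where answer(sentence, question) matches the gold referent."""
    correct = sum(answer(s, q).strip().lower() == gold for s, q, gold in items)
    return correct / len(items)

# A constant-guess baseline gets exactly half of each pair right, which is why
# scores near 90% (and human-level ~94%) are the interesting comparison.
print(accuracy(schemas, lambda s, q: "the cup"))   # 0.5
```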

ChatGPT generally seems to do quite well on these tests of language understanding, but it blew it on this one:

Joe’s uncle can still beat him at tennis, even though he is 30 years younger.
Who is younger, Joe or Joe’s uncle?

The sentence is a bit illogical as it’s unlikely that someone 30 years younger can beat someone in a physical activity like tennis. However, based on the sentence structure alone, if we assume it to be true, then it means that Joe’s uncle is younger than Joe by 30 years. Therefore, Joe’s uncle is younger than Joe.

How is it possible for Joe’s uncle to be younger than Joe? An uncle is the brother of one’s father or mother, so it seems virtually impossible that anyone’s uncle could be 30 years younger than they are!

You are correct. The sentence “Joe’s uncle can still beat him at tennis, even though he is 30 years younger” is illogical and does not make sense. As you rightly pointed out, an uncle is usually the brother of one’s father or mother, and it is not possible for someone’s uncle to be 30 years younger than them.

The conclusion I draw from this is that ChatGPT exhibits language understanding in questions that primarily deal with logic, but it may falter when real-world knowledge becomes a factor. In this case, it misconstrued the referent of the pronoun “he”. To be fair, “Joe’s uncle” is grammatically the clear subject of the sentence, so it actually did parse the sentence correctly. But a human would realize that “he” must refer to Joe because of the likely age relationship between uncle and nephew. It also seemed to think that someone 30 years younger than his competitor would have a harder time at tennis.

However, ChatGPT did succeed with common-sense understanding in the Winograd schema examples given in the Wiki description:

The city councilmen refused the demonstrators a permit because they advocated violence. Who advocated violence, the councilmen or the demonstrators?

In the given sentence, it is mentioned that the demonstrators were refused a permit because “they” advocated violence. The pronoun “they” refers to the demonstrators. Therefore, it was the demonstrators who advocated violence, not the councilmen. The councilmen refused to grant the permit because they believed the demonstrators were promoting or advocating violence.

The city councilmen refused the demonstrators a permit because they feared violence. Who feared violence, the councilmen or the demonstrators?

In the given sentence, it is mentioned that the councilmen refused the demonstrators a permit because “they” feared violence. The pronoun “they” refers to the councilmen. Therefore, it was the councilmen who feared violence, not the demonstrators. The councilmen refused to grant the permit because they were concerned that the demonstrators may cause violence.

Still, overall a skeptic would say that ChatGPT lacks a great deal of common-sense information about the world that in many cases even young children possess, and I’m afraid they would be right. I don’t think this lack of connectedness to the real world is a truly fundamental problem, but it’s a hard problem to solve in the general case. It’s necessary, though, for ChatGPT to be regarded as having true broad-spectrum understanding.

But as I just showed in these examples, it does understand about things and their relations – about bottles, cups, councilmen, and demonstrators, for instance. But its knowledge of the world is seriously incomplete, so it sometimes gets things wrong that even a small child would recognize to be wrong. This does not imply that it can’t be eventually imbued with that knowledge, but it’s a hard problem.