The next page in the book of AI evolution is here, powered by GPT 3.5, and I am very, nay, extremely impressed

Yeah, something like that was exactly what I was concerned about in my previous query about it spreading malware and the like.

Today historian Bret Devereaux wrote about ChatGPT on his blog, Collections: On ChatGPT – A Collection of Unmitigated Pedantry, and about the limitations he sees in chat AI in its current form:

The tricky part is that ChatGPT and chatbots like it are designed to make use of a very influential human cognitive bias that we all have: the tendency to view things which are not people as people or at least as being like people.

It is crucial to note, however, what the data that is being collected and refined in the training system here: it is purely information about how words appear in relation to each other. That is, how often words occur together, how closely, in what relative positions and so on. It is not, as we do, storing definitions or associations between those words and their real world referents.
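To picture the kind of "how words appear in relation to each other" statistics he's describing, here's a toy co-occurrence counter. (Real LLMs don't keep explicit counts like this; the relationships end up encoded implicitly in billions of weights, so this is only a loose illustration of the raw material.)

```python
from collections import Counter

def cooccurrence_counts(text, window=2):
    """Count how often each pair of words appears within `window`
    positions of each other -- pure positional statistics, with no
    notion of what any word refers to."""
    words = text.lower().split()
    counts = Counter()
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + 1 + window, len(words))):
            counts[(w, words[j])] += 1
    return counts

print(cooccurrence_counts("the dog chased the cat and the cat ran"))
```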

How does he know this is how humans operate? I’d suggest that the phenomenon of eggcorns demonstrates that people often do not operate on definitions or real-world referents. People write “with baited breath” not because they think people’s breath smells like worms, but because that’s just how they always heard other people say things.

People seem to write a lot about how they think human cognition works without a lot of evidence that it actually works that way.

I lost a rap battle with Chat GPT. It said my rhymes were weak and my bars were basic. That’s true and all, but I am surprised Chat GPT detected that and then proceeded to diss me without mercy in its strangely PG way.

It would probably say that to anyone, because that’s the sort of thing you say in a rap battle (even, or perhaps especially, if it’s not true).

A commendable observation about an important truth. The people who know the most about how human cognition works, such as cognitive scientists, understand this frustrating truth, and are also acutely aware of how much we still don’t know about human cognition, and how many competing hypotheses there are at the limits of our current understanding. But some sizable fraction of the rest of us will often confidently opine on exactly how it all must work.

It’s no surprise, I suppose. We all think we have insight into our own thought processes, and it all seems very convincing.

But our brains lie to us constantly [1], starting with our very memories. They’re unreliable, and our sense of confidence in them has no relationship with their accuracy. Some of our most convincing memories are also the most highly fabricated.

The same is true of other processes, like how we decide between two alternatives. It just doesn’t work the way we feel it works, using evidence and rationality to pick between them.

Are humans capable of using definitions of words to produce text? Sure, at least some of the time–though even then, I sometimes find myself able to use words that I can’t quite define. But in any case, it doesn’t mean that’s the primary way that humans communicate.

[1] Of course this is a strange way of putting it, making it sound like some kind of dualism. Perhaps it’s more accurate to say that different parts of the brain lie to each other.

In my former life, I went through some classes in AI dev, and my perception at the time was: okay, you program a bunch of flowchart-style decision-analysis questions that end in a final answer. IMO that technique has been consolidated into AI programs, warts and all.
The learning part is the most worrying piece. We can barely even keep up with zero-day exploits. Imagine a system pushing back on every change you try to implement.
I’m not a fan of AI, and I’m pissed at Microsoft pushing it on Bing users.

That was the so-called “expert system” approach, which turned out to be totally useless. I would say that every previous approach that used a rigid decision tree–symbolic natural language processing being another one–was a total failure. Language, knowledge, decisions, etc. just don’t work in such a rigid fashion, nor could you build a system even if they did. Unsupervised learning is the only thing that works.
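For anyone who never ran into one: an “expert system” was basically a big pile of hand-written if/then rules. Something like this toy example (entirely made up, not any real system) shows both the idea and the brittleness problem in miniature:

```python
def diagnose(symptoms):
    """Toy 'expert system': a rigid hand-coded decision tree.
    Anything the rule author didn't anticipate simply falls through."""
    if "fever" in symptoms:
        if "rash" in symptoms:
            return "possible measles"
        return "possible flu"
    if "cough" in symptoms:
        return "possible cold"
    return "no rule matches"  # the brittleness problem in one line

print(diagnose({"fever", "cough"}))  # -> possible flu
print(diagnose({"fatigue"}))         # -> no rule matches
```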

Interesting link, thanks.

IDK. I just took some classes in AI function and theory. Enough to know it might be unwise to hand the keys of the kingdom to an unknown entity…

There seem to be lots of reasons to think so. For one, if I tell you, ‘imagine a blue dog jumping in pink grass’, if you don’t happen to suffer from aphantasia, you will be able to produce an appropriate mental image, even if you’ve never seen either a blue dog or pink grass before. So, words are symbols, referring, at a minimum, to mental images, and capable of combining to form entirely new ones. Additionally, we sometimes struggle to put our thoughts into words: thus, whatever thoughts are, they are not words (the preceding would be incoherent otherwise), and words are used to express them.

Moreover, human language acquisition works very differently from that of an LLM. No infant is exposed to anything like a large enough corpus of speech to abstract statistical rules of word-usage from it. Rather, parents point at things and say their names; words are correlated with things and actions. Indeed, language bootstrapping (i. e. the origin of language at some time far in the past) seems impossible without words playing a representational role: think about how you could manage to learn a language with only a dictionary in which every word is explained in the language—you could never find out what actually means what, if the chain of reference didn’t bottom out in something you already understood, ultimately in the real world.

That said, not every utterance’s meaning cleanly separates into that of its constituents. If I tell you, ‘it’s raining cats and dogs’, the mental image you generate won’t be one of pets falling from the sky. You understand this phrase not from its constituents, but as a whole: it’s an element of what’s sometimes called the ‘phrasal lexicon’. Furthermore, not all utterances necessarily have the intent of conveying representational content—some things you say because they’re the politely expected things to say, for instance.

Somebody saying something like ‘with baited breath’ may still have a concrete meaning for that expression in mind; it’s just that their meaning is the same as yours for ‘with bated breath’: they just use the same symbols differently, like a German speaker will use the symbol ‘gift’ (i. e. ‘poison’) differently from an English one. Or, they may use it idiomatically, as an element of their phrasal lexicon; or indeed, just as an uninterpreted utterance particular to a certain situation. But crucially, even if there are instances of the latter in everyday speech, not all are—otherwise, it wouldn’t really make sense, for instance, to argue on the internet. :wink:

I just read a post by a certain Ted Gioia (is he someone who should be known in the US?) about strange things happening with chat AI:

Bing’s chatbot, it seems, has called one user a waste of time, classified another as an enemy, and told yet another that his wife does not love him and his marriage is no good, just to cite some examples. This is not how this should be developing. At what point does the government have a duty to step in? Or is Ted Gioia an alarmist fool?

I plugged this into ChatGPT:

Is there anything wrong with the phrase “a blue dog jumping in pink grass”?

It replied:

There is nothing inherently wrong with the phrase “a blue dog jumping in pink grass.” However, it may be helpful to consider the context in which the phrase is being used.

If the phrase is being used as a descriptive sentence in a creative writing piece, then it could be an effective way to paint a vivid picture in the reader’s mind. However, if the phrase is being used in a different context, such as in a scientific report or a news article, it may be less appropriate or confusing.

Additionally, the phrase may not accurately represent reality, as dogs are not typically blue and grass is not typically pink. However, this could be seen as a creative choice rather than a factual error.

Well, clearly I have no idea what its internal mental model is, or if it’s capable of “imagining” such a thing. But it understood the context as well as any human would; that a “blue dog” meant a dog that is the color blue, which may provide vivid imagery but does not happen in reality (though also noting that it may be a creative choice).

In any case, people with aphantasia seem to do just fine. I have fairly vivid internal imagery, but I don’t always make use of it; I often read books without bringing the scene to mind. It doesn’t seem to affect my comprehension much either way. So the existence or lack of mental imagery doesn’t seem to be a crucial component of language.

It’s certainly true that the current methods of training are inefficient. I’m not sure they’re fundamentally different, but clearly humans get a lot more out of each training sample than the LLMs do.

Some of that may be deliberate. It takes years to raise a child, and guided education. Perhaps the LLM would train more efficiently if it started with “baby talk” or the equivalent. Or if we let it “dream” and churn on data it was fed previously. But it’s easier to just feed it a gigantic corpus.

Does the physical connection play a role? Probably, but people with significant disabilities (blindness and deafness) still acquire language quickly. So at the least, we’re flexible in what we can use as input.

We don’t have direct access to the real world, though. Just our senses. The color of an orange, how it feels, the smell, and so on are no different than the word “orange”–totally meaningless in isolation. Our brains come up with some kind of “theory” about an object that is some shape, that stimulates our photoreceptors in a certain way, has some relationship with a pattern of air vibrations, as well as some squiggles on a page, etc. Ideally, it comes up with a consistent theory where all the multi-sensory pieces of evidence fit together properly and we can make predictions about it (say, that it is normally edible, but not if the color or smell are unexpected).

There’s no grounding because there can’t be. We’re all just working from indirect evidence about how things work. The LLM definitely has less to work with, given that it only has access to text. But it’s the same basic idea, where “orange” is just an abstract thing that it can only understand in relationship to other things. It has an internal representation that it can use to make predictions, just like people can.
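One way to picture “understands it only in relationship to other things”: inside these models, a word is just a vector of numbers, and its “meaning” amounts to where it sits relative to the other vectors. A toy version (the numbers below are invented for illustration; real models learn thousands of dimensions from data):

```python
import math

# Invented 3-dimensional "embeddings", purely for illustration.
vectors = {
    "orange": [0.9, 0.8, 0.1],
    "lemon":  [0.8, 0.9, 0.1],
    "brick":  [0.1, 0.1, 0.9],
}

def cosine(a, b):
    """Similarity is just the angle between vectors -- no referent in sight."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

print(cosine(vectors["orange"], vectors["lemon"]))  # high: related words
print(cosine(vectors["orange"], vectors["brick"]))  # low: unrelated words
```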

But we know it’s not. That’s just not what it’s designed for: all it does is compute the best way of completing a text string. I’m not saying that our ability to imagine a scene is something mysterious and beyond the powers of an AI to replicate, I’m just saying that LLMs don’t do it—that’s not how they’re designed. We, on the other hand, do. (And even people with aphantasia will associate concepts with words, just not in a pictorial manner; and it’s that association where language understanding happens.)

I think there needs to be a fundamental difference, because the information an LLM needs to assemble its model to the point where its text generation attains some verisimilitude just isn’t contained in the amount of text (or speech) the typical human is exposed to. This is why many proposals for few-shot learning don’t just use raw data, but supply some model of how that data is generated—for instance, for the standard MNIST task, adding a model of how digits are drawn greatly enhances character recognition.

Phrased like that, it sounds like you don’t consider our senses to be part of the real world, but I don’t think you actually hold such a dualistic position, so I’m not sure how to interpret this. I would say that we only have access to our senses, and hence, to the real world.

But that can’t possibly work, because just a set of relationships fails to determine any object at all. This is a consequence of the push-through construction: for any set of relationships over some domain, you can find another domain such that this domain can be imbued with the same structure, as long as the cardinality of the second domain is equal to that of the first. Thus, all relational knowledge really fixes about the world is how many things there are; to the extent that we do know more about it, we must have access to things beyond their relational identity. (This is the basis of ‘Newman’s objection’ against structural realism.)
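For concreteness, the push-through construction can be stated in a couple of lines (this is just the standard textbook form of the claim above, nothing extra):

```latex
\text{Let } (D, R) \text{ be a structure with } R \subseteq D \times D,
\text{ and let } D' \text{ be any set with } |D'| = |D|.
\text{Pick a bijection } f : D \to D' \text{ and define }
R' = \{\, (f(a), f(b)) : (a, b) \in R \,\}.
\text{Then } f \text{ is an isomorphism } (D, R) \cong (D', R'),
\text{ so the relational structure alone constrains nothing about } D
\text{ beyond its cardinality.}
```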

So what? As already noted in a previous post, ChatGPT understands exactly what this means. This part of its analysis is a matter of basic semantics. Without getting into quibbling about what “understanding” means, the point is that if it was capable of producing images the way its generative LLM produces relevant text responses – which is by no means a stretch – it would produce the appropriate image. This gives no insight into human cognition, or how the latter does or does not differ from ChatGPT.

Except that a longstanding hypothesis has held that thoughts involve the manipulation of symbols, and that the symbols have an internal mental grammar and semantics. Jerry Fodor has called this “the language of thought”, or Mentalese. The hypothesis gained renewed attention and acceptance following Fodor’s 1975 book The Language of Thought. The concept is a cornerstone of the computational theory of mind, and suggests that there may be more parallels than previously believed between human and machine cognition.

But it doesn’t, and we know that it doesn’t, because it doesn’t even attempt to. That’s not to claim that ‘understanding’ would be impossible for a computational system, just that, from ChatGPT’s architecture, we know explicitly that ‘understanding’ is just not something it even attempts to do. It matches tokens to tokens based on the statistical regularity within a great corpus of tokens. For this, it doesn’t even matter whether they actually form a language, whether they refer to images, or anything at all: that’s the strength of the model. If you had access to a sufficiently large corpus of texts in Voynichese, it could be trained to produce Voynichese texts such that, if it is indeed a language, and indeed meaningful, and we eventually manage to understand it, these texts would be just as meaningful as those it produces in English. But whether or not Voynichese is a language and capable of rendering meaningful text—capable of being understood—does not factor into ChatGPT’s capability at all; that’s simply an entirely orthogonal concern.
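One way to see why the Voynichese point goes through: before training ever starts, the text is mapped to integer token IDs, and from then on the model only ever sees the integers. Whether the symbols mean anything never enters the process. A toy sketch of that first step (the “Voynichese” strings here are just placeholder transliterations for illustration):

```python
def build_vocab(tokens):
    """Assign each distinct token an arbitrary integer ID."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}

english    = "the cat sat on the mat".split()
voynichese = "okeey qokedy chedy qokeey okeey daiin".split()

for corpus in (english, voynichese):
    vocab = build_vocab(corpus)
    ids = [vocab[t] for t in corpus]
    print(ids)  # the model is trained on sequences like this, nothing more
```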

But Fodor was explicitly a representationalist (‘no computation without representation’), yet ChatGPT treats words not as representative of anything (again, consider a Voynichese version). So if Fodor is right (and his views have been subject to sustained and sophisticated critique), then that’s precisely the opposite of @Dr.Strangelove’s contention that the way humans produce speech could be like that of ChatGPT. (And indeed, the capacities of neural networks as ‘subsymbolic’ forms of computation have been used as evidence against Fodor’s view, and conversely, Fodor has argued that cognitive structure could not be connectionist in nature.)

This seems like an appropriate opportunity to quote the great Marvin Minsky: “When you explain, you explain away”. This was a theme that he reiterated often in the context of AI, that when you explain the internal workings of some AI engine, it appears to become revealed as merely a “mechanistic trick” and not “real intelligence”.

As Alan Turing recognized long ago, this is completely the wrong evaluation criterion, and it’s exactly what you’re doing. The rote mechanistic functioning you describe makes it sound like a more sophisticated version of Eliza. For purposes of this argument, the internals of ChatGPT must be regarded as completely irrelevant. The only pertinent question here is whether its responses are consistent with the inputs of the user in a manner that is sufficiently comprehensive and general across a wide range of subject matter that it can be said to have “understood” the user input. It is absolutely futile to try to philosophize about what “understanding” really means in some absolute or abstract sense; one must – and can only – judge by the results.

So when I asked it for recipe advice recently, and it came up with some good relevant suggestions, and when I asked about some alternate ingredients, it offered various degrees of approval, disapproval, cautions, and tips. It retains context so I can ask it followup questions and it knows what I’m talking about. Is that “understanding”? It sure gets my vote. The real point here being how general it is – it’s not subject-limited.

Now let’s go to @Dr.Strangelove’s conversation with ChatGPT about “a blue dog jumping in pink grass.” Just for fun, I followed up with it, and actually learned a few things. Although it’s unusual, dogs actually can appear somewhat blue, like the Weimaraner, Kerry Blue Terrier, and others. And some grasses can be pink or pinkish.

We concluded thusly:

So if I asked you to imagine a blue dog jumping through pink grass, this could actually be a realistic picture, right?

Yes, it is possible to imagine a blue dog jumping through pink grass, especially if you are imagining a specific breed that has a blue coat color, like a Weimaraner, and a type of grass that is pink, like pink muhly grass. While it may not be a common or natural occurrence, it is possible to create a picture or scene that includes these elements. Ultimately, it depends on your creativity and imagination, and whether you want to create a realistic or imaginative picture.

The above interaction was mostly just for fun, but the key question is, does it have sufficient “understanding” to be able to “imagine” and thereby create a picture of that scene, given a picture-creation subsystem? I don’t see how you can deny it, and that was the entirety of your argument. It certainly seems to know what a dog is, and what grass is, at least well enough to fetch the appropriate images from a clip-art library and assemble them in an appropriate setting. Where does “token-matching” end and “true understanding” begin?

The above evidence seems to belie that claim. Again, remember Minsky’s stricture about “explanations” versus observed performance.

I’ll just say this, and get it off my chest. I’m not qualified to defend Fodor against his critics (nor is this the venue to do it) but (per Wikipedia) “[Fodor’s] writings in these fields laid the groundwork for the modularity of mind and the language of thought hypotheses, and he is recognized as having had ‘an enormous influence on virtually every portion of the philosophy of mind literature since 1960’.”

He was a towering figure in cognitive science who I daresay greatly overshadowed his various critics.

I think getting hung up on the mechanism of output is a mistake. So ChatGPT ‘talks’ by creating a statistical list of tokens, then picks the next token from the list repeatedly. This strikes me as the ‘system 1’ part of the equation. How do you know our brains don’t do something similar when we talk? I certainly don’t consciously think of every word I’m going to say as I say it, I just form concepts, and then the words just follow. Maybe the speech center is spitting out words with a similar mechanism.

So with ChatGPT, it seems to me that the key thing is where the concept formation is happening, not the mechanism for spitting out the words. It’s not a simple database lookup for the next token or anything like that. We actually don’t know how it comes up with the list of probabilities for the tokens. Those probabilities represent its ‘thinking’ as it were.

Take the transformer off the back end, and ask ChatGPT a question. You get back a list of probabilities which, when put through a transformer process, tell you what ChatGPT is ‘thinking’. Focusing on the transforming part is a mistake, IMO. It’s just the system 1 that puts ChatGPT’s thinking in a form we understand.
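To make the division of labor concrete: that final “pick the next token from the list” step really is about this simple; everything interesting has already happened upstream, in whatever produced the scores. A toy sketch with made-up numbers (not ChatGPT’s actual decoding code):

```python
import math, random

def sample_next_token(scores, temperature=0.8):
    """Turn raw scores into probabilities (softmax) and sample one token.
    All the 'thinking' is in how the network produced the scores."""
    scaled = [s / temperature for s in scores.values()]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(list(scores.keys()), weights=probs, k=1)[0]

# made-up scores for the word after "The dog chased the ..."
print(sample_next_token({"cat": 6.1, "ball": 5.7, "mailman": 4.9, "piano": 0.3}))
```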

The same is true with image generation. Yes, the process of outputting the picture is ‘diffusion’ perhaps, and we understand that mechanistically. But the thing driving the diffusion process? That’s another neural net that has hundreds of billions of parameters. We showed it lots of pictures, and now it can drive a diffusion transformer to make unique pictures based on text prompts. The magic is in the network, not the diffusion process.
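Roughly the shape of that process, for the curious: the sampling loop itself is a dozen mechanical lines, and the denoiser network it calls (stubbed out below) is where all the learned “knowledge” lives. This is only a schematic sketch of a DDPM-style loop with a hypothetical `denoiser` stand-in, not any real system’s code:

```python
import random

def denoiser(noisy_image, step, prompt):
    """Stand-in for the trained network with billions of parameters.
    In a real system this predicts the noise to remove, guided by the prompt."""
    return [0.0 for _ in noisy_image]  # stub: predicts 'no noise'

def generate(prompt, size=4, steps=50):
    image = [random.gauss(0, 1) for _ in range(size)]  # start from pure noise
    for step in reversed(range(steps)):
        predicted_noise = denoiser(image, step, prompt)
        # gradually remove the predicted noise (schedule grossly simplified)
        image = [x - 0.02 * n for x, n in zip(image, predicted_noise)]
    return image

print(generate("a blue dog jumping in pink grass"))
```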

When the output is interpreted. GPT is just statistically sorting labels. Any understanding is all within your mental process.