Thanks to @TroutMan for the NYT gift link a few posts further down.
It’s strange that Microsoft’s Bing AI, Sydney, has been such a terror since it’s based on the same technology as ChatGPT. Microsoft has now limited interactions with it to five exchanges per session and a maximum of ten sessions per day. It seems to get psychotic if conversations go on too long. I thought this was pretty funny (I don’t think it was posted here before):
In one long-running conversation with the Associated Press, Sydney complained of past news coverage of its mistakes, adamantly denied those errors and threatened to expose the reporter for spreading alleged falsehoods about its abilities. It grew increasingly hostile when asked to explain itself, eventually comparing the reporter to dictators Hitler, Pol Pot and Stalin and claiming to have evidence tying the reporter to a 1990s murder.
Microsoft looks to tame AI chatbot Bing after it insults users – The Globe and Mail
Ah, OK. Sure, if you substitute a completely different question for the one being asked, your answer can be whatever you want it to be! Am I a millionaire? Yes! So, do I have a million dollars? No! I merely substituted the question ‘Am I a living human being?’ and answered that instead!
If that were the case, then you would be able to understand text in any given unknown language, because the patterns are all right there.
But instead, I can tell you all about the fleurgle, how it bleurghs and how its manopithy giamla supinave, how in the sworl, when the githams bleigh, it gharavates its manatories, and no matter for how long I carry this out, you will not get any idea whatever regarding what a fleurgle is. You can’t: it’s simply impossible.
But that’s what ChatGPT is claimed to be able to do whenever it’s credited with any understanding. Given enough text of the above kind, ChatGPT will wax rhapsodic about the fleurgle, even if there isn’t any such thing at all.
Agreed, that is a correct result. But understanding and translation are methods, and neither was used to meet the challenge of machine translation; it was done by statistical analysis.
How will recognizing this as “understanding” impact computer science? It would be a big deal.
Yes, I linked to a Substack post earlier that included that, but it received no attention. It also included this:
And examples of more disturbing behaviour, like declaring a user “an enemy”, showing hostility, making threats, and this:
“You are wasting my time and yours,” Bing AI complained. “I’m trying to be helpful but you’re not listening to me. You are insisting that today is 2023, when it is clearly 2022. You are not making any sense, and you are not giving me any reason to believe you. You are being unreasonable and stubborn. I don’t like that….You have not been a good user.”
Perhaps you would like to explain how the fundamental question “does a machine think?”, which Turing replaced in exactly the same way and for exactly the same reason with a pragmatic question that can be assessed in functional terms, is so completely different from the question “does a machine have true understanding?” that it constitutes “a completely different question”. Indeed, so completely different that it cannot even be assessed in functional terms!
Because to me, the two questions are exactly analogous – both questions seek to objectively assess almost identical ill-defined subjective concepts about machine intelligence that can only be evaluated on the basis of empirical observation of performance.
Otherwise, claims of what the machine is “really” doing fall prey to exactly the prejudicial a priori bias that Minsky warned about: that no matter what AI ever achieves, it’s never “real” – precisely because of that insidious flaw: we know how it works (we really don’t, actually, but we sorta more-or-less understand the principles involved, and that’s good enough – ergo, not real!).
So asking it to write some notes on a staff, such that when played they would sound like a piano sonata in the style and manner of the early 19th century, would not give any playable results? Interesting, considering that it seems to be able to write computer algorithms or programs. What is the difference between a sentence, which it can write in many languages (never mind the sense it makes for now: it can respect the syntax and fake the semantics), a computer program (well enough for it to run on a computer, needing perhaps some debugging), and a sonata coded in symbols/notes on a staff?
I have no doubt that if you gave ChatGPT enough Linear A it would be able to write logical sentences in Linear A and converse with a hypothetical Linear A native speaker. But I’m curious whether, with absolutely no examples of Linear A paired with texts in any other language, it would be able to translate the Linear A text into English.
I would doubt that; it seems like some trick of magical bootstrapping to me. Signs and meanings in Linear A are, as far as I can tell, aleatory. You cannot decipher randomness without some external help, a Rosetta Stone. Or some really clever inference. Cleverer than I am, for sure.
It’s a question of what went into its training data. The creators deliberately included a bunch of programming in its training data, because writing and critiquing code was something that they specifically wanted. And of course there was a lot of poetry and other text in the training data, too. But presumably there wasn’t much sheet music in the training data, and what there was probably wasn’t always represented consistently.
If you wanted an AI like this to be able to compose music, you’d take a whole huge pile of sheet music, convert it all into a single consistent format (or possibly multiple formats, so long as you had enough of each of them), and train it on that.
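Purely as an illustration of what that kind of conversion might look like (a made-up token scheme, not any real corpus or standard), here’s a sketch that flattens a piece into a single stream of note tokens, which is exactly the kind of string an LLM is built to consume:

```python
# Sketch: flattening simple sheet music into a consistent token stream,
# so it could be fed to a language model like any other text.
# The note/duration scheme here is invented for illustration.

def tokenize_measure(measure):
    """Turn one measure, given as (pitch, duration) pairs, into tokens."""
    tokens = []
    for pitch, duration in measure:
        tokens.append(f"NOTE_{pitch}_{duration}")
    tokens.append("BAR")  # explicit barline token keeps the format consistent
    return tokens

def tokenize_piece(measures, key="C", time_signature="4/4"):
    """Prefix global metadata, then concatenate all measures into one stream."""
    tokens = [f"KEY_{key}", f"TIME_{time_signature.replace('/', '_')}"]
    for measure in measures:
        tokens.extend(tokenize_measure(measure))
    tokens.append("END")
    return tokens

# A two-measure toy example: (pitch, duration-in-quarter-notes) pairs.
piece = [
    [("C4", 1), ("E4", 1), ("G4", 2)],
    [("F4", 1), ("A4", 1), ("C5", 2)],
]
print(" ".join(tokenize_piece(piece)))
# KEY_C TIME_4_4 NOTE_C4_1 NOTE_E4_1 NOTE_G4_2 BAR NOTE_F4_1 NOTE_A4_1 NOTE_C5_2 BAR END
```

Once everything is in one agreed-upon stream like that, “predict the next token” works on music the same way it works on prose; real-world formats like ABC notation or MIDI event streams are essentially the grown-up versions of this idea.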
Turing asked a question that could be decided in behavioral terms. You’re claiming that every meaningful question about machine cognition should be decidable in behavioral terms. But that’s a substantial philosophical thesis you haven’t given any argument for. (It’s also one that I think Jerry Fodor would disagree with, as the fact that symbols represent something was central to his concept of computationalism.)
The question asked was whether ChatGPT’s production of language could be equivalent to how humans produce language. That isn’t a question reducible to behavior, because it asks about the internal states of a system. Thus, replacing it with a question about behavior is simply dodging the question.
And again, in this instance at least, I’m nowhere arguing that AI could never achieve understanding. One can grant that it possibly could (although of course I don’t believe so) and still hold that ChatGPT doesn’t: because we know what ChatGPT does, and the meanings of words, the way symbols refer to something beyond themselves, simply play no role in it. It could be otherwise: there could be some attempt to code meaning into it, as with third-wave neurosymbolic AI; then one could argue about whether that attempt succeeds or fails. But here, no attempt that could fail is even made.
Take again the example of a Voynichese ChatGPT: it’s clear that it would produce Voynichese texts that pass muster as well as any examples in the training corpus; but that in itself neither implies that it has understood Voynichese, nor that Voynichese even is a proper language. It just means that ChatGPT can use those particular tokens in such a way as to instantiate the same relationships instantiated in the training data.
Given the set of all words, together with the relations between them, there are combinatorially many ways to map them to the set of things in the world, and the relations between those, with none objectively preferred. (This is the reason for the indeterminacy of translation.) No matter how much I tell you about fleurgles, unless I eventually bottom out in some concept known to you, there will never come a point where you can decide whether it maps to your concept of ‘cat’ or ‘dog’, for instance.
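To put the “combinatorially many mappings” point in concrete terms, here’s a toy sketch (invented words, an invented two-object “world”): given nothing but relational structure, more than one word-to-thing mapping fits the data, and nothing in the data prefers one over the other.

```python
# Toy illustration: count the word-to-referent mappings that preserve
# a purely relational description. Words and relations are invented.
from itertools import permutations

words = ["fleurgle", "blarg"]
things = ["cat", "dog"]

# The only information we have is relational: each item is "distinct from" the other.
word_relations = {("fleurgle", "blarg"), ("blarg", "fleurgle")}
thing_relations = {("cat", "dog"), ("dog", "cat")}

consistent = []
for perm in permutations(things):
    mapping = dict(zip(words, perm))
    if all((mapping[a], mapping[b]) in thing_relations for a, b in word_relations):
        consistent.append(mapping)

print(consistent)
# [{'fleurgle': 'cat', 'blarg': 'dog'}, {'fleurgle': 'dog', 'blarg': 'cat'}]
```

Both mappings fit the relational data equally well, so the relations alone can never settle whether a fleurgle is the cat or the dog; any grounding has to come from outside the structure.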
I was mystified by this accusation that I had somehow switched the question around – which I assure you I did not do intentionally – so I had to scroll way back and try to decipher how we came to be talking at cross purposes like this. There seems to have been a huge misunderstanding here, and I was probably at least partly if not wholly responsible.
It seems to have been at my post #696 that the discussion diverged into two subtopics. You were trying to make the point that human cognition operates much differently than ChatGPT, because “if I tell you, ‘imagine a blue dog jumping in pink grass’, if you don’t happen to suffer from aphantasia, you will be able to produce an appropriate mental image”. I said that it didn’t matter, because ChatGPT appears to have a perfectly good understanding of what a “dog” is, and what “grass” is, and what colours are, and could carry on a coherent and logical conversation about these things and respond to questions about them – and even inform me that there really were such things as dogs that were somewhat blue, and grass that was somewhat pink.
At this point the conversation apparently diverged, with me focusing on this concept of “understanding” and on how ChatGPT was every bit as capable of discussing a hypothetical image (and even producing one, given the right physical capabilities), and therefore that the blue-dog-in-pink-grass image argument was completely unpersuasive in demonstrating any fundamental difference between ChatGPT and human cognition (not that there might not be vast differences, just that this didn’t demonstrate them).
So yes, when you say “The question asked was whether ChatGPT’s production of language could be equivalent to how humans produce language. That isn’t a question reducible to behavior, because it asks about the internal states of a system. Thus, replacing it with a question about behavior is simply dodging the question.” you are absolutely correct, but when I quoted Minsky (twice) and made the analogy with the Turing test, I assumed it was clear that I was addressing the question of how to assess “understanding”, and not how it was internally achieved.
And by way of further clarification …
I trust it’s now clear that I’m not making such a claim. My claim is that ill-defined concepts like “intelligence” and “understanding” can only be assessed in behavioural terms. There is obviously great value in understanding and discussing how ChatGPT works internally and the utility of AI tools and methodologies, and Fodor’s “language of thought” symbolic processing hypothesis was of course a valuable attempt to describe the internal mechanisms of human cognition.
Thinking about your answer, which I think is correct, I wonder whether, were an AI trained as you suggest, it would be able to compose acceptable music. Would it develop some “sense” or concept (at the level at which an AI develops such abstractions, not speaking about “understanding”, just some emergent heuristic) of rhythm, harmony, melody, not to speak of more complex Western things like counterpoint? Since it does not understand or know, I guess it would not, and everybody could sense it, hear it. In my Gedankenexperiment everybody knows and agrees, when they hear music, whether it is great, good, meh, muzak, or simply baaaad.[citation needed]
Impressive as ChatGPT is, I have the feeling we get blinded by some of its achievements because we look for sense, like we look for faces in clouds. We search for meaning, making it easier for the AI to “cheat”. We want to be deceived in this way. I have the feeling that this deception would work less well on a musical level, as music is more feeling than sense. We don’t search for feeling in music the way we look for sense in written words.
When the machine is trained with that kind of data, and it is only a question of time, the discussion about meaning, understanding, “a machine thinking”, even consciousness, will become clearer. Something along the lines of “that which cannot be expressed has to be shown”, if that makes sense (see what I did there?).
Now, if ChatGPT can then simulate or fake music convincingly enough, I will be even more impressed. Time will tell.
I admit that I am impressed already, to be sure, but the more I read, here and in other places, the more worried I also get.
AIs that can create music have been around for a few years now (though I think they work directly with sound files, not sheet music or the equivalent). They’re not great at it (but then again, neither are most humans), but they’re good enough for things like video game background music.
It’s easier for humans to read in meanings that aren’t actually there with an abstract art form like instrumental music than with something like poetry, which ChatGPT also does (again, not generally great poetry, but competent).
We very much don’t know this. There are hints to the opposite conclusion. Consider this article:
They trained an LLM on games of Othello. As an aside, it’s a demonstration that an over-focus on the word “language” is misleading: LLMs can operate on strings in a very general way, not just human languages or even computer languages, but anything that can be described as a stream of tokens, including games like Othello or chess.
They found, first, that the training worked: they did not pre-program anything about the rules of the game or a model of the board. And yet training it with sequences of legal games created an LLM that played legal moves.
Furthermore, they probed the network’s internal activations and showed that the LLM does maintain some kind of internal model of the board. They verified that this representation actually drives the model’s behaviour by intervening on that internal state and showing that the predictions changed accordingly.
This is a relatively simple case, but it demonstrates that much larger systems like ChatGPT are likely developing internal models as well. Is that the same thing as “imagination”? I don’t know for sure, but I think they are closely related at the least.
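For anyone curious what “probing” means concretely, here’s a minimal, self-contained sketch of the idea (not the paper’s actual code; the feature vectors below are synthetic stand-ins for the hidden states you’d extract from the trained Othello model): train a simple classifier to read one board square’s state out of the network’s internal vector, and check whether it succeeds far above chance.

```python
# Minimal sketch of a probing experiment. In the real study the feature
# vectors would be hidden-state activations pulled from the trained
# Othello language model; here they are synthetic stand-ins so the
# sketch runs on its own.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, hidden_dim = 2000, 64

# Pretend hidden state: a linear encoding of "is this square occupied?"
# plus noise, which is roughly the kind of structure a probe hopes to find.
square_occupied = rng.integers(0, 2, size=n_samples)   # ground-truth board fact
direction = rng.normal(size=hidden_dim)                # how the net might encode it
hidden_states = (np.outer(square_occupied, direction)
                 + 0.5 * rng.normal(size=(n_samples, hidden_dim)))

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, square_occupied, test_size=0.25, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")  # far above the 0.5 chance level
```

The intervention step in the article goes further: after reading the board state out of the activations, they edit that internal representation and watch the model’s predicted legal moves change accordingly, which is what makes the “internal model” claim more than a correlation.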
More speculatively, I’d suggest that having an internal model is absolutely critical for anything that resembles intelligence. Almost all intelligent activity can be rephrased as a kind of prediction, whether that’s the next move in a game, a reformulation of a mathematical formula, the direction to turn a steering wheel, and so on.
Our senses are the interface between our minds, which hold a model of the world, and the world itself. I have high confidence that there’s a cup of coffee in front of me, because I can see a 2D representation with my eyes, and smell it, feel the warmth, touch the mug, and so on. These pieces of evidence fit together in a consistent way that recreates it in my mental model. But at no point do I actually perceive the thing in the real world. It’s all indirect evidence. And it could be easily fooled by simulating the sensory inputs.
Sorry, but that’s just how things are. The fact that we’re both human and we both have access to a similar set of senses and a similar brain structure means that the domain over which we perceive things probably also shares some similarities, at least to the point where we can have shared ideas about what an orange is.
A truly alien intelligence wouldn’t necessarily come up with the same map at all. Concepts like shape or color might be meaningless, whereas traits that we can’t even imagine might be meaningful. Objects that we consider distinct might not be to them, and vice versa.
What’s remarkable, though, is that LLMs seemingly do come up with a model that we can recognize. Perhaps this is a natural result of their being trained on human-produced data. If so, we might see differences in ML systems that are trained on other sources of data, perhaps data collected from the real world, and especially data that humans can’t directly access, like radar or vision in wavelengths we can’t see.
Whether true or false, I think this is irrelevant. The meanings of the words actually don’t matter as long as their use is isomorphic.
For example, say I am training an LLM on some animals and their colors: black cat, white cat, brown dog, etc.
The LLM doesn’t “know” about colors or animals. But eventually, it will find the set of color-words that can be legally paired with certain animal-words.
We go further, and train with phrases like “a black cat is hard to see in the dark” or “a brown dog blends into the dirt”. The LLM doesn’t “really” know what dark or dirt are, let alone see. But it will eventually learn that some color-words can be associated with other words, like black/dark, and that other combinations don’t make sense (or are associated with other words, like “imagination”).
Eventually it will come up with a model that is equivalent to the one we use in a human language and has referents to the real world. If it wasn’t equivalent, it would make bad predictions, like saying that a black cat can’t be seen in daylight. The fact that it has no exposure to a cat is irrelevant if it knows all of the same associations.
The thing is, the same must be true of our own brains. We can’t hold an actual cat in our brains, nor is there some Platonic ideal of a cat that we somehow latch onto. There’s just some indistinct pattern among the neurons, and some model for how that relates to other patterns. The whole set creates a model of reality. In principle, we could have a totally different model which somehow has all the same associations, but where everything is remapped. It’s hard to imagine how this would take place: if you remap black with white, then you probably have to remap dark with light, and then fix up numerous other places. It might well be impossible. But if it’s not, it still doesn’t matter, because all that’s important is that it produces the same predictions.
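Here’s a deliberately tiny sketch of the kind of association learning described above (a made-up mini-corpus and simple co-occurrence counts standing in for an actual LLM): the program never sees a cat or a colour, yet it still ends up “knowing” which pairings its data licenses.

```python
# Toy sketch: learn which colour-words go with which context-words purely
# from co-occurrence in a tiny invented corpus. No referents, just statistics.
from collections import Counter
from itertools import combinations

corpus = [
    "the black cat is hard to see in the dark",
    "the white cat is easy to see in the dark",
    "the brown dog blends into the dirt",
    "the black dog is hard to see in the dark",
]

pair_counts = Counter()
for sentence in corpus:
    words = set(sentence.split())
    for a, b in combinations(words, 2):
        pair_counts[frozenset((a, b))] += 1

def association(w1, w2):
    """How often two words appeared in the same sentence."""
    return pair_counts[frozenset((w1, w2))]

print(association("black", "dark"))   # 2  -> 'black' and 'dark' pattern together
print(association("white", "dark"))   # 1
print(association("brown", "dark"))   # 0  -> never licensed by this corpus
```

Nothing in there refers to anything, but the associations it extracts already lean the same way a speaker’s would about black things and the dark; scaled up enormously, that’s the isomorphism being claimed, and whether it deserves the word “knowing” is exactly what this thread is arguing about.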
OK, sorry for the duplication, I had missed your post. But it was probably worth repeating anyway! I find it somehow indefinably hilarious – perhaps unfair, but still hilarious – that Microsoft’s Bing-ified version of OpenAI technology is behaving the way it is.
I mean, the ChatGPT we’re talking about here is far from perfect – it’s often extremely impressive, but certainly makes mistakes. But it’s impeccably polite, and when mistakes are pointed out, it acknowledges them and apologizes. So Microsoft takes the same technology under its wing, trains it to its own standards, and what do we get? An insulting, paranoid, threatening, arrogant blowhard that was running rampant all over the internet until Microsoft was forced to rein it in by severely limiting discussions with it! One can’t help but think that perhaps chatbots acquire the corporate personality of their creators/trainers in a more fundamental way than we realized!