The next page in the book of AI evolution is here, powered by GPT-3.5, and I am very, nay, extremely impressed

It is a product they developed. It was not unexpected. It was hard work by skilled folks.

It was a product they fine-tuned for coding. They did not expect that fine-tuning for coding would result in greatly improved math skills.

Btw, I believe ChatGPT is fine-tuned for coding, so it might have just been that. Or GPT-4.

You just won’t accept the huge amount of evidence that there is more going on here than being a ‘stochastic parrot’. People claim these things don’t understand the meaning of the tokens, just their relationships. Then we find ‘neurons’ in the model that associate objects like spiders or anything else with other objects. That’s not ‘relationships between tokens’, that’s understanding that the words actually mean things in the real world. People say they are just doing word prediction, then we find generalized algorithms inside them that allow them to solve math problems.

People said image models were just doing ‘diffusion’, and the models didn’t understand what any of the pixels actually represent. Except that you can show a picture of a man off balance and ask PaLM-E what’s happening, and it will respond, “That man is about to fall over.” Or you can show it a picture of a DB-15 RS-232 connector plugged into an iPhone, and it will say, “That’s a joke, because the RS-232 is a big connector and the iPhone is a small phone. And the RS-232 doesn’t work on an iPhone.”

Clearly there’s a lot more going on in that model than just pixel manipulation, even if the output stage of the thing is ‘just diffusion’. And clearly there’s a lot more going on in these language models, even if the output stage is just ‘next word prediction’.

Right.

Wrong.

Right.

Well, yes, but humans are also fallible well beyond those limits. In particular, it’s very difficult to predict the behavior of another prediction-engine that’s actively trying to fool you.

One might even envision a game of sorts, where two parties are simultaneously attempting to both predict the other’s behavior, and to render their own behavior unpredictable. Or one might not even need to envision the game, because it already exists and is quite familiar to everyone.

Everyone knows, of course, that even a half-decent random-number generator can’t be beaten at rock-paper-scissors by any human (though it also can’t beat any human). But beyond that, there are also computers that can consistently beat humans at rock-paper-scissors. Computers are already better than humans at both predicting and foiling predictions (which are really the same task), at least in that limited context, and have been for ages.
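
Just to make that concrete, here’s a toy sketch (my own illustration, not any particular bot’s code) of the kind of frequency/Markov predictor those RPS bots are built on; against a simulated player with even a mild habit, it reliably comes out ahead:

```python
import random
from collections import Counter, defaultdict

# What beats what: BEATS[x] is the move that defeats x.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

class MarkovPredictor:
    """Order-1 Markov model: guess the opponent's next move from how often
    each move has followed their previous move, then play its counter."""

    def __init__(self):
        self.transitions = defaultdict(Counter)
        self.last_move = None

    def play(self):
        if self.last_move is None or not self.transitions[self.last_move]:
            return random.choice(list(BEATS))        # no data yet: play randomly
        guess = self.transitions[self.last_move].most_common(1)[0][0]
        return BEATS[guess]                          # counter the predicted move

    def observe(self, opponent_move):
        if self.last_move is not None:
            self.transitions[self.last_move][opponent_move] += 1
        self.last_move = opponent_move

def habitual_player(history):
    """Stand-in for a human with a mild habit: repeats their last move 60% of the time."""
    if history and random.random() < 0.6:
        return history[-1]
    return random.choice(list(BEATS))

bot, history, net_wins = MarkovPredictor(), [], 0
for _ in range(10_000):
    bot_move = bot.play()
    human_move = habitual_player(history)
    history.append(human_move)
    bot.observe(human_move)
    if BEATS[human_move] == bot_move:
        net_wins += 1      # bot played the counter to the human's move
    elif BEATS[bot_move] == human_move:
        net_wins -= 1      # human happened to counter the bot
print("bot's net wins over 10,000 rounds:", net_wins)    # reliably positive
```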

Yeah, that ship has sailed. It’s not going to happen. Things are moving so fast that everyone is worried about being left behind. No one is going to stop enhancing AIs, because they know that everyone else is going to keep building new ones.

I would worry more about driving development underground, where we have no visibility into what’s going on at all. Because no one is stopping now, IMO.

Because the information about the meaning of the tokens can’t be derived from the training data, barring magic. It’s just not there; you can’t spin gold from straw, no matter the sophistication of your loom.

Well, things asserted without evidence can be dismissed without argument, but still: I don’t see why this should be surprising at all, or why you think this isn’t just a relation between tokens. A token is encoded into a vector of neuron activations in such a way that tokens that are close in the sense of being able to replace one another in many contexts—they ‘keep the same company’ in the training data—will also be close to one another in the resulting vector space. So if one token produces a high activation in a particular neuron, one would expect related tokens to do the same, precisely because of this relation between tokens.
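
To make the ‘company a word keeps’ point concrete, here’s a toy sketch (the five-sentence corpus is made up, and raw co-occurrence counts stand in for learned embeddings): words that appear in similar contexts end up with similar vectors, so anything downstream that responds strongly to one will tend to respond to its neighbours too.

```python
import numpy as np

# Tiny made-up corpus: 'cat' and 'dog' keep similar company; 'stone' does not.
corpus = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the cat ate the food",
    "the dog ate the food",
    "the stone lay on the road",
]

# Build a vocabulary and a symmetric co-occurrence matrix (window = whole sentence).
vocab = sorted({w for line in corpus for w in line.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))
for line in corpus:
    words = line.split()
    for w in words:
        for c in words:
            if w != c:
                counts[index[w], index[c]] += 1

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Words that appear in similar contexts end up with similar vectors.
print("cat ~ dog:  ", cosine(counts[index["cat"]], counts[index["dog"]]))    # high
print("cat ~ stone:", cosine(counts[index["cat"]], counts[index["stone"]]))  # noticeably lower
```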

If you associate visual or textual tokens with tokens generated via real world sensor data, is that understanding? And if not, what would constitute understanding?

Mere association can’t really be understanding. Consider a device that has a series of cue cards with different images it can display, a mechanism to call each one up, and a set of keys such that if one of them is inserted into its keyhole and turned, one of the cards pops up: does the device understand that a given key ‘means’ a certain picture? I don’t think so: all we have is a mechanism triggered by the ‘shape’ of the key—its syntactic properties, if you will. So any system that just calls up pictures based on syntactic properties—as current AI systems seem to be doing—doesn’t have any understanding of what it’s doing.

Essentially, having access to the semantic properties of a token: to know what a symbol refers to, to have the tokens act as proper symbols.

If you ask it for a certain picture and it reliably produces the right key, I would think so? Unless I don’t know what you mean.

Latent space embeddings have semantic meaning. A famous example is King - Man + Woman ≈ Queen.
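
That one is easy to check yourself with gensim’s pretrained-vectors downloader (assuming the small GloVe bundle is still hosted under this name; any pretrained KeyedVectors model would do):

```python
# Requires: pip install gensim
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")   # small GloVe model, downloads on first run
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# 'queen' is typically at or near the top of the list
```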

What do we have access to that an AI with access to vision, audio, and language doesn’t have?

Yes, that’s a very unfortunate case of the same technical term meaning different things in different fields. To disambiguate between the two, a distinction is sometimes made between ‘inferential’ semantics, which is the sort of thing you get from the word-vector embeddings (‘the company a word keeps’), and ‘referential’ semantics, which is the sort of thing usually thought of as the ‘meaning’ of a symbol—what it refers to. It’s the latter that’s the interesting part.

I don’t know for sure (no one does), but I believe it is direct access to non-structural features of cognitive processing, mediated by a self-referential modeling process—this is just the topic of my theory of conscious experience.

Prove those are different.

Word-vector embeddings aren’t defined by the company they keep. They’re learned that way.

I already have.

But this isn’t really a contentious assertion. Take the discussion here:

In philosophy of language, a distinction has been proposed by Diego Marconi between two aspects of lexical semantic competence, i.e. inferential and referential competence (Marconi, 1997). One aspect of lexical competence, i.e. inferential competence, is the «ability to deal with the network of semantic relations among lexical units, underlying such performances as semantic inference, paraphrase, definition, retrieval of a word from its definition, finding a synonym, and so forth» (Marconi, 1997, p. 59). For instance, we know that a cat is an animal, we can verbally describe the differences between a cat and a dog, we can recover the word cat from a definition such as The animal that meows, and so on. Such “intralinguistic” abilities are semantic because, in order to exercise them, a speaker must possess an internalized network specifying semantic connections between a given word (e.g., cat) and other words of a natural language (e.g., animal, meow).

The second aspect of lexical competence, i.e. «referential competence», cognitively mediates the relation between words and entities of the world. For example, we have the ability to classify a given perceived object as a cat or to distinguish it from a dog, to recognize and name a picture of a cat, and so on. Clearly, we can speak of referential competence only relative to words that refer to objects, properties or events we can perceive (e.g., cat, red, hot).

Pretty poor example since neural networks pass both of those aspects using the same embeddings.

Really? You know what is being cognitively mediated in a neural network, and how? That’s a trick you’ll have to explain to me!

Hmm.

Really? You know what is being cognitively mediated in my head, and how? That’s a trick you’ll have to explain to me!

Take your faux-surprise to a different forum.

Please don’t misattribute quotes to me.

Just a nitpick, but you’re confusing the input box limit with the context window. In ChatGPT, the context window is a LOT bigger than the amount you can stuff into a single prompt. I haven’t tested this lately to see if anything’s changed, but early on I noticed that it was possible to stuff a lot more text into the input window than would actually be processed when submitted. That amount processed was still small compared to the total context window.

And in other news, I came in here to tell everyone that OpenAI just put a Bing-like conversation length limit into ChatGPT. I think I just watched it happen right now; I got 2 generic “error in body stream” errors and then on the 3rd try for the same prompt at the same depth in the conversation it returned “conversation too long”. It was not a very long conversation compared to others I’ve had.

ETA: this might be some other stupid error that’s being misidentified. I just tried backing up and extending a shorter branch of the same tree and got the same error.