Large language model AI doesn't learn

I’m confused–are you abandoning your previous definition of learning and replacing it with a new one, or simply refusing to analyze whether it applies to a Polaroid camera?

In order for learning to occur, there needs to be memory. In the case of ChatGPT there are two memories:

  1. The read-only memory created during training – reportedly on the order of a trillion parameters in the case of GPT-4.
  2. The temporary read-write memory that exists during a chat session.

Other AIs might have memories like an external knowledge base or a running timeline.
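
To make that distinction concrete, here is a minimal, purely illustrative sketch (the class and method names are my own invention, not any real API): the trained weights are read-only at inference time, while the session context grows turn by turn and is thrown away when the chat ends.

    # Illustrative sketch only; none of these names correspond to a real API.
    class ChatSession:
        def __init__(self, frozen_weights, model):
            # Memory 1: parameters fixed during training; read-only at inference.
            self.weights = frozen_weights
            self.model = model
            # Memory 2: the running context window; read-write during the session,
            # but discarded afterwards and never written back into the weights.
            self.context = []

        def send(self, user_message):
            self.context.append({"role": "user", "content": user_message})
            reply = self.model.generate(self.weights, self.context)  # hypothetical call
            self.context.append({"role": "assistant", "content": reply})
            return reply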

Considering the first memory, the read-only memory created during training, does it learn the schema of the data? Does it learn the traits and associations? This is exactly what it does. It blindly learns how words relate to each other. It finds associations that we would recognize and it likely finds more subtle associations that we do not recognize. It can be debated whether it understands the significance of the associations, but it certainly can recite them.

In earlier language-model research, an interesting phenomenon was uncovered. Researchers built a language model by analyzing words and the words that most frequently occur near them. Part of the process of building the model was to assign a value to each word. These values were multi-dimensional, but you can think of each one as a point in 3D space – a word floating in three dimensions.

When you analyze the words and their relationships to each other, you notice that similar concepts are clustered together. But what was even more interesting is that, using the values assigned to each word, you could apply simple arithmetic to uncover the associations embedded in them.

For example, you could:

  1. Take the value assigned to ‘king’
  2. Subtract the value for ‘man’
  3. Add the value for ‘woman’
  4. And find that the nearest word was ‘queen’

The model has recorded the associations between man/woman and king/queen.

Another famous example is:

  • paris – france + poland = warsaw
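
For anyone who wants to try this at home, here is a minimal sketch using gensim's pre-trained GloVe vectors (assuming the glove-wiki-gigaword-100 package is available to download); the exact nearest neighbours will vary with the embedding you pick:

    # Word-vector arithmetic with pre-trained GloVe embeddings, loaded via gensim.
    import gensim.downloader as api

    vectors = api.load("glove-wiki-gigaword-100")  # downloads the vectors on first use

    # king - man + woman  ->  nearest word should be 'queen'
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

    # paris - france + poland  ->  nearest word should be 'warsaw'
    print(vectors.most_similar(positive=["paris", "poland"], negative=["france"], topn=1))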

Here is an interesting article about this observation:

…most notably because Turing proved that computation is substrate independent, and our intuition for the limitations of a non-biological computational machine is likely wrong.

Nobody would sensibly claim that current LLMs (or GPTs) are AGI, but it’s extremely difficult to distinguish between what they are currently doing, and the ultimate limits of the architecture. Unless there is some literally magical component to the human brain, we know that there exists some architecture that can achieve AGI, and it seems likely that GPT-type behavior will constitute at least part of that.

There are two different points here, and I’m sorry if I wasn’t clear. Point #1 is that, unlike a Polaroid camera or a piece of molded plastic, a system like ChatGPT synthesizes and generalizes information and establishes novel fact-relationships, thus exhibiting characteristics of intelligent behaviour. Point #2 is that, unlike a Polaroid camera or a piece of molded plastic, this capability is acquired by learning through extensive training, which is true not only for ChatGPT, but for virtually all contemporary high-performance AIs.

Why do you think LLMs lack the ability to create schema? In fact it appears that in the human brain we create schemas through special neurons that link objects and concepts together, like the ‘Halle Berry’ neuron that fires if you see a picture of her, or see her name, or see the word ‘catwoman’ maybe, or any number of things your brain has associated with Halle Berry. Collections of these neurons make up the schema around concepts, objects, etc.

Now here’s the thing: The very same structures have been found in LLMs after training.

People who don’t believe machines can think or learn focus on the word-generation part, but gloss over just how the LLM knows what the next-word probability should be. To do that accurately requires schemas, concept formation, and pretty much the same kinds of analysis that human brains do.

You keep saying machines can’t think like humans. What do you think is the unique quality in humans that allows them to do what you think machines can’t do? Because I think humans are biological machines.

We understand things because we interact with them.

John Conway described high-dimensional spheres as spiky. If you do math on them, you’ll find that they’re invading spaces that don’t make sense to us - think of how balls pack into a box together, and how they and the gaps between them relate to one another. They must be spiky if they’re invading the gaps between the balls, even though we know that they also must be exactly like a sphere - the most compact shape imaginable in any dimension.

If we actually lived in the higher dimension, these “spiky” balls would go back to being just ordinary, easy-to-understand, compact shapes. They only seem to have this bizarre, spiky quality because we don’t live in that space and have only ever had it described to us, by a third person (math). It’s (probably) not that our brain can’t handle higher dimensions, though, just that we don’t live in that environment and haven’t been provided with detectors evolved for that environment. You’ll never be able to properly associate with an environment that you haven’t been a part of.
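
For the curious, the standard calculation behind that “spiky” intuition is short enough to sketch: pack unit balls at the corners (±1, ±1, …, ±1) of a box of side 4, and the ball wedged into the central gap has radius √n − 1, which grows without bound and eventually pokes outside the box.

    import math

    # Radius of the central ball nestled among unit balls at the corners of a
    # side-4 box in n dimensions: sqrt(n) - 1.
    for n in (2, 3, 4, 9, 10, 100):
        r = math.sqrt(n) - 1
        note = "pokes outside the box!" if r > 2 else ""
        print(f"{n:>3} dimensions: central radius = {r:.3f} {note}")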

We would like to think that there’s something more magical to ourselves than that we’re a bundle of nodes that transfer signals in a weighted fashion to other nodes, algorithmically - that we have a soul, a spirit, or something of that form. We do have hormones, which punish and reward us for evolutionarily advantageous patterns of behavior. ChatGPT doesn’t have that, but it is trained by creatures who do. Minus those hormones, however, there’s no reason to think that we wouldn’t just be mechanical logic machines that are performing a very complicated form of machine learning. We haven’t found the portion in our brains nor in our bodies that is something else and, I’m fairly willing to bet, there isn’t going to come the day when a scientist reaches into a body and pulls out a soul.

We have perceptions that are generated by our brain which seem to be strongly integrated into how we create models to represent the world. What set of neurons generates the perception of purple or the scent of a goat? Will we ever know if machines have the same thing? That’s not to say LLMs or other AIs don’t learn. But it definitely is a different form of learning.

Dogs don’t understand either. They can learn to do things, like sit down when they smell heroin, but they don’t know what heroin is, or why they’re sitting down when they smell it. Their simple little dog brains just have it pounded into them via endless repetition… Smell X… Sit.

X could be heroin, explosives, cancer, a trapped human being or bed bugs; the dog doesn’t know or care, it is just doing what it has been trained to do.

ChatGPT just does what IT was trained to do. Endless repetition analyzing text via the algorithms written by its creators, and when ChatGPT gets an input of text, it sits, I mean writes.

Is it a different form of learning, or the same kind of learning just done in a different environment?

When a human learns, the end result is that the synaptic strengths between various neurons in a huge neural network change, strengthening and weakening connections.

When LLMs learn, the end result is that the weights connecting the virtual neurons change, strengthening and weakening the connections between them. It’s basically the same process.
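
A toy version of that “same process” in code, making no claims about any particular LLM: a handful of gradient steps nudge the connection weights so the output moves toward the target, strengthening some connections and weakening others.

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=3)           # three "connection strengths"
    x = np.array([1.0, 0.5, -0.3])   # a single training example
    target = 1.0
    lr = 0.1                         # learning rate

    for _ in range(50):
        y = np.tanh(w @ x)                     # the unit's output
        grad = (y - target) * (1 - y**2) * x   # gradient of squared error w.r.t. w
        w -= lr * grad                         # strengthen/weaken each connection a little

    print(w, np.tanh(w @ x))                   # output is now close to the target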

To be sure, LLMs lack features of a brain. Most importantly, they are feed-forward, whereas human brains are recurrent (they have loops and feedback). This is slightly ameliorated by the attention mechanism and the retention of the context, which the AI can go back over repeatedly as it works.

Also, neurons in brains are physical things connected by synapses, while ‘neurons’ in an LLM are virtual and the work is done by the parameters, which are more analogous to synapses.

The brain also has specialized structures we don’t fully understand, and which are lacking in an LLM (maybe - we’re just starting to look inside them to see how they do what they do. Maybe we’ll find analogs to various brain structures…)

However, we don’t know if any of these differences are crucial. None of the ‘failures’ of AI (hallucinating, etc.) are due to the lack of any of these, except perhaps recurrence, in that the LLM can’t judge its answers as it formulates them. All it would take is another AI or a new feature that checks the AI’s response for errors, and that problem is solved. Call it a ‘judgment’ module, or executive function.
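
A rough sketch of what such a ‘judgment’ loop might look like; draft_answer and critique here are hypothetical stand-ins for a generator model and a checker model (or two calls to the same model), not anything that currently ships:

    # Hypothetical generate-then-check loop; draft_answer() and critique() are
    # placeholders for calls to a generator model and a judging/checking model.
    def answer_with_judgment(question, draft_answer, critique, max_rounds=3):
        answer = draft_answer(question, feedback=None)
        for _ in range(max_rounds):
            problems = critique(question, answer)   # e.g. a list of flagged errors
            if not problems:
                return answer                       # the judge found nothing to fix
            answer = draft_answer(question, feedback=problems)
        return answer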

We may never know that you and I have the same neurons doing the same thing. It’s likely that it’s a similar question to asking how internet packets were routed. I asked for google.com and you asked for google.com, and we both got the same google.com homepage. Thus…we must have had the same routing through the same nodes? No.

There’s data coming through the eyes from three different types of sensors. Our brain is going to take in and use that data, so long as it finds some utility in it. The less utility it finds (i.e., the less the input seems to correlate with any significant results), the more it’s going to downregulate that input. Eyeball input is strongly correlated with our environmental results, so our brains are always going to keep and use the signal as far as they can.

But, how it ends up spreading that information around, and what weights it gives it in each node is, almost certainly, much different in my head than your head.

There is a real source of red and blue light coming in, that’s consistent between us. There is a shared set of vision receptors, operating in very similar ranges. We’re both going to be interpreting, effectively, the same signal and that does mean that we can come to an agreement on what word to use, what associations to make with it, etc. Even though our brains route the information differently, as far as our ability to communicate and interoperate is concerned, it’s the same purple. But that doesn’t mean that we store and use it in our brains in a node-to-node equivalency.

In point of fact, we don’t even have the same nodes. There’s no 1-to-1 mapping between any one neural cell in my head and yours. We have different folds, different connections, different counts, and different weights. It’s impossible for us to be doing things in the same way.

If you and I don’t have the same purple, except as a social agreement, why expect the machine to be different?

LHOD wrote: “One thing that’s interesting to me in this thread: the OP of the thread is centered around the concept of “schema.” If you don’t engage with the concept of schema, you’re not really engaging with the OP. Yet that word doesn’t appear anywhere in the thread except in my posts. A lot of folks appear to be ignoring the OP in favor of responding to the title–which is your right, but I think it misses an opportunity to discuss one of the central ideas in how humans learn.”

In the original draft of my post, the A and B terms in my example’s mandatory, positive, and negative rules were described as schemas. But I opted to leave that out because it was getting very wordy. Consider this rule,

Any of the following words (car, vase, pencil, hat, house, etc.) may follow any of the following words (red, blue, yellow, green, orange, black, white, grey, etc.).

The person or machine who creates such a rule has effectively created two schemas, which in this case (and unbeknownst to the person or machine in question) happen to have well-recognized names: nouns and color adjectives. Furthermore the integrity of the schemas and their relationship to each other are tested against each book in the library. The rule could be rewritten,

Any noun may follow a color adjective.
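
Written out as code, that rule is just two word sets (the two schemas) and a membership test; the word lists are only the examples given above.

    # The two "schemas" from the rule above, written as plain word sets.
    COLOR_ADJECTIVES = {"red", "blue", "yellow", "green", "orange", "black", "white", "grey"}
    NOUNS = {"car", "vase", "pencil", "hat", "house"}

    def rule_allows(previous_word, next_word):
        # True if the rule "any noun may follow a color adjective" permits the pair.
        return previous_word in COLOR_ADJECTIVES and next_word in NOUNS

    print(rule_allows("red", "car"))   # True
    print(rule_allows("car", "red"))   # False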

These are the two steps I would think existing artificial intelligence is incapable of performing. So are very young children or foreigners who don’t understand our language. That is to say, they don’t understand humor or semantic meaning (respectively) and therefore can’t ensure the joke is funny. Without these two steps you get lame-o jokes like “lose some unnecessary words.” Or poop jokes. The AI probably put “lose” in a schema of words which had a relevant double meaning, as you did with “pounds”, and therefore built the joke around that.

~Max

First is the No True Scotsman problem. If you tell me, “Humans use schema to learn. Only schema use means that a being can learn,” well, I don’t have any particular duty to take your stance seriously, any more than I have a duty to work within someone’s statement that Scotsmen and only Scotsmen eat porridge. It’s on you to demonstrate that schema is a real thing, that it applies to the situation, and that it’s supreme within that question.

Second, what you call schema we would generally term cluster analysis when talking about AI and machine learning. More importantly, if we want to test the ability to break things down into schemas, we can do things like asking a being to identify creatures in a picture, organize concepts into a table, etc. We can see AIs do those things, and they get better the more (and the better) we teach them.
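
As a concrete stand-in for “schema as cluster analysis,” here is a minimal sketch that clusters word embeddings with scikit-learn (again assuming gensim’s GloVe vectors are available); semantically related words tend to fall into the same cluster.

    # Cluster word embeddings; related concepts tend to land in the same cluster.
    import gensim.downloader as api
    from sklearn.cluster import KMeans

    vectors = api.load("glove-wiki-gigaword-100")
    words = ["dog", "cat", "horse", "paris", "london", "warsaw",
             "red", "blue", "green", "king", "queen", "prince"]
    embeddings = [vectors[w] for w in words]

    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embeddings)
    for word, label in sorted(zip(words, labels), key=lambda pair: pair[1]):
        print(label, word)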

Please take a look at this paper from OpenAI, where they used GPT-4 to analyze the neural net of GPT-2:

https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html

It is just FULL of schema representations that link objects, ideas, etc.

For example, they found in layer zero a neuron (neuron 816) that lights up whenever superhero-themed content pops up. This is exactly like the ‘Halle Berry’ neuron in humans.

Can there be any doubt that the creation of such neurons represents real learning?

Also, anyone who says AI is ‘just a word prediction machine’ or anything like that has to grapple with the fact that OpenAI doesn’t understand what’s going on inside these things, other than that they are highly complex and seem to structure themselves like brains after ingesting huge amounts of training data.

Now, here’s a list of some of the specialized neurons they have found, with an interactive tool for browsing them. For each one, they show text color-coded by which words the neuron fired on.

Remember, this is in GPT-2 which is a much more primitive LLM. That’s why they used it. And yet… The schemas represented by neuron associations are pretty rich.

GPT-4 is a multi-modal model, which means it can associate images and text. This improved its scores even on purely textual questions, probably because adding the other media gave it more context. I’m certain that if we analyzed GPT-4 in the same way, we’d find ‘neurons’ relating a photo of Eddie Van Halen with biographical information, photos of his guitar, samples of his music, bandmates, critics, guitar players who play like him, etc.

GPT-5 is rumored to add video to the mix.

I’m not responding as much now because y’all are giving me lots to think about–thank you! I especially appreciate folks who are addressing the idea of schemas, as this is my biggest sticking point. I’m in a state where I’m genuinely unsure: I’m not convinced that it’s accurate to describe what LLMs do as “learning,” but I’m far less convinced that it’s inaccurate. I’ll probably continue to sit back rather than respond.

Thanks for the feedback, and I’m glad you’re using it as an opportunity to re-evaluate some of your views. I respect your understanding of human learning, especially in children, but I frankly don’t think your arguments against learning in AI are well supported.

At risk of being repetitive, I’ll just stress that the concept of “learning” in AI is decidedly not a metaphor, but holds up quite literally by any measure – at a behavioural level by improved performance after repeated exposure to training scenarios, and sometimes even by observable changes in the connection weights between virtual neurons in neural nets that are directly analogous to changes in synaptic strength in the human brain.

It’s notable that most high-performance AI systems are created through a combination of building a software framework, analogous to birthing a brain, followed by an extensive regimen of training that is at least as much – and usually much more – of a determinant of performance than the underlying hardware and software. Many (not all) such systems are built as so-called deep neural nets, where “deep” refers to a layered hierarchy of neural organization whose explicit purpose is to learn and extract hierarchical features in the input data, encoding complex relationships that can fairly be described as schemas.
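
To illustrate “learning” in the literal sense described above, here is a minimal PyTorch sketch of a small layered (“deep”) network whose measured error drops after repeated exposure to training examples; the XOR task is just a placeholder.

    import torch
    import torch.nn as nn

    # A tiny layered network trained by repeated exposure to examples (XOR).
    X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = torch.tensor([[0.], [1.], [1.], [0.]])

    model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(),
                          nn.Linear(8, 8), nn.ReLU(),
                          nn.Linear(8, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
    loss_fn = nn.MSELoss()

    print("before training:", loss_fn(model(X), y).item())
    for _ in range(2000):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()              # the weights change: this is the learning
    print("after training:", loss_fn(model(X), y).item())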

A data point on AI learning vs. copying: Yesterday I discovered that DALL-E 3 (or whatever they call the version Bing uses) thinks that Domo-kun is a robot with a large mouth. This is not copying and pasting bits of images of Domo-kun from the training set, because Domo-kun does not look like that. This is the AI studying the available images and arriving at a concept of what Domo-kun is – a concept that happens to be wrong. But I think the mistakes in AI learning can be more illustrative than the successes in showing that more is taking place under the hood than simple copying.