Large language model AI doesn't learn

In recent discussions, there’s been some debate about whether it’s accurate to say that large language models/AI can learn things. I have strong skepticism that this is an accurate description, or even a helpful metaphor, and would like to hear from others.

First, some examples. Almost definitionally, humans can learn things. If we discard the idea of humans learning things, the word “learning” becomes meaningless–or at least, it becomes a description of a fantasy process. If you’re an extreme behaviorist who denies the ability of a child to learn something, I invite you to some other conversation way over there.

Some examples of learning:

  • A child learns to speak.
  • An adult learns how to play guitar.
  • A person learns that Olympia is the capital of Washington State.
  • A dog learns to sit when you say “Sit!”

Some non-examples of learning:

  • A piece of plastic is molded into the shape of a Fisher-Price Speak-and-Say toy, and acquires the ability to make the noise, “The cow goes moo!” The plastic has not learned to speak.
  • My iPhone downloads a Jimi Hendrix song. My iPhone has not learned how to play guitar.
  • A bookshelf holds an atlas of US maps. The shelf has not learned that the capital of Washington State is Olympia.
  • A couch cushion acquires the shape of your butt. It has not learned what your butt is like.

Way back when I was in teacher school, we talked a lot about “schema”. A schema is a concept consisting of a cluster of smaller concepts. For example, a toddler may have a schema that we call “Cat”, and it might consist of other concepts:

  • Fuzzy
  • Smaller than me
  • Meow
  • Face
  • Mr. Tibbles
  • Tail
  • Scratches if you pull tail

If a toddler is out in public and sees something fuzzy with a tail and a face, the toddler might point and say, “Cat!” because the thing has enough in common with its cat schema to be identified as such.

Schema can be modified through addition and subtraction. “That’s not a cat,” you say, “That’s a squirrel! See how big and fluffy its tail is?” Now the schema’s “tail” concept becomes:

  • Tail (but not big and fluffy, that’s squirrel).

Or you might say, “Look, kid, at the lion! Did you know that’s a kind of big cat?” Now the “smaller than me” subconcept becomes:

  • Smaller than me sometimes, but also can be bigger than me.

And then you can add entirely new schema. “Kid, wanna learn about tardigrades?” you say, and now there’s a schema to fill up with concepts.

The key thing is the clusters of concepts. Humans hold ideas in clusters, and then they connect concepts one to the other. When evaluating the truth of a new idea, they access their current schema and compare the new idea to those schema.
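
If it helps to see the shape of the idea, here’s a toy sketch in Python of a concept cluster (purely illustrative; I’m not claiming brains, or anything else, literally store dictionaries like this, and the feature names and the match threshold are made up):

    # Toy "Cat" schema: a cluster of smaller concepts the toddler has attached.
    cat_schema = {
        "fuzzy": True,
        "smaller_than_me": True,
        "sound": "meow",
        "has_face": True,
        "has_tail": True,
        "scratches_if_tail_pulled": True,
    }

    def matches(schema, observed):
        """Count how many observed features agree with the schema."""
        return sum(1 for key, value in observed.items() if schema.get(key) == value)

    # Something fuzzy with a tail and a face comes into view...
    sighting = {"fuzzy": True, "has_tail": True, "has_face": True}
    if matches(cat_schema, sighting) >= 3:
        print("Cat!")

    # "That's not a cat, that's a squirrel!" -- the cluster gets revised.
    cat_schema["tail_big_and_fluffy"] = False
    # "A lion is a kind of big cat!" -- another addition/subtraction.
    cat_schema["smaller_than_me"] = "usually, but not always"

The point isn’t the code; it’s that the cluster is something the child builds, checks new things against, and revises.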

Dog brains work fairly similarly to human brains, in this regard. Dogs develop clusters of concepts around things like “meaty bones” and “people I live with” and “things to chase.” They can add or subtract from those: when a bee stings them on the nose, “things to chase” might get a new qualifier, and when a new baby enters a household, “people I live with” is expanded.

Bookshelves don’t work like that. You can load a bookshelf with new books, and it can therefore acquire new knowledge, but it doesn’t impose any concept cluster on the knowledge it acquires. It just holds that knowledge. We don’t say it “learned” the knowledge that it’s acquired.

iPhones don’t work like that. When I download a Jimi Hendrix tune, my iPhone has the capability to do something it couldn’t previously do. But that doesn’t represent learning. It doesn’t build concept clusters in anything like the way that humans do; at best, it has keywords that have many-to-many relationships in a database.

And, as I understand it, large language models don’t work anything like that. They have an enormous database of words, and they know which words most often appear after other words; but that doesn’t involve building concept clusters around the meaning of those words. Their understanding of what the word “cat” refers to is nowhere near as sophisticated as a toddler’s. A large language model with a new input of a million Wikipedia articles no more “learns” the information in those articles than an iPhone learns to play guitar.
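
To make the contrast concrete, here’s a deliberately crude next-word sketch (real LLMs use learned numerical weights over long stretches of context, not a literal lookup table like this; the tiny corpus is invented):

    from collections import Counter, defaultdict

    # Crude bigram table: which word most often follows which word.
    counts = defaultdict(Counter)
    corpus = "the cat sat on the mat . the cat chased the dog .".split()
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def next_word(prev):
        """Pick the most frequent follower -- frequency, not meaning."""
        return counts[prev].most_common(1)[0][0]

    print(next_word("the"))   # "cat", chosen with no schema of cats behind it

However much more sophisticated the real machinery is, that’s the flavor of “knowing which words follow which words” I mean.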

It is sometimes tempting to talk about what AI is learning–but I think it’s misleading. The temptation to anthropomorphize AIs is very strong, but it distracts folks from what’s actually happening.

Thoughts?

In one of the other AI threads, I fed ChatGPT a series of somewhat whimsical laws from a fictional country. I then gave it scenarios and asked it which laws, if any, were being broken. I think it got confused once, but when I prodded it with an “are you sure?” it corrected its answer.

I agree that it can’t learn or understand in the way we’d use those terms as educators, but an LLM can absolutely receive wholly new data and use it to inform future output. I don’t know if it’s “synthesis” exactly, but it’s not nothing.

Yeah–I recognize that the bookshelf analogy isn’t exactly apt, because the model is using some analytical tools on its data. But what it’s doing is so fundamentally different from what a toddler, or even a dog, does, that I think it’s misleading to call them by the same name.

I agree. We probably need new terminology. Saying “it’s learning!” is wrong because it invites anthropomorphism, but saying “it’s not learning!” is wrong because it’s capable of doing something learningish.

I don’t think trying to answer a question about learning by restricting knowledge to a particular form is useful. Instead, it’s better to explicitly list the qualities of learning. For example, to “learn”, an entity must:

  • have knowledge
  • have senses
  • adapt its knowledge based on its senses

Using those criteria, humans and dogs can learn. Plastic pieces, bookshelves, and cushions cannot. The iPhone is trickier. It can’t learn to play guitar because the phone cannot physically pluck strings. But can it play music? My definition is not sufficient.

Does a film camera meet your definition?

If that’s all they do, is that enough to really count as “artificial intelligence”?

I know a little about AI and how it works, but maybe not nearly enough to meaningfully participate in this discussion. Still, I’ll throw out the following questions:

Does “intelligence” (“real” intelligence, not AI) necessarily involve the capacity to learn?

And if so, what does that imply?

That, if the term “artificial intelligence” is to be at all appropriate, AI has to be capable of learning as well?

Or, that AI has to “artificially learn”—that is, it has to do something artificially that is the analog of what a natural intelligence does naturally when it learns? In that case, it seems to me that talking about AI learning is at least a helpful metaphor, since it’s no more unfair or inappropriate to use the word “learn” than to use the word “intelligence.”

I would say a camera is like a bookshelf. They don’t have knowledge, but only records. But that’s not explicit in my definition; it really does need work.

I would say that it definitely learns, and does so in a way that is much more analogous to a toddler than to an iPhone playing music. What makes the AI different from the iPhone is that responding to a given input with an explicit output is programmed into the iPhone directly. In an AI, the output is an emergent property of the gestalt of its inputs subtly influencing each other, and it can arrive at correct responses that don’t explicitly match any of its direct experiences. I would say that a definition of learning that excludes this is overly restrictive.

Where I would draw the line is whether it “understands” what it’s learned. That starts getting into the whole wooly concept of awareness, which is not really fully understood by anyone, but I can feel pretty confident in saying that, according to all but the loosest understanding of the word (recursion not intended), current AIs don’t.

Can machines learn? Yes, they can.

https://www.sciencedirect.com/topics/computer-science/machine-learning

It is a long-established term and principle in computer science.

Douglas Hofstadter wrote a book on this. I think it was Fluid Concepts and Creative Analogies. Anyway, that’s my contribution to this thread.

That’s an argument I’ve seen made by people who work with the technology. What the average person thinks of as “artificial intelligence” and what LLMs actually do are quite different, and it’s unfortunate that the name “AI” has caught on for the idea.

What most people envision is referred to as an AGI: an artificial general intelligence.

I bet those people aren’t game programmers, where “artificial intelligence” has been used as a descriptive term for many years.

ChatGPT and these types of “AI” are, at their most basic, simply pattern-recognition engines. They match your request with the patterns they have, and if your request doesn’t match a pattern, they “hallucinate” or just make stuff up. You need to check the cites for anything serious.

It appears that, rather than reading the OP, you just Googled “Machine learning” and threw up the first five links you got. Were I unfamiliar with the term, this would be a meaningful contribution to the thread.

Interesting.
If I were to ask ChatGPT a specific question, say: produce a 250-word article on American progress in the 21st century in the style of Abraham Lincoln.

If somebody else asked the same question, would they get an identical answer?
If, after a period of time, I asked the same question, would I get an identical answer?
And if not, by what metric is ChatGPT tailoring its answers?

As I understand the implementation of the LLM AIs released to the public, I wouldn’t really classify them as being able to learn to the extent that we humans do. The main difference that I perceive is that we can create novel categories and models of the world. I think implementing sensors and flexible enough programming so that the models can interact with the world, get feedback on decisions, and generate novel models would move them closer to what we do.

I think the feedback and training of the model is definitely a form of learning.

This is interesting. Of course, an adult can learn how to play a particular song on guitar. I would say an adult can learn how to make a guitar produce a certain noise.

A person can learn how to tell whether the noise they are making on the guitar fits with a song (i.e. learning consonance versus dissonance for a particular musical style). But a typical machine (like a guitar tuner, or guitar tutor computer program), which has the ability to tell if a noise is equivalent in pitch to another, never learned consonance and dissonance. The rules to determine which sensory inputs are “acceptable” are built into the machine for a preset number of scales, not learned.
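
As a crude illustration of what I mean by “built into the machine” (a hypothetical tuner-style check I made up, not any real product’s code), the acceptable pitches are written in up front; nothing about them is acquired from experience:

    # Hypothetical tuner-style check: the rules are preset, not learned.
    # Standard-tuning guitar string frequencies in Hz (E2 A2 D3 G3 B3 E4).
    STRING_FREQS = {"E2": 82.41, "A2": 110.00, "D3": 146.83,
                    "G3": 196.00, "B3": 246.94, "E4": 329.63}

    def in_tune(detected_hz, string, tolerance_hz=1.0):
        """Compare a detected pitch against a hard-coded target."""
        return abs(detected_hz - STRING_FREQS[string]) <= tolerance_hz

    print(in_tune(81.9, "E2"))   # True -- a built-in rule applied, never learned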

How LLM AI specifically operates, or whether it is capable of composition, I don’t know. I can, however, opine on what you’ve written.

Having access to lots of sentences and taking note of which words frequently appear together does not necessarily involve building concept clusters around the meaning of those words, but it certainly could. It’s a necessary step in learning how to recognize phrases.

I think most babies learn intonation, then semantics, then grammar, but that doesn’t necessarily mean that’s the only way to learn. An illiterate person could, theoretically, learn something about grammar by mechanically:

  1. Pulling down a library book.
  2. Tallying every word that appears in the book, as well as every combination of words.
  3. Writing down a mandatory, positive, and negative rule for every possible combination of words, provided each rule is only ever written once.
    • B must follow A (mandatory)
    • B may follow A (positive)
    • B must not follow A (negative)
  4. Writing down a numerical “weight” for every rule, equal to the tally of consistent word combinations less the tally of inconsistent word combinations. If a rule already has a weight, the existing number is crossed out and added to the new number.
  5. Repeating steps 1-4 for an arbitrary number of books.
  6. Writing down whether each rule is “valid” or “invalid” based on whether the ratio of the rule’s weight to B’s appearances meets an arbitrary threshold.
  7. Writing down all combinations of valid positive rules that share the same subject (B) or object (A).
  8. Writing down all valid negative rules that share the same subject (B) or object (A).
  9. Writing down a new positive or negative rule for each combination from steps 7 and 8, with a multiple-choice subject or object as appropriate, and a weight as per step 4.
    • B1 or B2 or … Bn may follow A (positive)
    • B may follow A1 or A2 or … An (positive)
    • B1 or B2 or … Bn must not follow A (negative)
    • B must not follow A1 or A2 or … An (negative)

Take the second option from the last bullet point list… eventually, he’ll have a rule that says any one of these singular nouns can follow any one of these singular adjectives, or the reverse in other languages like Spanish. There are better ways to weigh rules for usefulness and more steps to recognize more types of rules. I also ignored things like sentence structure and how to tell where words begin and end.
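
For anyone who wants that procedure concrete, here’s a loose Python sketch of the tallying idea (heavily simplified by me: it tracks only the positive “B may follow A” rules, the 0.5 validity threshold and example sentences are invented, and steps 7-9, which combine rules, are left out):

    from collections import Counter, defaultdict

    follows = defaultdict(Counter)   # follows[A][B] = times B appeared right after A
    totals = Counter()               # totals[B]     = times B appeared at all

    def tally_book(text):
        """Steps 1-4 for one book; calling this repeatedly is step 5."""
        words = text.lower().split()
        totals.update(words)
        for a, b in zip(words, words[1:]):
            follows[a][b] += 1

    def valid_rules(threshold=0.5):
        """Step 6: 'B may follow A' is valid when B follows A often enough,
        relative to how often B appears overall."""
        return [(a, b) for a, nexts in follows.items()
                for b, count in nexts.items()
                if count / totals[b] >= threshold]

    tally_book("the red car and the red hat")
    tally_book("a red car passed a blue car")
    print(valid_rules())   # includes ('red', 'car') -- an adjective-noun pattern
                           # surfaces from tallies alone, with no meaning attached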

This illiterate person won’t become literate, and won’t know what the words mean without somehow associating words with sensory experience (sight, sound, equilibrioception [sense of balance], touch, smell, taste, chronoception [sense of time]). There has to be a common frame of reference to successfully communicate. The same goes for a machine.

The rules being learned about the given language are - so far as I can tell - analogous to the rules of consonance and dissonance for a given musical style. You can learn how to tell what fits and what doesn’t without learning meaning, and it’s still learning.

~Max

This sounds right to me (I’m no expert).

Likely not. I asked ChatGPT to write me a story about a flying shoebox, and hit the “regenerate response” button a couple of times. I got 3 stories about flying shoeboxes, all different. Different themes, different reasons the shoebox was flying, all internally consistent, if vapid, stories.
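
My (simplified, outsider’s) understanding of why that happens: the model scores many candidate next words and then samples among the likelier ones rather than always taking the single top choice, so each regeneration can branch differently. A toy illustration of that sampling step, with invented probabilities:

    import random

    # Invented scores for the next word after "The shoebox flew because of..."
    candidates = {"magic": 0.4, "the wind": 0.3, "pigeons": 0.2, "a wish": 0.1}

    def pick_next(probs):
        """Sample in proportion to score instead of always taking the top word."""
        words = list(probs)
        return random.choices(words, weights=[probs[w] for w in words], k=1)[0]

    # Three "regenerations" can easily head in three different directions.
    print([pick_next(candidates) for _ in range(3)])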

Is a better word “trained”? As in, the LLM didn’t learn how to write things; it was trained to write things. It’s doing a great deal more than simply playing a downloaded song, or regurgitating someone else’s text. It’s applying complex contextual rules to the generation of new text, rules that the LLM developed via its analysis of large volumes of text.

It’s different from a Speak & Spell because it wasn’t programmed with rules on how to write things; it was programmed to create its own writing rules without direct human intervention.
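
A very rough picture of what that “creating its own rules” amounts to, as I understand it (a toy, single-number example of my own; real models adjust billions of weights against mountains of text, but the shape of the loop is the same idea): nobody types the rule in, it falls out of repeatedly nudging numbers to reduce prediction error.

    # Toy training loop: the "rule" (one weight) is never written by a programmer;
    # it emerges from nudging a number to reduce error on example data.
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # made-up inputs and targets
    w = 0.0                                       # the model's only "rule"

    for _ in range(200):
        for x, target in data:
            error = w * x - target
            w -= 0.01 * error * x                 # shrink the error a little

    print(round(w, 2))   # ~2.0 -- the "double it" rule was never typed in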