In recent discussions, there’s been some debate about whether it’s accurate to say that large language models/AI can learn things. I’m strongly skeptical that this is an accurate description, or even a helpful metaphor, and I’d like to hear from others.
First, some examples. Almost definitionally, humans can learn things. If we discard the idea of humans learning things, the word “learning” becomes meaningless–or at least, it becomes a description of a fantasy process. If you’re an extreme behaviorist who denies the ability of a child to learn something, I invite you to some other conversation way over there.
Some examples of learning:
- A child learns to speak.
- An adult learns how to play guitar.
- A person learns that Olympia is the capital of Washington State.
- A dog learns to sit when you say “Sit!”
Some non-examples of learning:
- A piece of plastic is molded into the shape of a Fisher-Price Speak-and-Say toy, and acquires the ability to make the noise, “The cow goes moo!” The plastic has not learned to speak.
- My iPhone downloads a Jimi Hendrix song. My iPhone has not learned how to play guitar.
- A bookshelf is loaded with an atlas of US maps. The shelf has not learned that the capital of Washington State is Olympia.
- A couch cushion acquires the shape of your butt. It has not learned what your butt is like.
Way back when I was in teacher school, we talked a lot about “schema”. A schema is a concept consisting of a cluster of smaller concepts. For example, a toddler may have a schema that we call “Cat”, and it might consist of other concepts:
- Fuzzy
- Smaller than me
- Meow
- Face
- Mr. Tibbles
- Tail
- Scratches if you pull tail
If a toddler is out in public and sees something fuzzy with a tail and a face, the toddler might point and say, “Cat!” because the thing has enough in common with its cat schema to be identified as such.
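Just to make the “cluster of concepts” idea concrete, here’s a toy sketch in Python. This is not a claim about how a toddler’s brain actually stores anything; the feature names, the `looks_like` helper, and the overlap threshold are all invented for illustration.

```python
# Toy illustration only: a "schema" as a named cluster of smaller concepts,
# and recognition as checking how much an observed thing overlaps with it.
cat_schema = {
    "fuzzy", "smaller than me", "meows", "has a face",
    "Mr. Tibbles", "has a tail", "scratches if you pull its tail",
}

def looks_like(observed_features, schema, threshold=3):
    """Say 'Cat!' if the observed thing shares enough concepts with the schema."""
    overlap = observed_features & schema
    return len(overlap) >= threshold

# Something fuzzy in the park with a tail and a face...
squirrel_sighting = {"fuzzy", "has a tail", "has a face", "climbs trees"}
print(looks_like(squirrel_sighting, cat_schema))  # True -- the toddler points and says "Cat!"
```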
Schema can be modified through addition and subtraction. “That’s not a cat,” you say, “That’s a squirrel! See how big and fluffy its tail is?” Now the schema’s “tail” concept becomes:
- Tail (but not big and fluffy, that’s squirrel).
Or you might say, “Look at the lion, kid! Did you know that’s a kind of big cat?” Now the “smaller than me” subconcept becomes:
- Smaller than me sometimes, but also can be bigger than me.
And then you can add entirely new schema. “Kid, wanna learn about tardigrades?” you say, and now there’s a schema to fill up with concepts.
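In the same toy-sketch spirit (again, purely illustrative, the structure and wording are invented), modifying a schema is just qualifying, swapping, or adding concepts, and a brand-new schema starts out mostly empty:

```python
# Toy illustration only: modifying a schema by qualifying or replacing concepts,
# and adding an entirely new schema to fill up later.
schemas = {
    "cat": {"fuzzy", "smaller than me", "meows", "has a tail"},
}

# "That's not a cat, that's a squirrel! See how big and fluffy its tail is?"
schemas["cat"].discard("has a tail")
schemas["cat"].add("has a tail (but not big and fluffy -- that's squirrel)")

# "Did you know a lion is a kind of big cat?"
schemas["cat"].discard("smaller than me")
schemas["cat"].add("sometimes smaller than me, sometimes much bigger")

# "Kid, wanna learn about tardigrades?"
schemas["tardigrade"] = set()  # a new cluster, waiting for concepts
```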
The key thing is the clusters of concepts. Humans hold ideas in clusters, and then they connect concepts one to the other. When evaluating the truth of a new idea, they access their current schema and compare the new idea to those schema.
Dog brains work fairly similarly to human brains, in this regard. Dogs develop clusters of concepts around things like “meaty bones” and “people I live with” and “things to chase.” They can add or subtract from those: when a bee stings them on the nose, “things to chase” might get a new qualifier, and when a new baby enters a household, “people I live with” is expanded.
Bookshelves don’t work like that. You can load a bookshelf with new books, and it can therefore acquire new knowledge, but it doesn’t impose any concept cluster on the knowledge it acquires. It just holds that knowledge. We don’t say it “learned” the knowledge that it’s acquired.
iPhones don’t work like that. When I download a Jimi Hendrix tune, my iPhone has the capability to do something it couldn’t previously do. But that doesn’t represent learning. It doesn’t build concept clusters in anything like the way that humans do; at best, it has keywords that have many-to-many relationships in a database.
And, as I understand it, large language models don’t work anything like that. They’ve been trained on an enormous amount of text, and they track which words most often appear after which other words; but that doesn’t involve building concept clusters around the meaning of those words. Their understanding of what the word “cat” refers to is nowhere near as sophisticated as a toddler’s. A large language model fed a million new Wikipedia articles no more “learns” the information in those articles than an iPhone learns to play guitar.
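For contrast with the schema sketches above, here’s the kind of statistic I’m gesturing at, boiled all the way down to a toy next-word counter. Real large language models are neural networks trained on vastly more data, not literal lookup tables like this; the toy corpus and the `predict_next` helper are just made up to show the flavor of “word A tends to be followed by word B.”

```python
from collections import Counter, defaultdict

# A drastically simplified toy: count which word most often follows each word.
corpus = "the cat sat on the mat . the cat chased the dog .".split()

next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Return the word that most often followed `word` in the toy corpus."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # 'cat' -- a frequency, not a concept of cats
```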
It is sometimes tempting to talk about what AI is learning–but I think it’s misleading. The temptation to anthropomorphize AIs is very strong, but it distracts folks from what’s actually happening.
Thoughts?