If I try to generate a picture of “a garbage truck shaped like an elephant”, how does it know to place the head of the elephant at the front of the vehicle?
Sometimes pictures are combined more like a collage, but many times they are combined in a seemingly logical way. I understand that two human-like figures can be combined because it somehow finds the eyes/heads, etc.
If you asked people to do this, wouldn’t you expect them all to place the front of the animal at the front of the vehicle, unless there were some compelling shape correspondence that dictated otherwise?
An A.I. does it for the same reason that people do it. It’s sensible to put the front end of the animal at the front end of the vehicle.
No, we do not. I don’t mean that nobody knows how to write the software; obviously people know how to do that: 1.) write an AI that learns from patterns, 2.) feed it a large dataset. But nobody knows why it reaches any particular conclusion. And I don’t mean the mechanics of it (I obviously don’t mean nobody knows what a neural network is); I mean that nobody, not you, not me, not the programmers, knows what output that AI will provide until they test it. Nobody knows the specific visual cues that generate the specific neural-net weightings that make an image-generating AI reach a specific understanding. That data file is a black box.
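To make that concrete, here’s a toy sketch of those two steps (the dataset, network size, and numbers here are all invented for illustration, not anyone’s actual system). You can train a small net, confirm that it works, and still have nothing to look at afterwards except a matrix of weights that explains nothing:

```python
# Toy illustration of "1) write an AI that learns from patterns, 2) feed it data".
# Everything here (dataset, network size) is made up purely for the example.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                 # step 2: a dataset
y = (X[:, 0] * X[:, 2] > 0).astype(int)       # with some hidden pattern in it

net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X, y)                                 # step 1: an AI that learns from patterns

print(net.score(X, y))   # we can measure THAT it works...
print(net.coefs_[0])     # ...but the learned weights are just numbers; nothing in
                         # them says WHY any particular input maps to any output
```

Scale that weight matrix up by a factor of a few billion and you have the “data file” I’m talking about.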
Ok, but that means we know more about current AI behavior than we know about the behavior of biological intelligences.
I would prefer to reserve hyperbole like “we have no idea how or why the AI is doing what it’s doing” for the post-singularity scenario where AI gets better at developing AI than humans, so that humans are no longer required and each generation of superintelligent AI programs the next generation.
As a kind of “exception that proves the rule”, here is a study of GPT-2 that discovered a neuron predicting whether the net would choose “a” or “an” as the next word: https://clementneo.com/posts/2023/02/11/we-found-an-neuron
So, after a great deal of research, they found one trivial example of how the net chooses one word over another. And they still don’t know much about what goes into the decision; they can only identify how that neuron is correlated with the output.
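For the curious, the kind of probing in that write-up looks roughly like this (a sketch only; the layer and neuron indices below are placeholders I made up, the real ones are in the linked post). You run prompts through GPT-2, record one MLP neuron’s activation with a hook, and see how it lines up with the model’s preference for “ an” over “ a” as the next token:

```python
# Sketch of neuron probing in GPT-2. LAYER and NEURON are placeholder values,
# not the actual "an" neuron from the linked write-up.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER, NEURON = 10, 500   # hypothetical indices, purely for illustration

activations = {}
def save_mlp_output(module, inputs, output):
    activations["mlp"] = output.detach()

model.transformer.h[LAYER].mlp.register_forward_hook(save_mlp_output)

for prompt in ["I picked up an apple and then grabbed", "She told me a story about"]:
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]   # scores for the next token
    an_minus_a = (logits[tok.encode(" an")[0]] - logits[tok.encode(" a")[0]]).item()
    neuron_act = activations["mlp"][0, -1, NEURON].item()
    print(f"{prompt!r}: neuron={neuron_act:+.3f}, 'an' vs 'a' logit gap={an_minus_a:+.3f}")
```

And even when you find a neuron whose activation tracks the output like this, all you have is a correlation; you still haven’t explained what upstream features feed into it.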
In comparison, figuring out things like how an image-generation net knows what the front of a vehicle is, or that an elephant’s head should go on the front, etc., is totally hopeless. It’s not even clear that there is an answer; the decision is likely emergent across the set of weights, with nothing that could be explicitly said to correspond to it.
Okay, I tried generating a number of images in SD and DE2 using that exact prompt.
Stable Diffusion:
Dall-E 2:
So we see:
1.) SD has no ability to generate this type of image at all (and beyond these images, I tried different combinations of words and arrangements with no better result).
2.) DE2 has a clear idea of what the front of an elephant looks like (there are no images of an elephant’s ass attached to a truck) but no idea of where best to attach it to the truck: sometimes it is at the front, but it can also be at the side or back.
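For anyone who wants to reproduce the Stable Diffusion half of this, the code is nothing special. Here’s a rough sketch using the diffusers library (the checkpoint name and settings are just common defaults, not anything I tuned, and it assumes a CUDA GPU):

```python
# Generate a few samples of the prompt with Stable Diffusion via the diffusers library.
# Checkpoint and settings are generic defaults; assumes a CUDA-capable GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a garbage truck shaped like an elephant"
images = pipe(prompt, num_images_per_prompt=4).images   # a small batch to compare
for i, img in enumerate(images):
    img.save(f"elephant_truck_{i}.png")
```

DE2, by contrast, only runs on OpenAI’s hosted service, so there’s no local code to share for that half.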
Not really. We sometimes have post-hoc explanations for why it was useful for a trait to evolve, but there are countless traits that would be useful that haven’t evolved, and vast numbers of variations in what did evolve. Sharks, cuttlefish, copepods, and scallops are all solutions to the problem of “mobile organisms in a marine environment”, but none of those specific forms could have been predicted 600 million years ago, and none of them are very similar to each other.
My wife is in this field – specifically NLP (natural language processing). From what I’ve understood her talking about this, yes, it gets to the point where we, as humans, don’t really understand the connections learning models are making. I mean, there is some overall sense of it, of course, but when you get into the weeds, it’s hairy.
What does post hoc have to do with anything? We understand the general principles of evolution. Of course any application of those general principles to explain an example of evolution will be post hoc, just as any explanation of a geologic formation is post hoc.
The general principles here are not validated by predicting a future that will take millions of years to unfold. They are validated as predictions about future data: that we will not discover Precambrian rabbits.
A significant difference here is the idea of purpose. It’s almost impossible to talk about any biological system without bringing in purpose: lungs evolved for breathing, legs evolved for walking, etc. Those are post hoc explanations for a purposeless process. No one makes the same mistake for geology. Formations are what they are.
Trying to figure out what the neural net is doing is possibly an invitation to the idea of purpose, even when it does not exist.