It’s also possible that the program “remembers” more of some images than others, because they have more detail in them. I’m thinking specifically of broccoli: one of the published images from one of the AIs was a sequence of “an apple fighting a piece of broccoli”, in photorealistic style. I maintain that it is impossible to create a photorealistic image of a piece of broccoli without either having a 3D model of the world at a level of detail impossible to achieve from any plausible number of still photographs, or exactly copying an already-existing piece of broccoli. Given that the AI was able to produce photorealistic broccoli, I believe there must, in fact, have been a picture of that exact piece of broccoli in that exact pose somewhere in its training data, and that for some reason (possibly the high amount of fine detail in the picture) it decided that picture was worth remembering exactly, at the expense of remembering less detail in a bunch of other pictures, to maintain its average.
Doubtful. Broccoli has a highly repetitive and fairly simple structure. There’s no reason to think it can’t generate a new picture based on the things it’s learned, not just about broccoli but about similar-looking objects like trees.
It probably doesn’t have a 3D model as such. Again, this is one of those things where it has probably picked up some pattern, but one that is essentially alien to us and doesn’t map exactly onto how we think of a rigid model rotating in space.
It knows about depth occlusion, at the very least (nearby objects occlude far-off ones), and that distant objects may appear hazy compared to nearby ones.
The AI can be thought of as an extremely advanced image compressor. Early compression was lossless, trying to reproduce images exactly, and did not achieve very high compression ratios. Later, JPEG and MPEG adopted lossy compression, using the characteristics of human vision to throw away differences you wouldn’t notice. But this is many steps beyond that: you’d never notice if every tiny bud on a piece of broccoli moved around, only whether they’re colored and arranged sensibly. That allows a far greater compression ratio than you’d otherwise get. The compressor needs to understand something about how the buds are colored, sized, and positioned relative to one another, but with enough “rules” it can generate something indistinguishable from the real thing. Some of these rules can be shared with other images; for instance, a random distribution that keeps a minimum separation, along the lines of Poisson disk sampling. “Random points that never get too close” is a pattern that shows up in many places.
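To make the “random points that never get too close” idea concrete, here’s a minimal sketch of naive dart-throwing, Poisson-disk-style sampling in Python. The function name and parameters are made up for illustration; this is just the pattern, not anything a diffusion model literally runs.

```python
import math
import random

def scatter_min_dist(width, height, min_dist, attempts=5000):
    """Throw darts at random; keep a candidate point only if it is at least
    min_dist away from every point already kept."""
    points = []
    for _ in range(attempts):
        candidate = (random.uniform(0, width), random.uniform(0, height))
        if all(math.dist(candidate, p) >= min_dist for p in points):
            points.append(candidate)
    return points

# e.g. scatter "buds" over a 100x100 patch, keeping them about 5 units apart
buds = scatter_min_dist(100, 100, min_dist=5)
```

The result looks random but never clumps, which is exactly the kind of rule that can describe broccoli buds, leaves on a tree, or gravel on a path without storing any of them pixel by pixel.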
I’m fairly confident that Google image search uses AI to compare images. Likely what they do is rescale everything to a given size, compute an embedding for each image, and then return at most N images from within any given radius in embedding space, or use some other heuristic to ensure diversity in search results.
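A guess at what that kind of near-duplicate filtering might look like in code: the `embed` function below is a stand-in for any image-embedding model (a CLIP-style encoder, say), and the radius and greedy strategy are illustrative assumptions, not Google’s actual pipeline.

```python
import numpy as np

def diversify(images, embed, radius=0.15, max_per_cluster=1):
    """Greedily keep an image only if fewer than max_per_cluster already-kept
    images lie within `radius` of it in embedding space."""
    kept, kept_vecs = [], []
    for img in images:
        v = embed(img)
        v = v / np.linalg.norm(v)  # normalize so distance behaves like cosine distance
        neighbours = sum(1 for u in kept_vecs if np.linalg.norm(v - u) < radius)
        if neighbours < max_per_cluster:
            kept.append(img)
            kept_vecs.append(v)
    return kept
```

The point is only that “too similar” can be made mechanical: pick a distance in embedding space and suppress results that fall inside it.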
You can image-search the data used for training many of these AIs here: https://haveibeentrained.com/
Look at “nuclear explosion broccoli” here (on the second page of examples). Generate your own. Each time, it is different broccoli.
I’m not sure that is different broccoli… Several of those images look like the same slightly-mangled piece of broccoli, differing only in how it’s mangled. See images 2 and 16, for instance, which look almost identical.
Ha!
If the same image appears many times in the training set of an image synthesis model, the model can overfit to it, and its generations can then reproduce a recognizable interpretation of the original image. The Mona Lisa, for example, has been found to have this property in Stable Diffusion. That property allowed researchers to target known-duplicate images in the dataset while looking for memorization, which dramatically increased their chances of finding a memorized match.
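For a sense of how one might find duplicated images to target, here is a deliberately simplified sketch: it counts exact pixel-level duplicates in a set of training files via a content hash. The actual memorization studies used more robust near-duplicate detection before probing the model; everything here (the function names, the 64×64 normalization) is illustrative, not their method.

```python
from collections import Counter
from hashlib import sha256
from PIL import Image

def pixel_fingerprint(path, size=(64, 64)):
    """Hash a small normalized copy of the image, so identical pixels stored
    under different filenames collide. Lossy re-encodes would need a
    perceptual hash instead; this is the simplest possible version."""
    img = Image.open(path).convert("RGB").resize(size)
    return sha256(img.tobytes()).hexdigest()

def most_duplicated(paths, top_k=20):
    """Return the top_k most frequent fingerprints with their counts; the
    images behind them are the prime candidates for memorization, per the
    overfitting argument above."""
    counts = Counter(pixel_fingerprint(p) for p in paths)
    return counts.most_common(top_k)
```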