Another AI images question

Danger_Man · February 17, 2023, 4:59pm

If I try to generate a picture of “a garbage truck shaped like an elephant”, how does it know to place the head of the elephant at the front of the vehicle?

Sometimes pictures are combined more like a collage, but many times they are combined in a seamingly logical way. I understand that two human-like figures can be combined because it somehow finds the eyes/heads etc.

Darren_Garrison · February 17, 2023, 5:11pm

If it manages to create that image, it is because it taught itself about the “frontness” of both a truck and an elephant.

Riemann · February 17, 2023, 5:19pm

I’m not sure what you’re asking.

If you asked people to do this, wouldn’t you expect them all to place the front of the animal at the front of the vehicle, unless there were some compelling shape correspondence that dictated otherwise?

An A.I. does it for the same reason that people do it. It’s sensible to put the front end of the animal at the front end of the vehicle.

Danger_Man · February 17, 2023, 6:11pm

Maybe my question is how it knows it is the front

Riemann · February 17, 2023, 6:53pm

We have AI that can drive cars, and you’re surprised that an AI knows which end of a vehicle* or an elephant** is the front?

** the end with the trunk
* the other end

Darren_Garrison · February 17, 2023, 7:01pm

Nobody knows how or why the AIs reach the conclusions that they have.

Johnny_Bravo · February 17, 2023, 7:07pm

For giggles, I went ahead and entered the prompt into DALL-E 2 just as the OP wrote it.

I quite like the second one.

Riemann · February 17, 2023, 7:17pm

Sure we do. Here’s the original 2014 paper by the Google researchers that developed Google’s image captioning AI.

Darren_Garrison · February 17, 2023, 8:58pm

No, we do not. I don’t mean nobody knows how to write the software, obviously people know how to do that. 1.) Write an AI that learns from patterns. 2.) Feed it a large dataset. But notbody knows why it reaches any particular conclusion. And I don’t mean the mechanics of it–I obviously don’t mean nobody knows what a neural network is–I mean that nobody, not you, not me, not the programmers, knows what output that AI will provide until they test it. Nobody knows the specific visual clues that generate the specific neural net weightings that make an image generating AI reach a specific understanding. That data file is a black box.

Riemann · February 17, 2023, 9:27pm

Ok, but that means we know more about current AI behavior than we know about the behavior of biological intelligences.

I would prefer to reserve hyperbole like “we have no idea how or why the AI is doing what it’s doing” to the post-singularity scenario where AI gets better at developing AI than humans, so that humans are no longer required and each generation of superintelligent AI programs the next generation.

Dr.Strangelove · February 17, 2023, 9:46pm

As a kind of “exception that proves the rule”, here is a study of GPT-2 about discovering a neuron that predicted whether the net would choose “a” vs. “an” as the next word:
https://clementneo.com/posts/2023/02/11/we-found-an-neuron

So, after a great deal of research, they found one trivial example of how the net is choosing one word over another. And they still don’t know much about what goes into the decision; they can just identify how it is correlated with the output.

In comparison, figuring out things like how an image generation net knows what the front of a vehicle is, or that an elephant’s head should go on the front, etc. is totally hopeless. It’s not even clear that there is an answer; the decision is likely emergent across the set of weights, and there’s nothing that could be explicitly said to correspond to those decisions.

Darren_Garrison · February 17, 2023, 10:03pm

Okay, I tried a number of images in SD and DE2 using that exact prompt.

Stable Diffusion:

Dall-E 2:

So we see

1.) SD has no ability to generate this type of image at all (and outside these images I tried different combinations of words and arrangements with no better result)

2.) DE2 has a clear idea of the front of an elephant (there are no images of an elephant’s ass attached to a truck) but no idea of where best to attach it to the truck–sometimes it is in the front, but also can be at the side or back.

Darren_Garrison · February 17, 2023, 10:13pm

BTW, this is the most elephanty garbage truck I could get with any prompt in SD:

And I quite like this image I got when I weighted the prompt too far into “elephant”.

Danger_Man · February 17, 2023, 10:50pm

Midjourney got me this

Danger_Man · February 17, 2023, 10:54pm

I have trouble accepting this. Evolution is very complex, but we still are able to identify lots of reasons for why things turn out like they do.

Darren_Garrison · February 17, 2023, 11:41pm

Not really. We have sometimes post-hoc explainations for why it was useful for a trait to evolve, but there are countless traits that would be useful that haven’t evolved, and vast numbers of variations in what did evolve. Sharks, cuttlefish, copepods, and scallops are all solutions to the problem of “mobile organisms in a marine environment” but none of those specific forms could have been predicted 600 million years ago and none if them are very similar to each other.

Darren_Garrison · February 17, 2023, 11:44pm

This is from 2017, but is equally valid today. Or much more valid, given the vast rise in complexity in the past five years.

pulykamell · February 18, 2023, 12:49am

My wife is in this field – specifically NLP (natural language processing). From what I’ve understood her talking about this, yes, it gets to the point where we, as humans, don’t really understand the connections learning models are making. I mean, there is some overall sense of it, of course, but when you get into the weeds, it’s hairy.

Riemann · February 18, 2023, 12:59am

What does post hoc have to do with anything? We understand the general principles of evolution. Of course any application of those general principles to explain an example of evolution will be post hoc, just as any explanation of a geologic formation is post hoc.

The general principles here are not validated by predicting a future that will take millions of years to unfold. They are validated as predictions about future data - that we will not discover precambrian rabbits.

Dr.Strangelove · February 18, 2023, 1:06am

A significant difference here is the idea of purpose. It’s almost impossible to talk about any biological system without bringing in purpose: lungs evolved for breathing, legs evolved for walking, etc. Those are post hoc explanations for a purposeless process. No one makes the same mistake for geology. Formations are what they are.

Trying to figure out what the neural net is doing is possibly an invitation to the idea of purpose, even when it does not exist.

Topic		Replies	Views
Source images for AI images Factual Questions	26	906	February 14, 2023
AI-generated images and artistic endeavor Cafe Society	33	880	September 5, 2022
DALL-E 2 random text on graphics Factual Questions ai	31	1492	March 20, 2023
AI image generation is getting crazy good Miscellaneous and Personal Stuff I Must Share ai	1181	7979	August 5, 2025
I'm missing something about AI training Cafe Society ai	330	1504	July 5, 2025

Another AI images question

Related topics