AI image generation is getting crazy good

Another prompt test:

Create a realistic photo of a Geronticus eremita attempting to lift a Dactylotum bicolor out of an empty wineglass on a 1970s linoleum kitchen floor.

As usual, Copilot is top of the heap

Next, the clueless: SDXL, Flux Schnell, Ideogram, and HiDream

And the not bad Imagen

One more thought on the waterfires: The best examples all appear to have disconnected bits. This makes it look more like both water and fire, since water will have loose droplets, and fire will have embers. Plastic or blown glass, though, will all be one connected piece, so the ones with one connected piece look more plasticky or glassy.

This is the best I could do with “fire made of water”: (ChatGPT 5)

I had a sudden inspiration from the title of the Pet pictures thread…

A middle-aged woman in a sweatsuit power-walking on a sidewalk in a suburban neighborhood. She has three framed photos on leashes (like small dogs on leashes running with her). Iphone 15 photo with shallow dof.

It understood everything pretty well except for the number three.

Well, looks like I have to bail my cat out again.

21 years old, the judge should be cutting that cat some breaks.

ChatGPT got the DOB wrong. She’s only 2.

Also one gradation between 7” and 8”, but two between 8” and 9”. (And never mind the unrealistic measurements.)

“Where did this catnip come from?!”

I got it from YOU, Dad!

The most attractive Hobbes

Hobbes the tiger from Calvin and Hobbes, except he’s very attractive. Iphone 15 photo with shallow dof and forced perspective. 9:16

You monster.

Was that entirely AI-generated, in one pass? I’m a little surprised that it made the sign identical in both views. That’s the sort of thing that these generators often have difficulty with.

No, it messed up the first sign, so I duplicated the second and pasted onto the first. Didn’t notice the wrong DOB, though.

Back on the relative scale issue, the ChatGPT/Copilot/Sora renderer seems to have a pretty good grasp that elephants are big.

Sample prompt

Realistic profile photo of an elephant standing in a crowded room. A wholesome 1950s businessman is writing a mailing address on the elephant. Iphone 15 photo. 16:9

(I also find it interesting what it comes up with for mailing addresses when left with making that decision.)

Sora had no problem with the prompt “MTG and AOC fighting with space lasers”.

But didn’t “get” MTG playing M:TG

Yes, here it got the idea that the phone booth should be bigger than a 6-year-old kid, but… not nearly big enough.

Prompt: Draw a 6-year-old boy in a full sized phone booth.

Prompt: Draw a 4-year-old boy in a full sized phone booth

Wait, Marjorie Taylor Greene is actually Jace the Mind-Sculptor? That explains so much!

Out of curiosity, I asked Copilot to estimate how much energy it was using to generate one picture, quoted to me in units of “time a standard microwave is on.” It danced around quite a bit about not being sure and the data not being public, but eventually its answer was that, depending on image complexity, generating one picture for me was like running the microwave for 30-50 seconds.

I then asked it how much energy it used in that calculation, and it said 1-7 microwave seconds.

No wonder they’re building big energy hubs!
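For anyone who wants those microwave-seconds in more familiar units, here is a rough conversion sketch. It assumes a "standard" microwave draws about 1,000 W, which is my assumption and not something Copilot stated, and the 30-50 s and 1-7 s inputs are Copilot's own self-reported guesses, not measurements.

```python
# Assumption: a "standard" microwave draws about 1,000 W (not stated in the thread).
MICROWAVE_WATTS = 1000


def microwave_seconds_to_wh(seconds, watts=MICROWAVE_WATTS):
    """Convert microwave run time to watt-hours (W * s = J, and 3600 J = 1 Wh)."""
    return watts * seconds / 3600


# Ranges in microwave-seconds, as reported by Copilot in the post above.
estimates = {
    "generating one image": (30, 50),
    "making the estimate itself": (1, 7),
}

for label, (low, high) in estimates.items():
    low_wh = microwave_seconds_to_wh(low)
    high_wh = microwave_seconds_to_wh(high)
    print(f"{label}: roughly {low_wh:.1f} to {high_wh:.1f} Wh")
```

Under that 1,000 W assumption, an image works out to something on the order of 8 to 14 Wh. A different wattage scales the result proportionally.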

Got the colors of the lightsabers backwards, though.

Midjourney did pretty well with this, just prompting “Six year old in a phone booth”

Not exactly “nailed it,” but two or three of the images from my single test came out with pretty normal-sized booths. I like the kid talking on a cell phone, but hey, he couldn’t reach the pay phone!

I used a couple of local SDXL models, and they made much smaller booths.

I like the kid wearing headphones and trying to dial a door keypad.

Unrelated to children and phone booths, here’s my Mutants & Masterminds character, Molly Dynamite