AI image generation is getting crazy good

Imgur treats even a single image upload as an “album”. You linked to the album. To get the correct aspect ratio you need to link to the individual jpeg.


I think it converted some leaves in the original into toy mice in the new one… but yeah, it’s a good job.

Those are difficult to interpret even for a human. It appears to use some sort of coiled wire to make (unpumpkinlike) tendrils, and possibly cloth leaves?

I also had Sora do a pair, and got these

Best homemade snake evah (not the exact prompt used).

The model still has problems with snakes, tending to make closed loops

Or worse

There’s this barber shop near my house; the owner uses AI to make pics that she posts to her Facebook page, with mixed results. This is an actual post of hers.

I thought to try making a fire made out of water. I wondered if Copilot would struggle with the concept but it understood right off the bat.

I then tried for candles

And putting out the water with fire

Sample prompt


Realistic photo of a campfire made out of clear, colorless water. (The “flames” are flame-shaped, but are water.) A girlscout is dousing it with a bucket full of fire. It is day, not night. Smokey the bear is standing to the side with his arms crossed, nodding in satisfaction. Iphone 15 photo with shallow dof and forced perspective.

The same prompt in SDXL, Flux Schnell, HiDream, and Ideogram, which didn’t get it

And Imagen (via Gemini), which mostly got the concept of firewater but for some reason rendered the firefire as clumps of cloth or something.

Notice how Smokey is always sized similarly to the presumably short girl scouts? That’s one of my pet peeves about all of the image generators I’ve tried; it’s hard to make them show realistic contrasting sizes between characters who should be significantly different heights.

That might explain the issue with one of the D&D portraits I made. It should have been a dwarf riding an ox, but it made both of them about the same size, to ludicrous effect.

And shouldn’t that be Smokey’s brother Drippy Bear? Only you can prevent forest waters.

These were from a “reverse Pinocchio” idea I was working with, where a real boy exists in a world where everything else is made of wood.


I tried the same prompt, but added that Smokey is twice as tall as the Girl Scout. It worked, this time.

A lot of cool images, still some struggle with hands.

I mean, this image made Smokey well larger than the girl scout so I don’t see what the problem is.

Relative size has always been a weak point for these generators in my experience. If you have access to ControlNet, that would be the easiest way to handle it; otherwise, use img2img with a rough image as a guide. From a size standpoint, Midjourney didn’t do great making a dwarf riding an ox…
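For anyone wondering what a "guide" for img2img or ControlNet might look like, here is a minimal sketch using only the Python standard library. It just draws black silhouette boxes at the requested relative heights, all standing on the same ground line, and writes them out as a grayscale PGM file. The function names, canvas size, and box proportions are all my own assumptions for illustration; none of this comes from any particular tool, and you would probably want to convert the result to PNG and rough in better shapes before feeding it to a generator.

```python
def size_guide(heights, canvas=(768, 512)):
    """Build a crude size-guide image (hypothetical helper).

    heights: relative heights per figure, e.g. {"bear": 2.0, "scout": 1.0}.
    Returns (pixels, boxes): pixels is a list of rows of 0-255 gray values,
    boxes maps each name to (x0, y0, x1, y1).
    """
    width, height = canvas
    ground_y = height * 9 // 10          # ground line at 90% of canvas height
    max_px = ground_y * 9 // 10          # tallest figure leaves some headroom
    tallest = max(heights.values())
    slot = width // len(heights)         # one horizontal slot per figure

    boxes = {}
    for i, (name, rel) in enumerate(heights.items()):
        box_h = round(max_px * rel / tallest)
        box_w = box_h * 2 // 5           # assumed body proportion, tweak freely
        cx = slot * i + slot // 2
        x0 = cx - box_w // 2
        boxes[name] = (x0, ground_y - box_h, x0 + box_w, ground_y)

    # White background, black silhouettes.
    pixels = [[255] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes.values():
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                pixels[y][x] = 0
    return pixels, boxes


def write_pgm(path, pixels):
    """Write the rows as a binary PGM (P5) grayscale image."""
    with open(path, "wb") as f:
        f.write(f"P5\n{len(pixels[0])} {len(pixels)}\n255\n".encode("ascii"))
        for row in pixels:
            f.write(bytes(row))


if __name__ == "__main__":
    pixels, boxes = size_guide({"bear": 2.0, "scout": 1.0})
    write_pgm("size_guide.pgm", pixels)
    print(boxes)
```

The point is only that the guide fixes the height ratio before the model gets a say; the prompt then has to fill in the boxes rather than decide how big everything is.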

SDXL (GonzalomoDMD) got it pretty good but the quality leaves something to be desired…

SDXL (SplashedJourney) looks better from a classic art perspective but the sizes aren’t as good. If I was in the market for such an image, I’d try to get it out of Gonzalomo then run it through SplashedJourney (or Midjourney) as an img2img for better art quality.
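If you wanted to script that two-pass idea instead of clicking through a UI, a rough sketch with the Hugging Face diffusers library might look like the following. Treat it as an assumption-heavy outline rather than a recipe: the two model IDs are placeholders that you would swap for whatever checkpoints you actually have, it assumes a CUDA GPU, and the `strength` value is a guess that needs tuning per image. Community checkpoints often ship as a single .safetensors file, in which case `from_single_file` would be the loader instead of `from_pretrained`.

```python
def steps_actually_run(num_inference_steps, strength):
    """Roughly how many denoising steps an img2img pass runs in diffusers.

    Lower strength means fewer steps and more of the draft's composition
    (including relative sizes) surviving the second pass.
    """
    return min(int(num_inference_steps * strength), num_inference_steps)


def two_pass(prompt, compose_model, style_model, strength=0.45, steps=30):
    """Generate a draft with one SDXL model, then restyle it with another.

    compose_model / style_model are placeholder model IDs (assumptions).
    """
    # Imported here so the helper above works without a GPU stack installed.
    import torch
    from diffusers import (
        StableDiffusionXLImg2ImgPipeline,
        StableDiffusionXLPipeline,
    )

    # Pass 1: the model that gets the composition and sizes right.
    base = StableDiffusionXLPipeline.from_pretrained(
        compose_model, torch_dtype=torch.float16
    ).to("cuda")
    draft = base(prompt=prompt, num_inference_steps=steps).images[0]

    # Pass 2: img2img with the better-looking model, keeping the layout.
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        style_model, torch_dtype=torch.float16
    ).to("cuda")
    result = refiner(
        prompt=prompt,
        image=draft,
        strength=strength,
        num_inference_steps=steps,
    )
    return result.images[0]
```

The trade-off sits in `strength`: too low and the second model barely changes the art quality, too high and it is free to redraw the dwarf and the ox at whatever sizes it likes.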

There’s all sorts of issues with those samples; I was literally just throwing in a one or two line prompt and grabbing the first results for sake of example.

Edit: Late entry using a custom SDXL model merge looked pretty good as a start. Wants to put horns on my dwarf but that’s what inpainting is for.

Couple more dwarfs with a little more effort

Edit: Playing with the prompt and changing “ox” to “yak” didn’t really change the beast but made my dwarfs look sort of Mongolian.

All of those are better than this:


Which is otherwise pretty close to what I wanted. The ox shouldn’t be quite so spectral-looking, but it is celestial, and it apparently doesn’t know what a “sheaf” of wheat is, but those are minor issues. The relative sizes, though…

Prompt

Second character: A male knight in shining full plate armor. He’s clean-shaven, but otherwise appears to be a dwarf. He looks trustworthy, dependable, and honorable. He’s wielding a flaming sword and a shield. His shield and armor are liberally adorned with symbols of a sheaf of wheat and a rose, and he’s also wearing a silver sheaf of wheat on a thin chain necklace. Instead of a horse, he’s riding a faintly-glowing ox wearing barding.

Those are real interesting. Thank you. But …

I’m struck by how much the flame-of-water examples look like either ice or blown glass. It looks inappropriately rigid, while the still images of fire somehow seem to leave a sensation of motion or at least of insubstantiality.

I’m wondering how much of that is in the image versus in my perception of the image? And how much is the legit difference between depicting a gas versus a liquid?

They do look kind of glassy/icy. But on the other hand, how would water behaving like fire actually look? Not exactly like that, definitely, but it is hard to get a mental picture of. (The first image with just the campwater is the most dynamic.)

I think the house fire is the most fiery of them but it has the advantage of being further away rather than staring right into it.

My thought was that they looked like plastic Lego fire pieces, but same idea.

Then again, I sure couldn’t draw waterflames any better. That’s one of those prompts where you judge the AI on the fact that it was able to do it at all, not how well it did it.

The OpenAI LLM-based bots (Copilot, ChatGPT) are excellent at prompt understanding because of their LLM nature, compared with single-purpose image models. Unfortunately, their output tends to look very stereotypically “AI” to me, though that could just be a result of people not trying to go beyond the default look, plus a touch of toupee fallacy.