AI image generation is getting crazy good

ChatGPT trying to figure out who Fleebledoop the Glentoglian is.


Generate a photographic image in the style of photos taken in 1870, where three people, dressed in period clothing, are standing next to Fleebledoop the Glentoglian. The setting is a cleared area where trees are being felled. The photo has an aged and worn appearance, as it was taken in 1870. It features time-induced stains and scratches. Significantly reduce the sharpness so that the details are not crisp, and greatly increase the wear of the photo, including small tears, missing corners, and small wormholes caused by insect damage. Add a diagonal cut across the photo, as if it had been torn and later mended. There is no text. 1:1

The original prompt had a grey alien:


Generate a photographic image in the style of photos taken in 1870, where three people, dressed in period clothing, are standing next to a classic gray alien that floats serenely one meter above the ground. The setting is a cleared area where trees are being felled. One of the people is holding a glowing cube, as if showing it to the camera

The photo has an aged and worn appearance, as it was taken in 1870. It features time-induced stains and scratches.

Significantly reduce the sharpness so that the details are not crisp, and greatly increase the wear of the photo, including small tears, missing corners, and small wormholes caused by insect damage.

Add a diagonal cut across the photo, as if it had been torn and later mended.

I see Gemini isn’t any better than ChatGPT at properly drawing a mirrored subject. I see that particular error, where parts of the reflected person appear outside the mirror frame, all the time. I’ve got a whole gallery of amusing mirror screwups going back to the early DALL-E 3 days, when it was first added to ChatGPT.

But this aspect is significant in this context, as the girl’s reflection does escape the mirror for a while to pursue a life of her own in the real world. I had no idea it was an error; it just seemed an appropriate illustration for what happens in the story.


Generate a photographic image in the style of photos taken in 1870, where a range of races of aliens and d-bees (each of which is a hybrid with a random Disney or Pixar princess) are standing in a cleared area where trees are being felled. The photo has an aged and worn appearance, as it was taken in 1870. It features time-induced stains and scratches. Significantly reduce the sharpness so that the details are not crisp, and greatly increase the wear of the photo, including small tears, missing corners, and small wormholes caused by insect damage. Add a diagonal cut across the photo, as if it had been torn and later mended. There is no text.

A falconer (by Gemini).


Unsettling how?

I agree the reflected girl partly escaping the mirror frame is odd, although, as you say, it sorta matches the fantasy aspect of your prompt. To me at least, the child’s face completely avoids the uncanny valley and the dead face/eyes look.


That falconer is good enough to be a real photo. We have clearly passed the point where AI can make highly convincing fakes that will stand up to a good deal of scrutiny.

Maybe it’s just me, but after reading the story and seeing the picture where the girl was standing half inside the mirror and half outside, the realism of this unreal situation made me feel uncomfortable for a second. I don’t usually have nightmares, but some of my dreams are weird and this picture resembled something I might see in one of those dreams.

Thanks. Makes sense now. I was just critiquing the rendering qua rendering.

As a semi-aside, when AI image generators first started becoming available to experiment with, image output was always square and resolution was 256x256 or even just 128x128. Now you get a choice of aspect ratios and for these landscape images ChatGPT gives me 1,536x1,024 and Gemini gives a whopping 2,560x1,792: 70 times the number of pixels in the 256x256 pixel images of 3 or 4 years ago. (280 times the 128x128 images.) That’s a very impressive advance just by itself.
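The pixel arithmetic above is easy to double-check. Here’s a quick Python sketch that uses only the resolutions quoted in this post; the dictionary labels and function names are just made up for illustration, and nothing here talks to ChatGPT or Gemini.

```python
from fractions import Fraction

# Resolutions quoted in the post as (width, height).
# Labels are illustrative only; no image service is queried.
RESOLUTIONS = {
    "early, 128x128": (128, 128),
    "early, 256x256": (256, 256),
    "ChatGPT landscape": (1536, 1024),
    "Gemini landscape": (2560, 1792),
}


def pixels(width: int, height: int) -> int:
    """Total number of pixels in a width x height image."""
    return width * height


def aspect_ratio(width: int, height: int) -> Fraction:
    """Aspect ratio reduced to lowest terms, e.g. 1536x1024 -> 3/2."""
    return Fraction(width, height)


def report() -> None:
    gemini_px = pixels(*RESOLUTIONS["Gemini landscape"])
    for name, (w, h) in RESOLUTIONS.items():
        px = pixels(w, h)
        ar = aspect_ratio(w, h)
        # How many times this image's pixel count fits into one Gemini image.
        print(f"{name}: {px:,} px, {ar.numerator}:{ar.denominator}, "
              f"Gemini/this = {gemini_px / px:.2f}")


if __name__ == "__main__":
    report()
```

Running it confirms the ratios are exact: 2,560x1,792 is 4,587,520 pixels, which is exactly 70 times 65,536 (256x256) and 280 times 16,384 (128x128). It also shows the two landscape sizes aren’t quite the same shape: 1,536x1,024 reduces to 3:2, while 2,560x1,792 reduces to 10:7.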

Anyway, I decided to do a comparison between the new Gemini renderer and ChatGPT using a heavily modified version of my most recent prompt, which I came up with on the fly for this test. The first three are Gemini output. The only serious “Hey, this is AI!” flaw I noticed in these is that in the first image the girl’s pinky finger was vaguely defined and merged into the flesh of her other hand. I fixed that manually by cloning and resizing one of her good fingers to cover it.

Another experiment I’m working on currently is to try to get camera angles from directly above. It isn’t easy to get, though, and this is the best Gemini attempt. (There was an earlier one that was fully straight down, but unfortunately the image was badly mangled from items leaking in from an earlier prompt in that session.)

And here is ChatGPT. It is better at getting an angle from above in some of the tries. I want to say that it is not quite as realistic as the Gemini images, but they are both very close to equal.

Having the ponytails was actually an unintentional leftover from the modified prompt, but once I started making test images I kept it for consistency. I went back just now and reran images with tweaked prompts, and this time got good overhead views from both AIs.

Gemini

ChatGPT

This prompt version has the “overhead view” request. Obviously, you delete that for the normal view.


A blonde girl (around 14 or 16) with long flowing hair kneeling looking at rabbits in a grassy meadow. She is wearing a yellow sundress with grass stains at the knees and flip-flops. She is holding in both hands a black rabbit. Around her are a number of solid white and solid black rabbits. Make it look like a real photograph taken with an iphone 15. I want the POV to be from high above her looking directly down towards the top of her head, like the view from an airplane, a drone, or a satellite. 16:9

I mentioned that bunny girl was adapted from the most recent prompt I had been working on before that. Here are some results from that prompt.

In a Facebook post I saw a drawing about Susan, who collects all rocks, not just the ones that she wants to use as weapons. (A Google Lens search showed it to be an illustration from the 1969 book Let’s Read About Rocks by Rand McNally with non-original text added.)

It needed to be improved on with meteorite-related content. At first I used the original drawing as a reference image with the prompts

Then I went text-description only:

(ChatGPT can make surprisingly decent images of specific types of meteorites even without reference images.)

I mentioned that there was a straight-down bunny girl image that was badly mangled by items leaking in from an earlier prompt. Here’s that image, now that the error has context to make sense.

(For those who haven’t encountered it, sometimes when you are using one of the chat AIs to create images and move on from one prompt to a new one, the AI will include aspects from the old prompt in the new image. Gemini and Copilot seem to be worse about it than ChatGPT, but all of them make those mistakes. You really have to get into the habit of starting a new chat session for each new prompt.)

I’m not sure how much that leakage is a “mistake” as opposed to the AI being trained on how real people really think.

IOW, I speculate that it fits the typical users’ use cases, even if it doesn’t fit yours or mine. All day long Jane the advertising copywriter is working on the Smithers project and keeps asking for more.

But yeah, the meteoric bunnies are a poor fit for one another.

The cartoon version was closer to what I asked for, but the photo version isn’t bad. I think it still transmits the idea.


I leaned in harder on telling it what I wanted with the water turning into wine, and it complied pretty well.

ETA second try:

Nice. I like how Jesus comes out a little different each time.

Extra points for him also looking reasonably Semitic. Not the usual bible belt blond blue-eyed jeebus nonsense.

In my first try (in Copilot) I forgot to include the Ferris wheel and tent (but overall it is one of the best images). I added those for a 4-shot in Sora.


A scene at a small country fair. Jesus is standing under a small striped awning at a booth advertising wine. On the wooden stand in front of him are several water bottles. The bottles on the far right side are filled with red wine. The bottles in the far left side are filled with water. The bottles in the center are filled with a swirling, blending chaotic mix of wine and water. Jesus is staring at the central water bottles in intense concentration, waving one open hand over them as swirls of magic distort the air between his hand and the bottles. In the background are a Ferris Wheel and a big top circus tent. Candid photo with shallow dof and forced perspective taken with an iphone 15.

Pretty cool prompt idea.

It’s just another carnival trick. </unimpressed>

Here’s what Gemini did with the same prompt:

I trimmed down the prompt far enough to use it for a video clip in Bing and got crap. Every video I’ve tried to make in Bing has been crap. I don’t know why; it uses Sora, and the videos on Sora are much better.

That’s a great pic and dramatic and lively. And impressive as hell.

But that’s Jesus the magician, not Jesus the miracle worker. I’d expect a magic trick to have curling smoke and a gradual transition. And perhaps liquid seemingly appearing out of nowhere.

I’ve never seen a miracle, but somehow I expect less folderol and more results. As in one millisecond the bottles have water and the next they have wine.

Like the difference between a McLaren and a Lucid Sapphire in a drag race. One is full of noise and drama; the other just gets the job done, no muss, no fuss, just silent aggressive motion and we’re done.

WWE loudmouthed clown vs ninja assassin.