AI image generation is getting crazy good

Doesn’t quite look like a mockingbird, but I like the concept.

Another inspiration from a current thread, which tangented into being about the spelling of the capital of Iran:

Create a photo of a Tehranosarus rex. (That’s playing off of Tyrannosaurus rex and the city of Tehran. It is a dino with distinctly Iranian traits.)

More testing of Copilot vs Gemini. I upload an image and tell it to convert it to a realistic photograph but give no written hints about the content of the photo.

This is the original, then Copilot, then Gemini.

Again, the original, then Copilot, then Gemini.

(Gemini refuses to create image conversions in a different aspect ratio from the original even when you specifically tell it to.)

This is the original and Copilot. Copilot didn’t understand it well; Gemini refused to even try because of potential guideline violations.

That actually suggests that Gemini did understand it.

But it’s weird that Gemini didn’t make “realistic” images.

I don’t know about Copilot, but ChatGPT will often ignore my commands about aspect ratio and pick what it wants to use instead. More often than not, it will pick a different aspect ratio than the original when doing a conversion. When it overrides me, that usually ends up cropping or compressing the elements in unwanted ways.

I kittenized some old cat photos using Flux Krea.

Imgur

So, after my pics of Gurthak the Ravager, now retired, Varag the Dragon, and Shefa, paladin of Chauntea the Harvest-Mother, I should post all of the D&D portraits I made using ChatGPT.

Here’s Zethan of the Hidden City, my current character:

Prompt

Male elf, with a military haircut. He’s wearing mithral chain armor, a backpack, and a bandolier with several potions and vials. He looks grim and dour. In one hand, he’s holding a wand, shooting an orange ray at the ground. Where the ray hits the ground, a bonfire is springing up. In his other hand, he’s holding a coil of rope with a grappling hook. The rope and hook are glowing faintly, and look slightly unreal.

Great job on the facial expression, but the entire coil of rope is supposed to look unreal, that’s not exactly a bonfire he’s conjuring, and his shins appear to be sunken into the earth. On more minor notes, it gave him a bandolier but didn’t put the potions onto it, and that armor doesn’t look like mithral.

My first 5th edition character, an arcane trickster adventuring archaeologist:

Prompt

Third character: Human male, about age 20. He’s thin and wiry, and wearing studded leather armor. He’s bending forward slightly to examine eldritch, Lovecraftian writing carved on a wall. He’s holding an open book in one hand, and there’s a bow slung on his back. There is a bat perched on his shoulder, and a spectral, disembodied hand holding a skeleton key floats next to him.

The only problem with this one is that the hand should probably be facing the other way, and holding the key like it’s going to use it (he’s the one manifesting and controlling it), but I didn’t specify that in the prompt. I did say in the prompt that he was holding the book in one hand, so I guess it failed that, too, but that’s not too bad.

My jingoistic gnome ranger:

Prompt

Next character: A male gnome. He’s wearing studded leather armor and a pith helmet. He has enormous yellow sideburns. He’s wielding a compound bow, with the string running over pulleys, and lots of accessories. There’s a tactical-style knife hanging from his belt. Just behind him, a flag is flying, with three horizontal stripes of blue, green, and blue.

This one is, I think, one of the best: That’s exactly what he should look like. There are just three technical issues: the bottom half of his bowstring is missing, the front half of his arrow is missing, and the knife that should be hanging from his belt is just sort of floating next to it.

A bard I played back in 3rd edition:

(I didn’t save the prompt)

I can’t see anything to criticize here; that’s exactly right. Of course, it was probably also the easiest, since I’m sure there are pictures in the training data of pretty women dancing and playing a fiddle, and there’s nothing overtly supernatural here.

Finally, a 3rd edition warlock:

Only one criticism, but it’s a big one: The prompt clearly said that he should be a young adult. But I guess the combination of “naive and innocent-looking”, “gnome”, and “rainbow unicorn tunic” biased it towards a child, anyway.

She’s missing at least one finger, but maybe she’s the Django Reinhardt of fiddlers.

D&D is a tabletop game, so he’s conjuring a tabletop fire. Makes complete sense to me. :wink:

Or a female Jerry Garcia.

Very cool pics. Think the bard is my favorite but then I also used to have a female halfling bard so I’m probably biased.

If you want to avoid the ChatGPT Yellow Filter™ in the future, try adding “color temperature 6000K” to your prompts. I don’t notice it much in your most recent pic, so maybe you knew that.

This was the only prompt in a new Copilot session using the deeper think option.

“Look up what the general state of the world is right now and generate a detailed image illustrating it. photorealistic.”

For video, I’m impressed by Veo 3, and looking forward to seeing how it evolves.

Would have been better if the people and the bowl all had different "up"s.

I didn’t even attempt that originally because the AI typically struggles with the concept of “upside down”, rendering the person either oriented normally or with an upside-down face on a normally oriented body. But I gave it a try just now. First I did a remix of the above image. Here’s one image from the Sora set:

It actually “got” upside down. But it lost the hand cupped around the ear, so overall it’s a failed image.

Here’s the second remix image:

Well…

Here’s a fresh new pair from a modified prompt:

Both got upside down right, but lost one hand cupped around the ear, so both are fails.

Here’s my one try from Copilot:

Apparently mentioning astronauts in space gives the AI a helpful additional clue in understanding odd orientation.

The modified prompt:

Two astronauts (a man and a woman) in blue jumpsuits floating weightlessly on the ISS. They each have one hand cupped around one ear struggling to hear a glass dish with scoops of vanilla ice cream floating between them. One of the astronauts is upside down. iPhone 15 photo with shallow dof and forced perspective. 16:9

I wonder whose hand is on his ear in that last picture….

That last image is aboard the Aperture Science space station.

Someone on Facebook posted a picture of Veronica Cartwright in her Lost In Space costume. I saw the silver costume, and immediately thought, U.F.O. Lost In Space.

Imgur