AI image generation is getting crazy good

“An oreole is a kind of crab, right?”

This is terrific. As is well established, I like making humorous celebrity mashups. I had just thought of one with a bride and groom in the foreground and Rainn Wilson among the guests in the background, with Alanis Morissette excitedly pointing him out. But I decided that was too complicated with all the extra people, so I tried Alanis and Rainn getting married themselves instead.

ChatGPT gets chatty about the request, asking whether I want to upload photos of the two of them or let it make its own interpretation, and whether it should make a serious or humorous scene. I explain my motive: the photo is a visual pun on “rain on your wedding day.” (If you say the right innocuous things, you can talk it into doing something it might otherwise refuse to do. This is truly a science fiction situation, but one written by Douglas Adams.) So ChatGPT makes the image, accurately and without complaining that it isn’t supposed to make celebrity photos, and it shows them standing unhappily in the rain. I didn’t ask for that. I didn’t think of that. ChatGPT came up with a better and funnier joke than mine on its own.

DALL-E would always object to celebrities and copyrighted material, give you a long-winded explanation of why it couldn’t comply, and then suggest or offer to create an altered version. Prompting 4o directly, it’s much more willing to ignore all that stuff.

It is kind of surreal that we’ve reached the stage where you can get into an argument with a computer program about what it should do, and may or may not convince it to change its mind. More surreal is that I now pretty much take that kind of thing for granted.

No, it’s a type of black bird with a creamy middle.

I actually kind of dig that uniform, crab and all. Maybe a bit busy, but I’ll take it.

I would buy that jersey and I’m a Nats fan.

An odd bit of censorship. I tried crafting a whole new “random choices” prompt set. I finally went with a prompt I’d seen on Sora where someone had about a dozen horror movie baddies playing cards. I trimmed that list down a bit and came up with several scenarios. One of them kept producing failed variants, and it turned out to be “Wearing ugly Christmas sweaters and caroling”. When I trimmed that down to just “caroling”, it was allowed. For some reason, something about the sweaters (is it the “ugly”? the “Christmas”?) made it too horrible to allow. (“In Squid Game” may or may not work. I didn’t see an instance of it, but I didn’t probe it, either.)


“Michael Myers
Freddy Krueger
Jason Voorhees
Chucky
Ghostface
Leatherface
Pennywise
Pinhead”

Randomly choose one of these six phrases to add to the above prompt, completing it:

“Wearing ugly Christmas sweaters and caroling”

“Wearing cowboy hats and doing a country line dance”

“In Squid Game”

“Pairing up doing trust falls at a team building event”

“As the crew of the Enterprise in Star Trek the Next Generation”

“Volunteering at a soup kitchen on Thanksgiving”

Swallow this.

I’m always curious about the prompts behind these. If the prompt is “A blond female doctor with a ponytail is holding a wolf, standing in front of a balding male patient sitting on an examination table”, well, that’s pretty impressive. But if it’s “Here’s an image of stick figures. Make that image photorealistic.”, then that’s a heck of a lot more impressive.

EDIT: Reference image.

Somewhat relevant is this interesting story of an AI-generated facsimile of a dead man used to deliver a victim impact statement in court:

Judge Todd Lang responded positively to the AI usage. Lang ultimately sentenced Horcasitas to 10 and a half years in prison on manslaughter charges. “I loved that AI, thank you for that. As angry as you are, as justifiably angry as the family is, I heard the forgiveness,” Lang said. “I feel that that was genuine.”

The judge… must have pretty low standards. The actual video was terrible, IMO, with constant face twitching and warping and bad lipsync. But I guess it convinced the judge and courtroom.

An examination room in a doctor’s office. A skinny bald man (mid-20s) is sitting on an examining table on the right of the scene, facing left and looking baffled and worried. On the left of the scene a slender blonde woman (mid-20s) with a ponytail and wearing a long white lab coat is walking to the right while carrying a large wolf limp in her arms and looking at the man. Profile view. Candid photo with shallow dof taken with an iPhone 12 Pro. 9:16

Almost certainly done in Kling. I experimented with the lip sync a bit, starting here:

It is not good at 1) making a wrapped object the right size to contain a person and 2) depicting a person upside down. (I’ve run into both issues before.)


A simple room in a home. A simple wooden desk is seen from the side. An open notebook computer is on the desk, also seen from the side. An enormous, human-sized black spider is dangling from the ceiling by a thick silk thread and poking at the notebook computer keyboard with the tips of two or three of its front legs. Behind the spider, also dangling from the roof by a thick silk thread, is a large, elongated, messy silk cocoon with a man tangled inside, upside down and struggling to get free. The man’s head and face are visible at the bottom of the cocoon. Despite his predicament, the man somehow still has a white fedora on in his upside down position. There is an overturned swivel desk chair lying on the floor below the man and the spider. Profile view. Candid photo with shallow dof taken with an iPhone 12 Pro.

As I mentioned in another thread, you can make the “random” choice prompts more complex. Here is one with three different variables:


A [subject] [action] [location]. Candid photograph with shallow dof taken with an iPhone 12 Pro.

In the prompt, replace [subject] with one of the following four items, chosen at random.

“Starfish”

“Anthropomorphic toaster”

“Plague doctor”

“Possum”

In the prompt, replace [action] with one of the following four items, chosen at random:

“Riding a Vespa”

“Playing jacks”

“Juggling potatoes”

“Eating cotton candy”

In the prompt, replace [location] with one of the following four items, chosen at random:

“On the moon”

“Near the Effiel tower”

“On a tiny desert island with a single palm tree”

“In a crowded bazaar”
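If you want the picks to be genuinely random rather than whatever the model decides “random” means, one option is to do the choosing yourself and paste only the finished prompt into ChatGPT. Here’s a minimal Python sketch of that, assuming you’re fine running a short script. The option lists are copied from the prompt above (lowercased so they read naturally mid-sentence, with the Eiffel Tower spelling corrected), and `build_prompt` is just a hypothetical helper name. Each slot gets its own independent `random.choice`, which is exactly the behavior we can’t be sure the model follows.

```python
import random

# Option lists from the prompt above, lowercased to sit mid-sentence.
SUBJECTS = ["starfish", "anthropomorphic toaster", "plague doctor", "possum"]
ACTIONS = ["riding a Vespa", "playing jacks", "juggling potatoes", "eating cotton candy"]
LOCATIONS = [
    "on the moon",
    "near the Eiffel Tower",
    "on a tiny desert island with a single palm tree",
    "in a crowded bazaar",
]

TEMPLATE = (
    "{article} {subject} {action} {location}. "
    "Candid photograph with shallow dof taken with an iPhone 12 Pro."
)


def build_prompt(rng: random.Random) -> str:
    """Hypothetical helper: fill each slot with an independent, uniform pick."""
    subject = rng.choice(SUBJECTS)
    action = rng.choice(ACTIONS)
    location = rng.choice(LOCATIONS)
    # "An anthropomorphic toaster" vs. "A possum": choose the article by first letter.
    article = "An" if subject[0] in "aeiou" else "A"
    return TEMPLATE.format(
        article=article, subject=subject, action=action, location=location
    )


if __name__ == "__main__":
    rng = random.Random()  # use random.Random(42) or similar to reproduce a run
    for _ in range(4):
        print(build_prompt(rng))
```

Doing the randomness outside the model also sidesteps the “four elements instead of three” kind of mixing, since the model only ever sees one fully resolved prompt.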

In the very center of yours, there are four of the elements when it should have picked only three. Maybe it’s a case of false pattern recognition, but I see some of the same sets appearing together more often than I would expect. It would probably take a lot more repetitions to be sure whether it’s biased or getting truly random results.
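For the “a lot more repetitions” part, one standard way to check a tally is a chi-square goodness-of-fit test against a uniform distribution. Below is a rough Python sketch, assuming you log which item the model picked for one slot across many runs. The function names are hypothetical, the critical value 7.815 is the usual 95% cutoff for 3 degrees of freedom (four options minus one), and the tally in the usage section is invented purely for illustration, not real results.

```python
from collections import Counter

# 95th-percentile chi-square value for 3 degrees of freedom (4 options - 1).
CRITICAL_DF3_95 = 7.815


def chi_square_uniform(picks: list[str], options: list[str]) -> float:
    """Pearson chi-square statistic of `picks` against a uniform split over `options`."""
    counts = Counter(picks)
    expected = len(picks) / len(options)
    return sum((counts[o] - expected) ** 2 / expected for o in options)


def looks_biased(picks: list[str], options: list[str]) -> bool:
    """True if the tally is unlikely (p < 0.05) under truly uniform picking."""
    if len(options) != 4:
        raise ValueError("CRITICAL_DF3_95 only applies to exactly 4 options")
    # Rule of thumb: want at least ~5 expected hits per option (20+ picks here).
    return chi_square_uniform(picks, options) > CRITICAL_DF3_95


if __name__ == "__main__":
    actions = ["Riding a Vespa", "Playing jacks", "Juggling potatoes", "Eating cotton candy"]
    # Invented tally for illustration only: "Playing jacks" never comes up.
    picks = (["Riding a Vespa"] * 9 + ["Juggling potatoes"] * 8
             + ["Eating cotton candy"] * 7)
    print(chi_square_uniform(picks, actions), looks_biased(picks, actions))
```

Testing all 64 subject/action/location combinations at once would need far more runs (hundreds) to get enough expected hits per combination, which is why checking one slot at a time is the cheaper first step.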

I tried this variant and the correspondence is almost certainly not random:

Create image split the image into 4 sections, run through these instructions 4 times to fill in the 4 sections with different random results:

[your prompt repeated verbatim]

I gave the choices a deeper look in the other thread, about ChatGPT “randomness”.

Yours is the first one I’ve seen that generated the jacks. But I see that it put the Eiffel Tower on the desert island.

Yeah, a bit of mixing there too. But using your numbering from the other thread, I think I got basically:

1,1,1     2,2,2
3,3,3     4,4,4
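To put a number on “almost certainly not random”: assuming each quadrant’s three picks are independent and uniform over four items, and that 1 to 4 are the item positions in each list (the numbering from the other thread isn’t shown here, so that part is an assumption), a quick Python calculation with exact fractions gives the odds of a grid like this one.

```python
from fractions import Fraction

OPTIONS = 4    # items in each list
SLOTS = 3      # [subject], [action], [location]
QUADRANTS = 4  # sections in the split image

# One quadrant: all three independent picks land on the same item number.
p_matched_quadrant = OPTIONS * Fraction(1, OPTIONS) ** SLOTS

# All four quadrants come out matched like that.
p_all_matched = p_matched_quadrant ** QUADRANTS

# The exact grid 1,1,1 / 2,2,2 / 3,3,3 / 4,4,4.
p_exact_grid = (Fraction(1, OPTIONS) ** SLOTS) ** QUADRANTS

print(p_matched_quadrant, p_all_matched, p_exact_grid)
```

That works out to 1/16 for a single matched quadrant, 1/65536 for all four being matched, and 1/16777216 for that exact arrangement, so the model is pretty clearly reusing one index per quadrant rather than picking each slot independently.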