AI image generation is getting crazy good

Having played around with it today, Grok is now definitely broken.

Yeah, it’s pretty awful. I’ve cancelled my SuperGrok subscription and told them exactly why in the feedback form.

I’ve been seeing Elf on a shelf parody rhymes again lately so I’ve been revisiting some of my 2023 Dall-E 3 prompts in Copilot/Sora. An interesting inconsistency. Sora refuses to do a coot on Groot. But it did Strange on a range. Except it went with locations, not the appliance.

So I tried other phrases:

“Strange sitting on a kitchen range” gets refused.
“Strange on a kitchen range” gets refused.
“Strange sitting on a stove” gets refused.
“Strange on a stove” works, but he isn’t sitting on it.

Meanwhile, “Strange sitting on a stove” gave me this in Copilot:

(2023 Dall-E 3 versions):

Hm, when I saw those images, I was thinking “Doc with a Glock”.

I got the old Grok Imagine video model back today. I can tell because of the choices it makes in how to dress characters (when they’re not already defined in a starting image). That was one of the striking differences between the models.

I mainly just animate pre-existing images instead of animating from prompts, but with a few experiments I seem to be getting better results now.

Has yours reverted? Have you tried re-using the exact prompts you’ve tried with both the old and new models?

I haven’t done much. But it seems to be relying on zooming less often than before. Still not great, though.

I had discovered the new model was better (in some ways) than the old one when doing a video from scratch. I accidentally pasted a prompt meant for a 15-second Sora video into Grok Imagine instead. And the new Imagine managed to do things… not great, but still managed to compress the idea down into 6 seconds somewhat coherently. When I’d tried that in the old Grok Imagine, the result was usually mishmash.

And then today, when going back to one of those prompts with no starting image that I’d refined to do better in 6 seconds, I was like “what’s this?? it’s terrible!” and that’s how I initially noticed the model had reverted for me. Then I went to some of the other things in my favorites where I’d noticed the stark differences in its clothing choices to confirm it was back.

They really need an explicit model chooser in the video drop-downs like they have over on the text generation side.

Last night when I was doing rhyme images in Sora/ChatGPT, Squirrel Girl on a pearl got what I planned

And not what I planned.

Both of those Grok animations of the images turned out okay.

That’s not a pearl, though: It’s a rubber ball. It compresses when she sits on it, and dimples from her fingers.

Disney has invested in OpenAI, and is licensing its characters for Sora. That should go…well?

Disney to invest $1 billion in OpenAI, license characters for Sora video tool | Reuters

Perhaps they’re hoping for a very inexpensive remake of Snow White and the Seven-Fingered Dwarves? :zany_face:

But you would have to go with Grok to make Seven Dwarves and the Fingered Snow White.

Somehow I had gotten to my advanced age without ever considering that aspect of the eight housemates’ personal lives.

My inner child is much discomfited. My inner perv is curious. Thanks … I think.

I visited Bing for the first time in a long time today. I notice that they have 3 image models to choose from now: Dall-E 3, ChatGPT, and one called MAI-Image-1 that I didn’t recognize (and that came out from Microsoft in October). I have just begun to test it. This is one of my most recent prompts to Copilot, and the result:

An anthropomorphic lizard struggling his way through driving snow in a blizzard. Iphone 15 photo with forced perspective. 9:16

I trimmed the prompt for the new model (which doesn’t seem to play well with the style description)

An anthropomorphic lizard struggling his way through driving snow in a blizzard.

While I was at it I tried that prompt in Bing video

Then Sora

and Grok

What’s the current workaround for getting images of celebrities? ChatGPT told me that both

and

didn’t follow their content policy.

Sometimes I can sneak things past its third-party character/real person censor by writing such a specific text description of the type of person I want that it pattern-matches on that celebrity or character and produces them anyway. For this image, perhaps try finding the right image of Loki in the outfit you want, and feeding that into ChatGPT. Ask it for a highly detailed text description of the person and outfit, good enough for a police sketch artist. It might deliver text you can use in a different conversation to get the image you want.

In other news, OpenAI just pushed a new image generator out to all ChatGPT users, so time to try all the different things to see how this one differs. Reading that announcement, it seems like maybe finally they’re bringing back precise inpainting, which stopped working correctly when image generation got directly inserted into ChatGPT, replacing the external calls to Dall-E 3.

In general, use Sora (in the old interface). It is much more permissive than ChatGPT or Copilot or (I haven’t checked yet but have no doubt) Bing. But there are still certain celebrities and characters who are specifically restricted even in Sora. Loki seems to be one of them. Sora wouldn’t do your prompt, but the restriction is the character, not the actor. Here is “A person who resembles Tom Hiddleston as a mad scientist, on a cooking show. The character is using a chef’s knife to chop a fire into small pieces”, where he is on the show “Coook”.

Sora will do some Marvel and DC characters but not others. It has no problem with Squirrel Girl, but it won’t do Spider-Gwen, for example. (And it does a good job with realistic Squirrel Girls even though there has never been an official live-action version. I suspect it is drawing from images of cosplayers.)

(ETA: ChatGPT doesn’t mind making Tom Hiddleston as not Loki, either.)

OK, based on that, I tried “Tom Hiddleston, wearing a helmet with long, curving horns, and a green cloak”, and it rejected that, too. The guardrails are getting more intelligent.