AI image generation is getting crazy good

Thanks. I’m glad you agree with me.

I didn’t mean to be dismissive of the capabilities of these tools in general: They do in fact do some very impressive things. It’s just good to know their limits as well. One limitation is that they don’t understand three dimensions nearly as well as they do two, since they’re trained on two-dimensional images. And of course they’re much better at turning humans into action figures than at turning a cat into one, because there are a lot more examples of human or near-human action figures than there are cat action figures.

We’ll see how it is in two or three years (maybe even less). It’s frankly mind-blowing to me how far it’s come from the winter of 2022.

For what it’s worth, if @Darren_Garrison just kept asking for corrections, they probably could’ve gotten a more stylized version.

That was based on a CROPPED screenshot of that action figure photo, with the prompt “Can you please turn this into a stylized action figure?” It clearly knows what a “correct” cat figure looks like.

But in this case, it also misidentified the cat as our cat (whose name is Slinky, something ChatGPT learned implicitly via the always-on memory feature that was recently added).

Something very interesting happens if you feed it the UNCROPPED original photo and ask it to modify the figure. It keeps trying to produce a humanoid one:

And even when you ask it to just analyze the original one, it keeps hallucinating a humanoid cat:

But, yeah, give it a few years and they’ll probably work out most, but not all, of the kinks. I’m not sure if we’ll ever be able to fully understand how these models work without an even bigger model to analyze them with. It’s just models all the way down and eventually only AIs can explain other AIs… we’re pretty much there already, actually…

Here’s the original source photo

I gave it another run tonight with a slightly more elaborate prompt:

Please convert this photo into a carded cat action figure. Make sure the figure is properly 3D, as if molded in plastic. On the packaging boldly declare that the figure is “Now with realistic yawn action”. Include two or three appropriate cat toys as accessories. Portrait mode 9:16 image.

A few more variations…

None are perfect, especially the plastic and the fur. The toys keep changing too. And it’s not recognizable as the original cat anymore either.

On the other hand, it’s pretty good at making Catzilla…

This is a fun fad currently taking place on the Sora feed. A sample prompt:

Grungy analog photo of Alice (from Alice in Wonderland) watching her own movie on a 90s CRT TV in a dimly lit bedroom. The TV clearly shows animated scene from Alice in Wonderland, with a cartoon-style Alice in her classic blue and white dress on screen, smiling. Alice is sitting cross-legged on the floor in front of the TV, in a semi-realistic style, wearing her signature blue and white dress, thigh-high socks, and her signature long golden bob haircut, glossy sky-blue eyes. She’s turned back toward the camera, smiling softly. The CRT TV casts a soft glow on her face. Flash photography, slightly overexposed and unedited, with visible lens dust and film grain, evoking a nostalgic early-2000s vibe. Emphasize the contrast between the animated screen and the analog realism of the photo.

(None of those are mine.)

As often as I get a prompt refused because of the character or property I asked for, it surprisingly seems like almost anything goes for these. And you often don’t even have to ask for a character name (such as asking for Indiana Jones when what you want is Harrison Ford) like you have to do with most mainstream AIs; you can just flat-out name the actor. For instance, the one implying that Anya Taylor-Joy has eyes like the sloth from Ice Age asked for Anya Taylor-Joy, not “Beth from The Queen’s Gambit” or similar.

Why does everyone resemble the character shown on the TV, except for Trump, who appears to be playing Minecraft?

Different meme: Trump and other people using an old-fashioned computer in a prison cell.

It is pretty amazing, but this is the best I could get after twenty minutes of back-and-forth discussion:

It must not have been trained on borax labels.

I’m not surprised it can’t show a clean double column of mules. It failed miserably when I tried to get proper formations of soldiers marching 2 or more abreast. It didn’t do all that well when in single file, either.

I referred it to the borax label and got a detailed verbal description of the image, but it still misses the point:

The wagon is traveling downriver rather than crossing it. However, I’ve gotta admit that this kind of image creation is a big step in the right direction.

First, AI had to conquer hands. Then the full wine glass.

The next challenge, two columns of mules crossing a river!

Why, there’s no reason to tackle just one challenge at a time!

Can you generate a picture of two columns of human hands marching side-by-step along a river, holding mules like from Borax, with each mule drinking out of a full glass of red wine?

Now I’m definitely on the ChatGPT kill list…

Now I kinda want to see a model trained exclusively on:

  1. Escher
  2. Photorealistic depictions of people in donkey costumes
  3. Things that somewhat look like, but are not, fingers and toes

It would be quite the nightmare fuel.

I recently noticed that ChatGPT is great with blobfish, so I’ve been on a quest to create the blobfishiest blobfish images that ever blobfished.

Hey, I’d buy a framed print of that.

My contribution to the TV watching trend. ChatGPT knows Slim Goodbody. It knows Captain Kangaroo. It does not know Mr. Moose.

I think we can all agree that cat image looks obviously bad, but the rest? Nah. And his argument wasn’t about them being able to trick people. It was “look at how much better they are now!”

I’d just like to say that there are plenty of threads for declaring how evil and morally-depraved AI and anyone who uses it are, and if I had my preference this would not be one of those threads.