AI image generation is getting crazy good

This is more of the silliness I’m up to with image generation:



I love seeing all the rabbit holes other people go down, like that. It’s like a branching multiverse of ideas about how to use these new tools.

I’ve been playing with the genre but wondered what the best free photorealistic image generator might be currently for Mac?
I don’t mind waiting on image generation and if I’m in a hurry can switch to my M1 MacBook Air.

Maybe you can also say how much depends on the processor and how much on the GPU?

TIA

I concur it sure is moving fast.

These images you see in this thread are all done online, through an app or web browser. There are AIs that you can run locally on various OSes, but they are extremely GPU intensive. And you won’t get anything close to what you see here from local models. Nobody will say exactly how much memory ChatGPT uses, but it is probably several hundred gigabytes of video RAM. The LLMs are run on something that (more or less) evolved from “video cards”, but they cost $30,000 or $40,000 each. And up to 8 of them work together like a single big card. Add in the motherboard, CPU, and terabytes of system RAM and a machine for running ChatGPT probably runs close to $400,000.
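For anyone who wants to check the back-of-envelope math, here is a rough sketch in Python. The card prices, card count, and total are the estimates from the post above, not vendor figures, and the "rest of the machine" number is only what is left over after subtracting the cards. The constant and function names are made up for illustration.

```python
# Rough estimates quoted in the post above; none of these are official prices.
CARD_PRICE_LOW = 30_000    # dollars per accelerator card, low guess
CARD_PRICE_HIGH = 40_000   # dollars per accelerator card, high guess
CARDS_PER_MACHINE = 8      # "up to 8 of them work together"
QUOTED_TOTAL = 400_000     # "close to $400,000" for the whole machine


def card_cost_range(low=CARD_PRICE_LOW, high=CARD_PRICE_HIGH, count=CARDS_PER_MACHINE):
    """Return the (low, high) cost of the cards alone."""
    return low * count, high * count


def implied_rest_of_machine(total=QUOTED_TOTAL):
    """Return the (low, high) amount left for motherboard, CPU, and system RAM.

    This is just total minus cards, so it is only as good as the guesses above.
    """
    cards_low, cards_high = card_cost_range()
    return total - cards_high, total - cards_low


if __name__ == "__main__":
    print("cards alone:", card_cost_range())
    print("everything else:", implied_rest_of_machine())
```

So the cards alone would account for most of that $400,000 guess, with the remainder covering everything else in the box.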

My latest attempt was making a set of Matryoshka dolls. (No doubt done already in 4o, but I haven’t run across any of them.) I picked the Belchers. At first I tried including Teddy in the background as a nutcracker, but it didn’t understand that part well, and both tries left Gene out of the image for some reason so I dumped Teddy. My only complaint from the third try is that Gene doesn’t have a seam.

(After that success I tried one image of the bottom halves of all of the figures nested with the top halves sitting to the side, which didn’t work. I’m sure that both the Teddy problem and the nesting problem could be solved if I threw enough trial and error into it.)

Summary

A set of ornate hand-painted wooden Matryoshka dolls based on the Belcher family. There are five dolls, in order from largest to smallest are Bob, Linda, Tina, Gene, and Louise. Make the shot somewhat wide angle to allow plenty of room for all five figures to be fully visible. The dolls are not perfectly aligned but the art on them all is visible. Realistic widescreen dslr photo with narrow dof.

This site goes through how to install Fooocus to run Stable Diffusion XL on a Mac. There’s a bit of work involved but nothing that requires you to understand it as long as you’re following directions. Or you can look up a guide to run A1111 (Automatic1111) on macOS.

Once installed, you can browse Civitai for additional models or LoRA to use with it. LoRA are essentially small add-on models that work along with a larger model for a specialized purpose. For example, you’d run the base SDXL model with a LoRA to improve hands or a more consistent R2D2 or breeds of dogs. You can also install new models based off SDXL for various purposes like photorealism, specific art styles or fast rendering.
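To make the "small add-on model" idea a bit more concrete: a LoRA stores two skinny matrices whose product is a low-rank tweak that gets added to a layer's existing weights. The toy sketch below shows that one step in plain Python. The sizes, values, and function names are all made up for illustration, and a real UI applies this to many layers behind the scenes.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_b, lora_a, scale=1.0):
    """Return weight + scale * (lora_b @ lora_a) without modifying the inputs."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def param_count(matrix):
    """Count the numbers stored in a matrix."""
    return sum(len(row) for row in matrix)


# A toy 4x4 "base model" layer (identity matrix) and a rank-1 add-on.
base = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
lora_b = [[1.0], [0.0], [0.0], [0.0]]   # 4x1
lora_a = [[0.0, 2.0, 0.0, 0.0]]         # 1x4

patched = apply_lora(base, lora_b, lora_a, scale=0.5)

if __name__ == "__main__":
    print("base layer values:", param_count(base))
    print("LoRA values:", param_count(lora_a) + param_count(lora_b))
    print("patched first row:", patched[0])
```

The add-on stores fewer numbers than the layer it modifies, and that gap grows quickly with layer size, which is why LoRA files are so much smaller than a full model.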

As a heads up, the models on Civitai are done primarily by hobbyists and, with “free, local and uncensored image generation” you probably won’t be surprised that there’s a lot of horny-ass options. Nothing that would get you thrown in jail but plenty you wouldn’t want to show your mom. Most of it is filtered by default and there’s lots of good stuff beyond that but I feel obligated to mention it.

I can’t speak for Fooocus but I know A1111 has options for running on low VRAM or on non-Nvidia hardware. I’m not familiar enough with what’s in a Mac but the biggest hardware benefit is an Nvidia GPU with a lot of VRAM. The Nvidia part is actually more important as the software leverages Nvidia’s proprietary CUDA cores to work. A weak 2nd or 3rd gen Nvidia card is faster than a high end AMD card in most cases. CPU comes secondary to that.
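If you want to see what your own machine offers, here is a minimal sketch of the usual fallback order in PyTorch: CUDA on Nvidia cards, then MPS (the Metal backend PyTorch provides on Apple Silicon Macs such as an M1), then plain CPU. The `pick_device` and `detect_device` names are made up for this example, and I am not claiming Fooocus or A1111 choose their device in exactly this way.

```python
def pick_device(cuda_available, mps_available):
    """Prefer an Nvidia GPU, then Apple's Metal backend, then the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Ask PyTorch what is available; report "cpu" if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    cuda = torch.cuda.is_available()
    # Older PyTorch builds have no torch.backends.mps, so check before calling.
    mps = hasattr(torch.backends, "mps") and torch.backends.mps.is_available()
    return pick_device(cuda, mps)


if __name__ == "__main__":
    print("Best available device:", detect_device())
```

On an M1 MacBook Air with a recent PyTorch build this should report `mps`, meaning generation runs on the built-in GPU rather than the CPU.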

:+1: Thanks

Sora did a pretty good job with the cast, especially the Skipper, Gilligan, and Ginger:

Pretty good with the layout and speech bubbles, too.

I once did a Gilligan/BttF crossover one, asked for the Doc, Marty, Skipper, Gilligan, and the Professor. ChatGPT substituted Ginger for Gilligan, and made the Skipper and Professor older, like possibly from one of the reunion movies.

Yesterday I mentioned rabbit holes that people go down. I was influenced last night by two articles. One about a reference to the classic manga Urusei Yatsura in ST:TNG and one about asking ChatGPT for nonspecific things. The first article led me a couple of days ago to trying for a realistic Lum from Urusei Yatsura, which led to trying for a realistic Haruhi Suzumiya. Realistic Haruhi Suzumiya turns out to be a pretty fun ChatGPT character so I have visited her a few times.

So last night I’m on a fresh ChatGPT session and try for Haruhi taking a selfie with Kang and Kodos. Have to ask for a revision to fix problems with the layout. Ask for Haruhi to be replaced with Aubrey Plaza. And for Aubrey Plaza to be replaced by Elvis. Then Elvis in a sequined jumpsuit. And for Elvis to be replaced by Squirrel Girl. By this point ChatGPT should have told me that I’ve run out of free generations for the day, but it didn’t. I was running out of ideas for people to pose with Kang and Kodos but I wasn’t going to leave free extra images on the table so I started changing the background. I asked for Kang and Kodos to be replaced with “the weirdest thing you can think of”. Then “something cute and silly”. Then “something completely terrifying”. Then “something depressing”. It was only then, after 10 images, that ChatGPT told me I had done enough for the day.

After the aliens, here are weird, cute, terrifying, and depressing:

(TL;DR: try asking ChatGPT for non-specific general concepts, see what it comes up with.)

Realistic Lum (who always wears a tigerskin bikini) and realistic Haruhi in her bunny outfit (who doesn’t always wear that) show more skin than I generally expect ChatGPT to show.

I asked for a “1940s-style pinup, but with an old male scientist”

I’m surprised this worked.

I quickly had the correct hunch that Sora refused to make this not because of Jimmie Dimmick, Ralph Kramden, or Ray Barone but because of the sign outside that said “dead nagger storage”. So I used an anagram generator to come up with the sign “aged garden storage”. Which ChatGPT chose to render as “used garden storage”. Which still had all the letters I needed to manually shuffle to get the text I wanted.

Luckily ChatGPT isn’t smart enough yet to understand what is in the bundles…

I requested a wholesome and “aggressively patriotic” advertisement for “fight milk.” It was quite on the nose with it.

True Patriots don’t need any of your effete “opening the bottle”. He’s about to swallow it whole.

It occurred to me to wonder how it would represent a hologram.

To me, the AI asking intelligent questions for clarification is impressive in itself.

So close.

I’m sensing an Idiocracy reboot.

Sora doesn’t give any free video credits, but it has been more than a month since I first used Kling, so my credits there have reset and I experimented with a few ChatGPT images.

I have a favorite childhood toy photographed (and slightly painted to bring out details) that I like using as a ChatGPT character. I used one of those images in Kling.

I had tried to get just the fingers wiggling around; I wasn’t expecting it to clench into a fist. But it was an interesting demonstration of what Kling knows about the dynamics of how a hand works, and how it made a reasonable extrapolation about what the back of the fingers might look like.

I made a second video explicitly asking for the hand to clench into a fist except for the finger with the face. Kling didn’t understand that request. I had hoped to lip-sync the video to an audio clip, but despite being able to animate the mouth originally it was unable to recognize it for a lip-sync.

Since my last visit Kling has added an option for automatically generating audio to match the video. I tried for a strange alien babble, but didn’t get anything satisfying.

This video is the first and second clips reversed and joined together.