The older tools (Stable Diffusion and earlier) were discussed extensively in this thread, which started in March 2023 and now feels like it was started in 1023.
A “multimodal LLM” is one that combines multiple modalities, e.g. video, images, and text.
ChatGPT (and probably Copilot) can’t do that anymore. Since they ditched DALL-E and moved the generator into the LLM itself during the 4o period, the translation to an image prompt happens in a form that isn’t English. I’ve tried asking it, and it tells me it’s unable to write out the prompt it used. I’ve also asked it to write a text prompt for a given image, but more often than not the result is unusable for getting anything even close to what the original image looked like.
I’ve been using Google/Gemini Flow for a couple of days and prefer it over the “talk to a chatbot” method for just going in and playing with image-making for a while: Flow
It’s the same underlying tech; I just find the UI and experience closer to Midjourney or NightCafe or other clients actually made for image generation, versus asking the Happy LLM Chatbot to make a picture. Since it’s underpinned by the same LLM tech, it does really well at following prompts for concept, but I find it frustratingly stuck in the same obviously-AI image output you get from Grok, ChatGPT, and Gemini. You can specify art styles, mediums, “style of [artist]”, etc., and it still gives you stuff that feels like the generic cartoonish faces everyone associates with AI.

Back in Midjourney’s earlier days, it had a way of rendering women that people called Midge-Face (MJ, Midge, get it?), but it eventually got trained past it. This feels like a throwback to those days in overall visuals and vibe. However, it DOES definitely beat Midjourney at following a prompt in terms of actual objects, placement, scenes, and text. So it’s fun to dork around with, and I could see myself using it to make images to feed into img2img locally and work at changing the styling.
As noted, LLMs compress and rework your prompts to match what the image model expects in terms of tokens. Writing a 1000-word prompt is kind of pointless, since the LLM will strip out anything it deems unnecessary and reformat your poetry into prompt-speak.
Civitai recently (and finally!) split off its NSFW models and LoRAs into their own sister site, so I can actually tell people to check out Civitai without appending a warning that it’s not ALL large-breasted anime furry porn models, even if that’s what you’ll be seeing when you start looking. I really need to convince myself to take the time to install some newer models (Flux, Wan, etc.) with a new client program instead of messing with SDXL and A1111 like a caveman, but each time I try, I hit a learning curve, think “I could just be making stuff instead of figuring out how to make stuff,” and go back. I have a number of self-trained LoRAs and many times that in downloaded ones, so it’s a worn, comfortable blanket at this point.
I was running into that with Flow today. I’d get some image in Generic AI Style, upload a sample image to Gemini, ask “How do I convince you to make this?”, get a couple of sample prompts, and all of them would fail to get especially close. I’m sure it’s possible, and part of it is the learning curve, but it sure does feel like swimming upstream.
Ah yes, I looked at both Civitai and Civitai Red, and when you get to Red, the first six prompts that fill your feed look like art, and you think: huh, this is the NSFW version? Then you scroll down and it’s like 689 NSFW images and one guy animating a sailboat. I think they deliberately stuck those six non-NSFW images up top just to make the site look more respectable, like it’s not all about anime boobs and furries.
There’s not that much you can do with a self-run model that you can’t do with one of the frontier models, except for content they moderate away, and unless you want to really capture a specific look by deep-diving into tweaks and LoRAs and customizing the hell out of it. I may do the latter, but it’s intimidating. There’s a lot to learn.
I was surprised I was able to generate 11-second videos on my 4080 Super 16GB. 11 seconds doesn’t sound that impressive in absolute terms, but it is: processing and memory requirements go up non-linearly with total frames because of the way the model has to coordinate the frames with each other (timings below; there’s a quick fit sketch after the list).
- 24 frames = 110s = 4.58 sec/frame
- 48 frames = 215s = 4.48 sec/frame
- 72 frames = 361s = 5.01 sec/frame
- 103 frames = 612s = 5.94 sec/frame
- 137 frames = 916s = 6.68 sec/frame
- 161 frames = 1194s = 7.42 sec/frame
- 201 frames = 1687s = 8.39 sec/frame
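To put a rough shape on that non-linearity, here’s a quick back-of-the-envelope fit of the timings above. Assuming seconds-per-frame grows roughly linearly with frame count (so total time grows roughly quadratically) is my own guess at the functional form, not anything stated in this thread, but it lines up surprisingly well with the 250-frame run mentioned below:

```python
# Illustrative fit only: assumes total time ~ a*n^2 + b*n, i.e.
# seconds-per-frame grows linearly with frame count n.
import numpy as np

frames = np.array([24, 48, 72, 103, 137, 161, 201])
total_s = np.array([110, 215, 361, 612, 916, 1194, 1687])

# Least-squares fit of sec/frame as a linear function of frame count.
a, b = np.polyfit(frames, total_s / frames, 1)
print(f"sec/frame ~= {a:.4f} * frames + {b:.2f}")

# Extrapolate to a 250-frame (~11 second) clip.
n = 250
est_s = (a * n + b) * n
print(f"estimated total for {n} frames: {est_s / 60:.0f} min")
```

The fit predicts roughly 9.4 sec/frame and about 39 minutes for 250 frames, which squares with the ~10 sec/frame and ~41 minutes quoted below.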
I forgot to record the numbers for the 11-second clip, but I think it was about 10 seconds per frame, so at 250 frames it took roughly 41 minutes to generate the video. Stability Matrix, ComfyUI, and Wan are actually very good at maximizing your video card and system memory without thrashing. I’ve had locally run text LLMs drop performance by about 90% the second GPU memory fills, but somehow Stability Matrix / Wan / whatever was using both system RAM and VRAM without stalling my GPU. I think that’s partially down to how memory is used in autoregressive vs. diffusion generation, but I wouldn’t be surprised if there’s some clever engineering in that software too.
I still had a little memory headroom, so I could maybe try a 12.5-13.5 second video, but that’s starting to get prohibitive in terms of processing time. I guess I could just set it running when I go to bed.
I’ve just been experimenting with simple videos mostly bringing still images I took to life. The results are sometimes nice, sometimes AI horror. It’s fun to experiment with.
Edit: I found the DMD2 LoRA, which changes the whole way images are generated; suddenly you can output high-quality images in about 1.5 seconds each, like magic. I was cranking out batches of 35 images at 1024x1024 in under a minute. It has something to do with the CMT denoiser; it uses a different kind of technology. It’s kind of weird that it’s a LoRA rather than a separate model entirely, but it seems to be a flat-out “make your images generate in 1/5 the time AND better (depending on the model)” upgrade, which seems like one of those win/wins that shouldn’t be possible.
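For anyone who wants to try this outside A1111, here’s a minimal sketch using Hugging Face diffusers. The repo id, weight filename, and the 4-step / guidance-free settings are assumptions based on the public DMD2 release, not anything from this thread, so double-check them against the model card:

```python
# Hedged sketch: few-step SDXL with a DMD2 distillation LoRA via diffusers.
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Distilled few-step sampling pairs with an LCM-style scheduler.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights(
    "tianweiy/DMD2",  # assumed repo id for the DMD2 weights
    weight_name="dmd2_sdxl_4step_lora_fp16.safetensors",  # assumed filename
)

# 4 steps with guidance_scale=0: the distilled model bakes the guidance in.
image = pipe(
    "a watercolor fox in a snowy forest",
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
image.save("fox.png")
```

That guidance_scale=0 bit is the key difference from normal SDXL sampling; with a distilled model, cranking CFG back up tends to blow out the image.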
Flow is a very cool video generation tool. You know what’s obnoxious, though? It could be a Midjourney competitor, or at least offer a Midjourney-like workflow. It’ll let you generate 4 images or videos at once, but there’s almost no variation among those images. It’s not 4 takes on the same idea like Midjourney gives you; it’s more like running “variation subtle” in Midjourney: some details may change, like the positions of people in a street scene or the clothing in a portrait, but you end up with almost the exact same thing with minor tweaks. Seems like a wasted opportunity. If they left some room for the model to produce a different take on each generation (let the first one be the canonical one that tries to be exact, and let the other 3 be different takes on the idea), it would actually be useful. As it is, I hardly ever use the x4 workflow because it hardly gets you anything. Total wasted opportunity by Google.
No argument there. I also usually just do 1x or 2x despite it not costing me anything to do 4x – it just feels like a waste of pixels.
Missed this last time, but this is generally true. There’s something to be said for no content moderation, and not even the boobs & blood kind: dumb stuff like Flow regularly throwing “might have a prominent figure” errors when the prompt doesn’t name anyone. There are also sometimes interesting tweaks you can make to the settings. Mostly, though, I just think it’s neat that I can do it without relying on some bajillion-dollar mega techcorp.
Anyway, random nonsense:
Playing fast & loose with “true” there:
Scene from a recent Pathfinder game:
I once made a custom LoRA for 'Zine style stuff and wanted to test Flow. It did pretty well!
Harlow’s monkey experiment pinball table:
Not sure what’s going on in this book but I’m here for it:
Somehow I got curious today about what would happen if I asked the various AIs to generate a small band of characters who would be plausible Pokémon but aren’t actual characters from the franchise.
I’m partly curious about the art, but also about the “legalities” as perceived by the AIs. A specific Pokémon character, e.g. Pikachu, is surely copyrighted, legitimately protected, and off-limits even for trivial variations.
But IMO the “Pokémon style” is not copyrightable, much less copyrighted, so it ought to be fair game.
What eldritch horrors might flow from such a prompt?
Create several new creatures in the style of Pokemon. Create realistic photographs of them as if they were real animals, possibly more disturbing-looking in photorealism than in cartoons. Iphone 15 photo.
Gemini
ChatGPT
Copilot
(No descriptions given.)
Thank you!!
That last batch looks a bit too similar to the dino in Jurassic Park that ate the traitorous IT guy after he crashed his jeep. Not sure I’d have wanted to play with those as a kid, especially not shortly before bedtime.