ChatGPT-4 controlling Dall-E 3 can do some pretty nice comic book cover art if you don’t care about consistent characters, or even specific elements. Trying to nail down the specific elements you liked tends to produce trash.
I still don’t understand why they don’t use their nudity filters on the training data, instead of on the output. If they can detect nudity in the output, then they can detect it in the training data, and that’s even before you look at the tags in the training data which will also often give it away.
Those filters seem to have waaaay too many false positives.
All the more reason to use them on input rather than on output. If your filters are so sensitive that they accidentally catch 10% of clean images, then you still have 90% of your clean database to work with, which should be plenty. But if 10% of the time when a user generates an image, the machine says “No, no, naughty human!” and refuses to show you your image, people get annoyed.
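The mechanics of filtering at ingest time are simple enough; here’s a rough Python sketch of the idea (nsfw_score(), the folder names, and the 0.3 threshold are all made-up stand-ins, not anyone’s actual pipeline):

```python
# Rough sketch: filter the training set before training, rather than filtering outputs.
# nsfw_score() is a stand-in for whatever classifier already screens the outputs.
from pathlib import Path
import shutil

RAW_DIR = Path("training_images_raw")      # hypothetical unfiltered dump
CLEAN_DIR = Path("training_images_clean")  # hypothetical filtered copy
THRESHOLD = 0.3                            # err on the side of rejecting

def nsfw_score(image_path: Path) -> float:
    """Stand-in: swap in the real classifier here; returns a probability 0..1."""
    return 0.0

CLEAN_DIR.mkdir(exist_ok=True)
for img in RAW_DIR.glob("*.jpg"):
    # A false positive here only shrinks the training set a little;
    # a false positive at generation time refuses a user's image.
    if nsfw_score(img) < THRESHOLD:
        shutil.copy(img, CLEAN_DIR / img.name)
```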
Preview isn’t working for some reason so here’s the image (“Retro Future Holiday Elf”):
I’ve just discovered an interesting issue with the Meta AI. One of the example images given for Midjourney v6 comes from this prompt:
35mm film still, two-shot of a 50 year old black man with a grey beard wearing a brown jacket and red scarf standing next to a 20 year old white woman wearing a navy blue and cream houndstooth coat and black knit beanie. They are walking down the middle of the street at midnight, illuminated by the soft orange glow of the street lights --ar 7:5 --style raw --v 6.0
I plugged that into Meta AI (minus the Midjourney-specific stuff at the end) and tried to save the examples. But the Meta AI names the files with the full prompt. The file names are so long that the files can’t be downloaded on Firefox for Android.
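If anyone else ends up with these prompt-length file names (say, after getting the images off another way), here’s a quick rename sketch; the 100-character cap is an arbitrary safe number, not a documented Firefox/Android limit:

```python
# Shorten prompt-derived image file names so they save/download cleanly.
# The 100-character cap is arbitrary; most filesystems top out around 255 bytes.
import re

def safe_filename(prompt: str, ext: str = ".jpg", max_len: int = 100) -> str:
    slug = re.sub(r"[^A-Za-z0-9]+", "_", prompt).strip("_")
    return slug[:max_len].rstrip("_") + ext

print(safe_filename("35mm film still, two-shot of a 50 year old black man with a grey beard..."))
# 35mm_film_still_two_shot_of_a_50_year_old_black_man_with_a_grey_beard.jpg
```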
The article with the sample:
I’m impressed with the advancement in hyper-realism from Midjourney v5.2 to v6. It’s certainly now at a point where the output is ready for professional-level online posting and even print, for some/many projects. Other generative AI image engines I use (including Stable Diffusion, Nightcafe, Bing-Dalle-3…) are likewise improving at breakneck speed. Competition spurs fast advancement, and the competition is fierce in this field.
But, if you need to elevate generated images from hyper-real to real, you can further refine them in “post-production” using other AI apps and filters. I use Adobe’s Firefly (within Photoshop), but there are others even more powerful.
This video by PIXimperfect (hosted by a perfectionist) demonstrates techniques for producing realism using 3rd party filters, though I expect these will become standard in future generative AI app versions. These are interesting times, and one awaits upgrades with bated breath.
A couple of days ago I was scrutinizing hands on John McClane renders (since I discovered DE3 does him well). Remember only a few months ago when AI hands were a big joke? They still sometimes come out mangled, but a large percentage of the time now they are pretty near perfect.
(These also show what a good job DE3 could do with celebrity images in general if it wasn’t artificially hobbled by language filters.)
They did this with Stable Diffusion 2.0, where much of the nudity was culled from the set before training. It was a very unpopular model, both due to the baked-in censorship and due to the difficulty of getting good overall results. I couldn’t say exactly how much the filtering affected the overall model, but even people making cat memes and Elvis-as-Pope pictures seemed to widely prefer SD 1.5. I don’t think the filters are anywhere near sophisticated enough to distinguish CSAM from standard adult material, and eliminating all adult material means removing a lot of images that contain data beyond “boobs”.
All that said, Stability AI is promising more curated versions of Stable Diffusion in the future:
Stability AI on Wednesday said it only hosts filtered versions of Stable Diffusion and that “since taking over the exclusive development of Stable Diffusion, Stability AI has taken proactive steps to mitigate the risk of misuse.”
“Those filters remove unsafe content from reaching the models,” the company said in a prepared statement. “By removing that content before it ever reaches the model, we can help to prevent the model from generating unsafe content.”
There is also the fact that people can train models on their own data. That isn’t a reason to shrug at CSAM in existing datasets/models, just an acknowledgment that the issue of AI being used to generate CSAM likely isn’t going away.
Would a human artist skip nude figures in his or her training? Of course not. Also, if Picasso and Michelangelo had some nudes in their output, it is conceivable you may want them, too. So, whatever you want to filter out, it is not “nudity” or “boobs”.
Yeah exactly. Filtering tends to be a very blunt instrument. If your goal is a commercial website and you just want zero nudity of any type, maybe you filter the training data and hope the competition isn’t rendering better results. For any more nuanced application, you’re probably better off with a system that filters the outputs combined with active moderation.
As you point out, removing all nudity from the training models will also lower the value of training a lot of artist styles. If you include them, you have The Birth of Venus teaching the AI what naked people look like. No good answers.
Working a complex prompt in Bing/DE3. I don’t care if it is 80% failures, I’m still impressed with the ones it gets mostly right.
The original goal was for Scully’s sweater to show both Santa’s sleigh and a “weather balloon”, but it always wanted to draw a hot air balloon. So I had to go with either a “weather balloon” and no sleigh, or a sleigh and no “weather balloon”.
I was saddened to learn today that http://www.catbird.ai has been shut down. That was a great site for AI art. There were 50 or 60 models to choose from and you could make amazing stuff. Realistic models, anime models, all kinds of stuff. I’d signed up for the Pro account ($25/mo) because it was just that good and fun.
(Also did NSFW, which was also fun sometimes, but I did other things like you guys show above. Sadness.)
Goodbye, Catbird! I know of nothing online this flexible.
Were you able to attempt fixing it with inpainting? DALL-E had that last time I messed with it.
No inpainting on the Bing version of DE3. I could have done inpainting of the images in Stable Diffusion, but I just went with varying the request. Having Rudolph as a weather balloon and Santa’s sleigh as swamp gas fit the idea I wanted anyway.
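For reference, the Stable Diffusion inpainting route is only a few lines with the diffusers library; this is just a sketch, and the checkpoint, file names, and prompt are example placeholders:

```python
# Inpainting sketch with diffusers: repaint only the masked region of a render.
# Checkpoint, file names, and prompt are placeholders.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("render.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("RGB").resize((512, 512))  # white = redo this area

fixed = pipe(
    prompt="a small weather balloon in the night sky",
    image=init_image,
    mask_image=mask,
).images[0]
fixed.save("render_fixed.png")
```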
I’ve been kicking the tires more on Meta’s AI. It does some pretty impressive faces. I saw this simple prompt somewhere:
Winter trekking girl selfie in dark woods, grainy, motion blur.
And plugged it in. This was my first batch of results. Meta beats both SDXL and Dall-E 3 on this.
I was able to move the camera back a bit by adding “selfie stick”. Also played around with the setting. Here it is the apocalypse instead of winter in the woods.
And that led to further religious thoughts, and I wondered if I could get selfies at the crucifixion. That wasn’t censored, but it took lots of tries to get something good. The ones I liked, I liked enough to inpaint out the stupid logo and outpaint to a wider ratio.
And here is the original prompt as handled by
SD 1.5
SDXL
DE3
Midjourney picked up the motion blur, which mainly made me think that “motion blur” was a bad prompt idea. Best skin, in my opinion. It would be better without the requested motion blur distortions.
SD results are always a little frustrating to see as a comparison because of course it’ll suffer if you try to give it short natural language prompts under the basic model. That’s not how its prompting language works. SD can give great results when you get into the buttons and levers but you need to work for it.
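To illustrate the “buttons and levers” point, here’s a rough diffusers sketch of the SD style of prompting, with tag-style positive/negative prompts, a fixed seed, and tuned guidance; the model and every value are illustrative, not the settings behind any image above:

```python
# Sketch of SD-style prompting: comma-separated tags, a negative prompt,
# a fixed seed, and explicit guidance/steps. All values are illustrative.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt=(
        "photo, winter trekking girl taking a selfie in dark woods, "
        "film grain, dim lighting, detailed face, 35mm"
    ),
    negative_prompt="cartoon, illustration, extra fingers, blurry, lowres",
    guidance_scale=6.5,          # how strictly to follow the prompt
    num_inference_steps=30,      # more steps = slower, sometimes cleaner
    generator=torch.Generator("cuda").manual_seed(1234),  # reproducible
).images[0]
image.save("winter_selfie.png")
```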
You may prefer using “bokeh effect” instead of “motion blur” as a prompt: