The Img2Img function with Stable Diffusion is pretty wild. It works like the “regular” text-to-image mode, only you supply a starting image along with the prompt. I uploaded a picture of my Pathfinder wizard from fantasy mini creator Hero Forge and ran it through with a bunch of prompts…
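For anyone curious what that looks like under the hood, here’s a minimal sketch using Hugging Face’s diffusers library. This is an assumption on my part about the setup; the checkpoint name, file names, and parameter values are illustrative, not what produced these images:

```python
# Minimal img2img sketch with Hugging Face diffusers. Checkpoint name,
# file names, and parameter values are illustrative assumptions.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# The uploaded picture seeds the diffusion process instead of pure noise.
init_image = Image.open("hero_forge_wizard.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="oil painting of a wizard, fantasy art",
    image=init_image,
    strength=0.6,        # how far the model may wander from the source (0-1)
    guidance_scale=7.5,  # how strongly the prompt is followed
).images[0]
result.save("wizard_img2img.png")
```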
Some fun I’ve been having with DALL·E 2:
First I wanted to see how it handled different portrait styles.
“A portrait of a border collie in the style of Andy Warhol”
“A portrait of a border collie in the style of Yousuf Karsh” (Karsh is the photographer who took the famous picture of Churchill, among many others)
“A portrait of a border collie in the style of Leonardo Da Vinci”
Here’s one where I wanted to see how it would handle long text, so I used the speech by Roy Batty (Rutger Hauer) from Blade Runner.
“I’ve seen things you people wouldn’t believe… Attack ships on fire off the shoulder of Orion… I watched C-beams glitter in the dark near the Tannhäuser Gate. All those moments will be lost in time, like tears in rain… Time to die.”
These are what I got:
I like this next one. Kind of abstract, but it has the colors of the Orion nebula in it and a man sort of fading away into the light:
I think both are pretty evocative of the mood of the paragraph.
I have lots of credits, so if anyone wants a prompt tried, just post it and if it looks reasonable I’ll give it a shot and post the photos here.
Finally got a local version of stable-diffusion working after hours of messing around. Ugh (I hate Python, and really the entire modern method of dependency resolution). Anyway, the standard version from here used too much VRAM and failed on my 3080. There’s an optimized version, though, that works:
It still doesn’t just work out of the box. However, these instructions did the trick (sorry for the offensive URL; Redditors seem to like throwing the word around):
This actually worked great. Note that you still need a beefy NVIDIA GPU, probably a 3060 at the minimum. And you’ll have to turn the res down somewhat on lower-end GPUs. But on a 3080 at 512x512, it’s pretty fast, generating images in a couple of minutes.
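If anyone else hits the VRAM wall, the optimized fork above is one route; for the record, the same memory-saving ideas (half-precision weights, sliced attention, lower resolution) look roughly like this in the diffusers library. This is my sketch, not the script from the guide:

```python
# VRAM-saving knobs, sketched with diffusers (not the optimized fork above).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint
    torch_dtype=torch.float16,         # fp16 weights roughly halve memory use
).to("cuda")
pipe.enable_attention_slicing()        # compute attention in slices: slower, leaner

# Dropping the resolution helps a lot too, since attention memory grows
# quadratically with the latent size.
image = pipe("a castle at sunset", height=448, width=448).images[0]
image.save("castle.png")
```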
Been playing with the img2img version locally. Started with a real photo of my cat:
Not knowing how the prompts worked exactly, I first just tried “oil painting”. Among other things, it turned my cat into a dog:
It did better when I changed it to “portrait of black cat, oil painting”:
Some of the different expressions are amusing:
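My guess is the cat-to-dog swap comes down to the strength setting as much as the prompt: img2img layers noise over the source before denoising, so a vague prompt plus high strength gives the model license to reinterpret the subject. If you want to see the effect directly, a sweep over that parameter does it; here’s a rough sketch with the diffusers pipeline (checkpoint, file names, and values are assumptions, not what I actually ran):

```python
# Sweep the img2img `strength` parameter to see how far the output drifts
# from the source photo. Checkpoint and file names are illustrative.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
init_image = Image.open("black_cat.jpg").convert("RGB").resize((512, 512))

for strength in (0.3, 0.5, 0.75):
    out = pipe(
        prompt="portrait of black cat, oil painting",
        image=init_image,
        strength=strength,  # more noise = more freedom to reinterpret the cat
    ).images[0]
    out.save(f"cat_strength_{strength}.png")
```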
In stable diffusion, Wayne Barlowe produces stuff straight out of Hieronymus Bosch. This is “family picnic by Wayne Barlowe”:
Here is “family dinner”.
I was curious if my local run would produce anything similar with the same prompt. Uh… not exactly, but equally weird.
I’ve made a bunch. Family in a forest, family in the desert, family in Hell…all weird.
Worse than that, the guide originates from 4chan, so you know it’s gotta be edgy.
(I used the same guide since I don’t know anything about Python)
Frankly, knowing nothing about Python would have been a benefit. At one point I found myself hand-editing a module due to parsing errors arising from the fact that in Python 2.x, ‘print “hello”’ is a legal construct, whereas in Python 3.x that’s illegal and you must use ‘print(“hello”)’. I might have given up, moved on, and looked for alternate instructions more quickly had I not known enough to get into trouble.
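For anyone who hasn’t had the pleasure, the incompatibility in question is tiny but fatal:

```python
# Python 2.x accepted print as a statement:
#     print "hello"
# Python 3.x removed the statement form entirely (it's a SyntaxError now);
# print is an ordinary function and needs parentheses:
print("hello")
```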
I had an image hit the Hot list feed on MidJourney a few days ago, which probably represents my lifetime peak in image popularity. So here’s “Girl playing video games on the couch in her pajamas”
Started in Stable, selected my favorite out of four, evolved it in Artistic, then evolved it again back in Stable.
Okay, Stable is not (yet?) capable of doing what I thought it did. Still the best of that bunch I created, though.
Stable gets really interesting when given [random celebrity] as a [magazine] centerfold. Amusing, too, when it sometimes adds a third arm – or a third boob. I’ve done a few groups of these so far and have yet to see anything really NSFW.
Honestly, nearly all of those “centerfolds” would make for a good cover; some even have writing.
This might be worth spinning off into a new thread:
It’s a pretty great looking image:
Allegedly, the judges didn’t know it was AI-generated, though it was entered into the “digital art” category.
It’s an interesting question. As everyone who’s played with these knows, it takes some effort to construct a good prompt, and it may also require a great deal of selection and postprocessing to turn the image into something decent. So there is still a human element; perhaps even more of one than in plain photography.
I think it’s clear that AI art at least deserves its own category. And yet… what if it turns out that works in this category start to blow away ordinary works? They still represent significant competition, even if it’s not direct.
“AI” tools are ordinary digital art tools now, just like Photoshop filters and GIMP scripts and ray tracers and digital cameras. Note that it took him weeks to compose that image.
I don’t disagree that it took a great deal of effort on the part of the human here. And yet, this does feel like a step change. In the long run, it may turn digital art into a type of prose, since the differences will largely come down to who can create the most compelling prompts.
It’s also possible that the human aspect will prove short-lived. AIs may be able to create compelling prompts and judge the resulting work themselves.
If you are considering current text-to-image scripts as described in the majority of posts in this thread, I do not think they are making any judgements at all—if an image is considered “beautiful”, it has to do with training images being tagged as “beautiful”. An artist who groks the architecture of these neural networks (and is willing/able to tweak it) will potentially get much more original results than someone who treats it as a magic black box.
In other news, DALL·E has added the ability to generate and add new frames to existing images, or to upload multiple pictures onto a canvas and link them with generated art. This was already possible with a little legwork and external image editing software, but it’s nice to have it included.
The following image started with the centermost bit (an ancient computer at an archaeological dig site), and I spun a few more images off of it. You can see some artifacts where the squares line up, but on the whole it does a great job maintaining the overall style and tone of the original image.
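As far as I can tell, that frame-extension trick (“outpainting”) boils down to pasting the original onto a bigger canvas and inpainting the blank region. Here’s a rough sketch of the idea with the diffusers inpainting pipeline; DALL·E’s actual implementation isn’t public, so this is strictly my guess at the mechanics:

```python
# Rough "outpainting" sketch: extend the canvas, then inpaint the new area.
# DALL-E's real implementation isn't public; checkpoint, sizes, and file
# names here are assumptions.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

original = Image.open("ancient_computer.png").convert("RGB")  # assume 512x512

# Paste the original into the left half of a canvas twice as wide.
canvas = Image.new("RGB", (1024, 512))
canvas.paste(original, (0, 0))

# The mask is white wherever new content should be generated.
mask = Image.new("L", (1024, 512), 0)
mask.paste(255, (512, 0, 1024, 512))

extended = pipe(
    prompt="ancient computer at an archaeological dig site, wide shot",
    image=canvas,
    mask_image=mask,
    width=1024,
    height=512,
).images[0]
extended.save("dig_site_extended.png")
```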
BTW, I found the following montage of someone using an alpha version of one of the Photoshop plug-ins. It’s pretty crude, but it basically does what you describe:
I finally retried my original (failed) prompt: “Boston Terrier attacking dragon”
It did better with Stable!