Sure; I would love to run that notebook myself while tweaking the generator settings, resolution, etc., but we’ll see.
Here are the “original” DALL-E versions btw— it is instructive to note the types of flaws:
(Tesla and Edison look more like they have a Harry Potter cover thing going on…)
Well, yes, of course there’s text associated with the images. But I mean, there’s nothing 3D in there, nor video. And without 3D models or video, how does the model know that elephants have knees homologous to human knees, so it can put an elephant in the same pose as a human would be at a tea party? How does one create a photorealistic 2D image of broccoli, without starting with a 3D model? At some level, to actually create images like that (as opposed to just collaging existing images, which I suspect is what it’s actually doing), you have to have a working model of the whole 3+1 dimensional world. Every human has such a model, but I don’t think that it’s possible to construct one just from labeled still images.
Nightcafe is, I think, in some ways more impressive, because it is creating new images. Its images are flawed, because it doesn’t have a good world-model, either, but it’s still more creative.
I mentioned this before, but I think the model is biased (perhaps deliberately) towards human images: Whenever it ends up with something vaguely like a human, it tries to evolve it in the direction of Generic Human, in addition to whatever other directions it’s evolving in. This serves it well, overall, because it makes it do a better job with actual human images, which a lot of people using it will want. But it’s not so good when the image isn’t actually human, or when it’s human but non-generic in some very distinctive way (like Mr. T’s mohawk). I think what you have there is a fight between the AI trying to make someone look more Melmacian and more Human at the same time, so Alf’s snout is much shorter than it should be.
Meanwhile, I think that the way that Nightcafe processes prompts is that it takes both the whole prompt, and various sorts of substrings of it, does image searches on all of those strings, and tries to amalgamate all of them. That’s why it’s still able to make… something… out of my 24-character fully-random “password”: Even though the whole string produces no hits, shorter pieces of it do (though of course they’re all completely unrelated, so you get a picture with a bunch of unrelated elements). It’s also part of why @Darren_Garrison 's prompt of The Great Wave off Kanagawa by Hokusai worked so well: Searches for “The Great Wave”, “Kanagawa”, and “Hokusai” all return the same image, and even a search for just “Wave” has it as the fourth hit, so the AI has no ambiguity about exactly what it’s targeting (the other part of the success being that that image is all about the textures, which is something the AI is good at, and it’s very fractal, so mismatches of scale don’t matter).
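To be clear, that’s just my guess at the mechanism, but roughly I’m picturing something like the toy sketch below. `image_search` and the pixel-averaging step are stand-ins of my own invention; whatever NightCafe actually runs is presumably a diffusion model guided by the text, not literal collaging:

```python
# Toy sketch of the guessed-at pipeline: split the prompt into substrings,
# look each one up, and blend whatever comes back.  image_search() is a
# hypothetical stand-in for whatever retrieval (if any) the real system does.
from itertools import combinations

import numpy as np
from PIL import Image


def image_search(query: str):
    """Hypothetical: return the top image hit for a query, or None for no hits."""
    return None  # not implemented here; purely illustrative


def candidate_substrings(prompt: str):
    """The whole prompt plus every contiguous run of words within it."""
    words = prompt.split()
    return [" ".join(words[i:j])
            for i, j in combinations(range(len(words) + 1), 2)]


def amalgamate(prompt: str, size=(512, 512)) -> Image.Image:
    hits = []
    for sub in candidate_substrings(prompt):
        img = image_search(sub)
        if img is not None:
            hits.append(np.asarray(img.convert("RGB").resize(size), dtype=float))
    if not hits:
        # Nothing found at all (e.g. a 24-character random "password"):
        # start from noise and hope the texture machinery makes something of it.
        return Image.fromarray(
            np.random.randint(0, 256, (size[1], size[0], 3), dtype=np.uint8))
    # Naive pixel average; the real blending is surely far more sophisticated,
    # but the point is that every substring's hits get a vote.
    return Image.fromarray(np.mean(hits, axis=0).astype(np.uint8))


print(amalgamate("The Great Wave off Kanagawa by Hokusai").size)  # (512, 512)
```

Even a caricature like this shows the two behaviors I described: a fully random string falls through to noise, while a prompt whose pieces all point at the same famous image gets pulled hard toward it.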
Tonight I saw a photo on Facebook of something someone thought was a small toy train until he looked closer and saw that it was a caterpillar. I thought the image had great potential, so I rotated and cropped it and adjusted the contrast a smidge to hopefully make it work better. I tried it with “fantasy train”, and while it isn’t exactly what I was expecting, I think it is one of “my” best creations.
Click on “duplicate”. There are a lot more good images to be had.
People are undoubtedly getting impressive and creative results using “pure” diffusion (possibly broken down into multiple stages), and variations on it (some are making their own DALL-E, of course), but I believe the more impressive ones tend to tweak their own notebooks rather than run everything through Nightcafe.
Yeah, there are multiple layers here. There’s NightCafe or other AIs as artists in their own right, where people are just asking it “Give me a picture of _____” and then looking at whatever it is they get, and then there are also human artists who are using the AIs as tools to create their own art, with elaborate prompts, seed images, and iterations using more of the same. Personally, I’m more interested in the former, because humans already have loads of other tools for creating art, but machines creating art is something new. And how they do it offers clues, I think, to how we do it, which is always interesting.
I land here as well. I might add some additional prompts if there’s something I want to see but generally I just want to know what the AI will create instead of trying to get it aligned to what I wanted to see. If I just really wanted a “good” picture of a dog smoking a cigar on a unicycle, I could get one some other way.
I can see the value in trying to use it as an actual tool though. Both to test the AI and to generate usable images. I mentioned before that someone I know is using an AI generated image as the cover for his own sci-fi book (which I guess is fitting) and I’ve seen a few atmospheric images or images in the “Parchment sketch of a…” vein that would be useful props in a D&D game.
I mentioned the site and its ukiyo-e settings (and gave a few examples) in a Ukiyo-e Facebook group I follow a couple of days ago. It has passed 120 likes/reactions. Probably sending new user traffic their way.
The limitations of the system can be disappointing. I have a nice photo of a seashell that I want to see as a colorful spiral tower, but the downsizing/blurring throws all that away. The images are okay, but so much blander than I hoped for.
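For what it’s worth, you can preview how much gets thrown away by doing the downscale yourself before uploading. The working resolution below is my own guess, not anything NightCafe documents:

```python
# Rough preview of the downsize step.  The 350x350 working size and the
# "seashell.jpg" filename are assumptions for illustration only.
from PIL import Image

seed = Image.open("seashell.jpg")                # high-res source photo
small = seed.resize((350, 350), Image.LANCZOS)   # roughly what the AI gets to see
small.save("seashell_downsized_preview.jpg")
# Side by side, the fine ridges of the spiral are mostly gone already,
# so it's no surprise the generated towers come out bland.
```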
“Ancient tower on a cliff by the sea oil painting by James Gurney” - made with @NightCafeStudio