Digital art creator algorithm website

Okay, trying it in Chrome (instead of Firefox) gives the same problem, but doing it from the app works.

Strange, I did it in Chrome on Windows and it gave me the direct url to the image.

Colorado Horror Story

Prompts: “overlook hotel | the shining” & “the stanley hotel | stephen king”

I maintain that, in order to produce that elephant tea party image, DALL-E must have had an image in its training library featuring an elephant in that exact pose. I maintain that because I do not believe it is possible for an artist, meat or silicon, to put an elephant in that pose without a very detailed model of what an elephant is and how it works, and I further do not believe that it is possible to create such a model using only 2-dimensional still images. The information simply isn’t there.

Likewise, I maintain that the training library must have contained pictures of leopards in the dominoes poses, and photographs of broccoli matching each of the frames in the broccoli vs. apple fight.

In cases where the AI didn’t have poses available that were so apropos to the requested subject matter, the results wouldn’t be nearly as good. But we don’t see that, because only the good pictures are being published.

Dystopian Statue of Liberty

Prompts: "Iron Maiden Statue of Liberty" & “Charles Addams drawing”

Now I feel super patriotic. :+1: :us:

Awesome.

I like how the hand throwing devil horns pops out of the fog like that.

I stumbled across this amazing image today (the complexity of the prompts makes me feel like a rank beginner). Replacing the “wolf face” mention with your own idea, plus removing the photo and adding your own if you want, does have an effect, though I think you have to be pretty choosy for the subject to work well. You can turn off the extra accuracy and medium image to go from 4 credits to 1.

Some of that stuff can surely be pared away. For instance, the first prompt seems to be just the title. I did find that you could get a base origami effect by just adding “origami” to your prompt.

um, Putin riding a nuclear warhead.

I guess all available paintings of Putin are naked Putin.

I’ve also seen people using negative weight prompts. I guess to try to get rid of watermarks and stuff. Don’t know how to do it or how effective it is.
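I don’t know how that particular site exposes it, but the usual implementation is to combine the prompt embeddings with signed weights before they condition the generator, so a negatively weighted prompt pulls the result away from a concept. A toy sketch (the encoder here is a fake stand-in, not any real model, and the prompt strings are just examples):

```python
import numpy as np

def embed(prompt: str, dim: int = 512) -> np.ndarray:
    """Stand-in for a real text encoder (e.g. CLIP): returns a fake but
    deterministic unit vector so the example runs on its own."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def combine(prompts_with_weights):
    """Weighted sum of prompt embeddings; a negative weight pushes the
    result away from that concept (e.g. watermarks)."""
    total = sum(w * embed(p) for p, w in prompts_with_weights)
    return total / np.linalg.norm(total)

conditioning = combine([
    ("overlook hotel, oil painting",  1.0),
    ("watermark, text, logo",        -0.5),   # negative weight
])
print(conditioning.shape)  # (512,) -- this vector would condition the generator
```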

A number of those prompts feel contradictory as well: 16k Res/8k Res, Hyperrealism/Cel-shaded/Concept Art, etc.

They will not be releasing the training data set, so all we know is this:

The model was trained on publicly available image-caption data. This was done through a combination of crawling a handful of websites and using commonly-used pre-existing image datasets such as YFCC100M. A large portion of the data comes from our crawling of the internet.

But it’s probably safe to assume that if there are pictures of elephants/leopards/tea parties on the 'net, which there are, they made it in there.

It’s not trained on “only 2-dimensional still images”, though. CLIP is trained on (image, text) pairs. That is the part of the model that learns to link text to a visual representation (which objects are present, the aesthetic style, colors and materials, and whatever else is described by the caption).
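For the curious, the core of that training step is a contrastive loss over a batch of image-text pairs. Roughly something like this; the shapes and names are illustrative, not OpenAI’s actual code:

```python
import torch
import torch.nn.functional as F

def clip_style_loss(image_features, text_features, temperature=0.07):
    """image_features, text_features: (batch, dim), L2-normalized.
    Matching pairs sit on the diagonal of the similarity matrix; the loss
    pulls each image toward its own caption and away from everyone else's."""
    logits = image_features @ text_features.t() / temperature
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# toy batch of 8 pre-computed feature vectors
img = F.normalize(torch.randn(8, 512), dim=-1)
txt = F.normalize(torch.randn(8, 512), dim=-1)
print(clip_style_loss(img, txt))
```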

A GLIDE model is trained to take a CLIP encoding as input and generate a new image that maintains the “salient features” of the original. (Not literally reproduce the exact image, though.) Technically, the final image generation is a two-stage process: the text encoding is transformed into an image encoding consistent with it, and that is then “un-CLIP”ped. The “unCLIP” model contains the information CLIP discards: how the elephant is “put together”, in your words.
[If you click on that link, there is a good example of variations on an input image (in the centre). I am not convinced that cutting and pasting are the right words to describe it, though you are surely right that the “features” are all presumably present in the training library; otherwise how could it have learned them?]
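Put another way, the data flow is roughly text → CLIP text embedding → prior → image embedding → diffusion decoder. A stub of that pipeline, just to show the stages (every function here is a placeholder standing in for a large neural network, not the real API):

```python
import numpy as np

DIM = 512

def clip_text_encode(prompt: str) -> np.ndarray:
    """Stand-in for CLIP's text encoder: prompt -> embedding."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.normal(size=DIM)

def prior(text_emb: np.ndarray) -> np.ndarray:
    """Stand-in for the 'prior' that maps a text embedding to a compatible
    image embedding (stage one of the two-stage process)."""
    return text_emb + 0.1 * np.random.default_rng(0).normal(size=DIM)

def decoder(image_emb: np.ndarray) -> np.ndarray:
    """Stand-in for the diffusion decoder ("un-CLIP") that turns an image
    embedding back into pixels, inventing the detail CLIP threw away."""
    rng = np.random.default_rng(int(abs(image_emb.sum()) * 1000) % (2**32))
    return rng.random((64, 64, 3))  # fake 64x64 RGB "image"

pixels = decoder(prior(clip_text_encode("an elephant at a tea party")))
print(pixels.shape)  # (64, 64, 3)
```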

If you skim through some of the papers, there are absolutely some failures there, especially when the text prompt did not match anything the model had seen (e.g. there was no problem depicting “a teddy bear”, but it [not DALL-E but some older research model] did less well with “a bicycle with tracks instead of wheels”); who would expect otherwise?

We could find out a hell of a lot more about how it works in practice by playing with it. I am apparently now on some waiting list, but since I do not plan to give them money or develop an app with them, realistically, I am not holding my breath.

I was feeling a little silly:

The Old Men of the Mountain

Starting prompts “Napoleon Blownapart” & “Drawing by Charles Addams” gave the hint of a face (the one on the right) after one evolution, so I added “The Face of Boe”.

A Man-of-War Firing a Broadside

That might be a man of war, but it’s not a Man-of-War. Still, I thought it was a pretty good painting of a soldier under fire on a beach, lighting off his cannon in return.

1984 Louisiana World Exposition

Second prompt: “New Orleans Waterfront”

I don’t remember them having fireworks…

FWIW, the AI seems to understand style terms like 3/4 profile portrait, full-length portrait, and half-length portrait.

Understanding Luke Skywalker? Not so much.

I submitted this photo of Ephant Mon with the prompt Ephant Mon. What I got was a cartoon cat glaring out of the face-hole of some sort of over-sized suit, and his Portuguese Man o’ War sidekick. I think that they are superheroes. Or possibly supervillains. Either way, I want them to have their own TV series.

Catlock and Bloop, a new original animated series on Hulu, streaming this Summer. Don’t miss their exciting and psychedelic adventures through time and dreams!

To summarize, the AI knows what leopards, or Rick and Morty, look like in various poses because it has gone through every picture it could find on the Internet. It does not know how to depict them via some understanding of three-dimensional geometry, perspective, detailed anatomy and the interplay of muscles, because that is not how the model works. All it knows about are visual features of 2-D images.

(But that is enough to put together a teddy bear riding a skateboard in Times Square, Lego Arnold Schwarzenegger, A Man Soldering a PCB by Rembrandt, Winnie the Pooh as The Thinker, Tesla and Edison battling with lightsabers, etc. because it understands enough of what the words mean and knows how to compose a coherent image. No, it is far from flawless, but you always get… something.)

Think of it as a tool. If you are a digital artist and want to create a masterpiece this way, you are going to have to do a non-zero amount of work, including understanding and adjusting the parameters, selecting specific variations, knowing how to work with images and text prompts together rather than just one or the other, masking parts of the image so the AI reworks only certain details, and so on.
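The masking step, at least conceptually, is just “repaint only where the mask allows and keep my pixels everywhere else”. A toy illustration of that blend; real tools apply the mask inside the generation loop rather than once at the end, and none of these names come from any particular product:

```python
import numpy as np

def apply_mask(original: np.ndarray, generated: np.ndarray,
               mask: np.ndarray) -> np.ndarray:
    """mask == 1 where the AI may repaint, 0 where the original pixels
    must be kept."""
    m = mask[..., None]                      # broadcast over RGB channels
    return m * generated + (1 - m) * original

original  = np.random.rand(64, 64, 3)        # your uploaded image
generated = np.random.rand(64, 64, 3)        # what the model proposes
mask = np.zeros((64, 64))
mask[20:40, 20:40] = 1                       # repaint only this patch

result = apply_mask(original, generated, mask)
print(result.shape)  # (64, 64, 3)
```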