Well, that seems rather crude and crappy. It’s a specific function, so it seems to me that creating a “switch” to control it shouldn’t be that hard. On the other hand, you could hide my knowledge of this on the head of a pin, so I could be way, way off for reasons I don’t even understand. LOL
Yes, you have a very, very inaccurate idea of how these AIs work. It isn’t switches and settings; it is more like standing behind an artist’s shoulder and saying “draw me a car being driven by a Tyrannosaurus rex wearing a top hat”, and then having him try a dozen versions and edit the best one of those a dozen times based on your descriptions. Except the artist is an alien who is only familiar with Earth from seeing pictures. And you can only use words that were printed in the captions of the photos. Like, if you want to see a group of people with large eyes, you get that by yelling “Dan Witz! Margaret Keane!” because the alien has seen the works of Dan Witz (who paints crowds of people) and Margaret Keane (people with large eyes) and can somewhat copy their style. And you have to discover on your own what names get what result, because there is no guide.
Now that I think about it, the alien is a Tamarian.
Ouch! I was hoping for, “Gee, you’re not so bad, Jasmine! :)” LOL
My God, if it can do all of that, why can’t it do, “Draw me a car being driven by a Tyrannosaurus rex wearing a top hat with no written text”?!
Because people don’t label images based on what they don’t have in them. You will find images labeled “car”, images labeled “Tyrannosaurus rex”, images labeled “top hat”, images labeled “driving”. But you probably won’t find many images described as “no text”.
Because it doesn’t actually understand language. It’s an AI. When you type a sentence it creates a bunch of tokens about the words, their order, their relationships with one another, etc., and it relates this information to matching tokens it has that define images, relating the two sets of tokens using a complex and very opaque model.
The phrase “no text” simply doesn’t have a powerful enough association with a specific set of tokens to eliminate all text and anything that looks like text.
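To see why negation is so weak, here is a deliberately crude toy sketch (not the real model, which uses learned embeddings): a prompt gets matched against captioned training images by token association, and “no” is just another token with almost no pull of its own.

```python
# Toy illustration only: matching a prompt to captions by bag-of-words
# overlap. Real systems use learned embeddings, but the key point
# survives: "no" is just another weak token, not a logical operator.
from collections import Counter

def tokenize(text):
    return text.lower().replace(",", "").split()

def overlap_score(prompt, caption):
    # Count tokens the prompt and caption share.
    p, c = Counter(tokenize(prompt)), Counter(tokenize(caption))
    return sum((p & c).values())

prompt = "tyrannosaurus rex, top hat, no text"

# The caption containing "text" still scores a point from the word
# "text" in the prompt -- the "no" in front of it cancels nothing.
score_sign = overlap_score(prompt, "a sign with fancy text")          # 1
score_rex = overlap_score(prompt, "a tyrannosaurus rex in a top hat") # 4
```

So asking for “no text” can actually pull *toward* text-bearing images, because the word “text” is now in the prompt.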
If @Jasmine would like to aid the state of the art, they can download a few training datasets and label a few million images as having text or no text. Then we can train the next generation of AI on this enhanced dataset.
Isn’t this already pretty much automatable to a reasonable degree of accuracy? My phone has simple text detection on it.
I’ll (not “we”) get right on it. LOL
Sure, something could be added onto the process to cull and retry if it detected text in the output, but what if you want no bananas in the image, or no gerbils, or no sand, or no water, etc.?
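For what it’s worth, the cull-and-retry idea is a trivial wrapper to write; the hard part is exactly what’s described above, namely needing a reliable detector for each unwanted thing. A minimal sketch, where both `generate` and `contains_text` are hypothetical stand-ins for a real image generator and a real OCR/text detector:

```python
# Sketch of a post-hoc cull-and-retry wrapper. generate() and
# contains_text() are hypothetical stand-ins, not a real API.
import random

def generate(prompt, seed):
    # Stand-in for a real image generator; returns a fake "image"
    # that randomly does or doesn't contain text.
    rng = random.Random(seed)
    return {"prompt": prompt, "has_text": rng.random() < 0.5}

def contains_text(image):
    # Stand-in for running a text detector on the output image.
    return image["has_text"]

def generate_without(prompt, unwanted_detector, max_tries=10):
    for seed in range(max_tries):
        image = generate(prompt, seed)
        if not unwanted_detector(image):
            return image
    return None  # gave up: every attempt contained the unwanted feature

clean = generate_without("a t. rex in a top hat", contains_text)
```

The same loop works for “no bananas” or “no gerbils” only if you can supply `unwanted_detector` for bananas or gerbils, which is the catch.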
It would, I suppose, be possible to train one of these algorithms to be sensitive to instructions about what should not be in the output, possibly by using adversarial methods; it’s just that this doesn’t seem to have been a priority for anyone.
Actually, Stable Diffusion does successfully do that, at least in some cases. I use Mark Ryden in a lot of prompts, and Mark Ryden uses frames as integral parts of much of his art. A large percentage of the time, Stable Diffusion puts frames around images when his name is in the prompt. Adding a second, negative prompt with “frames” at weight -1 works for removing the frames.
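As I understand it, in common Stable Diffusion implementations a negative prompt works by replacing the empty “unconditional” prompt in the classifier-free guidance step, so each denoising step is pushed away from whatever the negative prompt describes. A toy sketch of just that arithmetic, with made-up numbers standing in for the network’s noise predictions:

```python
# Conceptual sketch of classifier-free guidance with a negative prompt.
# The "noise predictions" below are invented toy numbers; real ones
# come from the denoising network at every sampling step.
import numpy as np

def guided_noise(eps_positive, eps_negative, guidance_scale=7.5):
    # Standard CFG update: start from the negative-prompt prediction
    # and push toward the positive-prompt prediction, so the sampler
    # moves away from the negative concept ("frames").
    return eps_negative + guidance_scale * (eps_positive - eps_negative)

eps_pos = np.array([0.2, 0.8])  # toy prediction for the main prompt
eps_neg = np.array([0.5, 0.5])  # toy prediction for "frames"
print(guided_noise(eps_pos, eps_neg))  # -> [-1.75  2.75]
```

That’s why a negative prompt only works when the model already has a concept for the token (like ‘frame’); you can’t push away from something it never learned.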
But that’s probably because the token ‘frame’ was a tag in enough of the training data for the algorithm to learn what a frame is.
Obviously the token ‘text’ or ‘writing’ must explicitly appear in a lot of the training data when the theme of the image was text (e.g., ‘a sign with fancy text’), but there will have been a much larger set of training images that contained writing without being tagged as such: ‘a can of beans’, ‘a stop sign’, ‘a movie poster for the film Citizen Kane’, ‘a tourist information kiosk’, etc. Most images containing text probably don’t have ‘text’ in the metadata, or indeed any consistent indicator like that.
[Moderating]
Nobody suggested that you were a “we”, and taking offense to an imagined slight is not productive for FQ. Knock it off.