That depends on the type of site the images come from. Tags won’t mention things taken for granted at that site. If images are taken from a porn site, the tags probably aren’t going to say “porn”, just like an image from a cat site probably won’t have the tag “cat”. (Flip them around, putting the cat photo on the porn site and vice versa, and you probably will get those tags.)
That cannot be right, because the whole purpose of these databases is to train AI models, and the AI models obviously don’t know what a cat site or porn site are.
Most places are using some version of the Stable Diffusion model, because training on billions of images is extremely resource-intensive and beyond the means they have available. I believe the SD 2.0 model did attempt to purge a lot of NSFW content, but it’s widely considered inferior to the 1.5 model all around: not just that it won’t draw nudes, but it doesn’t recognize celebrity names or artist names, has trouble with non-photorealistic styles, etc. It also (supposedly) doesn’t align well with the CLIP terms and relies very heavily on negative prompting to give you the image you want: instead of just asking for what you want, you have to be very specific about what you don’t want.
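To make the negative-prompting point concrete, here’s a minimal sketch using the Hugging Face diffusers library; the checkpoint ID, prompts, and settings are just illustrative, not anything official:

```python
# Minimal negative-prompt sketch with Hugging Face diffusers.
# Checkpoint ID and prompts are examples only.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="portrait of a woman, oil painting, warm lighting",
    # With SD 2.x you often have to spell out what you *don't* want:
    negative_prompt="photo, photorealistic, blurry, watermark, text, deformed hands",
    num_inference_steps=30,
).images[0]
image.save("portrait.png")
```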
As a result, SD 2.0 feels to many like a worse version of SD 1.5, created mainly to appease the businesses using these models. But many places still use SD 1.5 with filters bolted on, because users want easily generated images of Trump and Biden on a picnic in the style of Norman Rockwell, not laborious negative-prompt crafting.
As an aside, Midjourney has their own model and supposedly did remove a lot of NSFW content, though I think they primarily worked on removing underage content. For the rest of it, they rely on filtering on both the prompt and output sides. When v5 first launched, it was very easy to get nudes (often whether you wanted them or not), so I assume the bulk of that content is still in the model. Possibly because they don’t want to have to retrain it all if they decide to add some paid NSFW private membership tier, but that’s just a guess.
Once upon a time, I mentioned that a few friends and I were trying to catalog the artists in Midjourney’s model by giving the prompt “By [artist name]” and looking for reproducible results (a rough sketch of the probe loop is at the end of this post). The result of all that so far is Elkpunk.com, which is up to 1,864 artists and growing regularly.
Although I have permission to share the link, it’s not really a “public” site. By which I mean, it’s the product of four or five people making a site that serves their needs, not one made to be useful to anyone else. So some artist tags might seem ‘wrong’ or incomplete, but it’s what’s working for us. Plus you have three different people adding tags, all based on their personal opinions and some casual Discord chatter, none with any formal training in this sort of thing. Also, some images were originally tagged under v4; when v5 came out we re-ran them all, but v5’s treatment of some artists changed significantly and left us with some confusing tags we haven’t caught and cleaned yet.
However, just for looking at the breadth of artists covered by MJ and the styles of images it links to them, it should still be pretty cool for most people. It also has some application with other models like Stable Diffusion, where you can at least see what happens when you use a given artist’s name.
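As promised above, here’s a rough sketch of the probe loop. Midjourney has no public API, so the real cataloging happened by hand in Discord; this is the equivalent sweep against a local Stable Diffusion checkpoint, with the artist list, seed, and checkpoint ID all placeholders:

```python
# Rough equivalent of the "By [artist name]" probe, run against a
# local Stable Diffusion checkpoint instead of Midjourney.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

artists = ["Norman Rockwell", "George Barris"]  # placeholder names
for name in artists:
    # Fixed seed so reruns are comparable; "reproducible results"
    # is the whole point of the test.
    gen = torch.Generator("cuda").manual_seed(42)
    image = pipe(f"By {name}", generator=gen).images[0]
    image.save(f"{name.replace(' ', '_')}.png")
```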
The databases are extracted from Common Crawl data. The image tag is whatever tag was written for the original site. It has to be, because nobody has the time and resources to go through billions of tags and edit them for consistency. The tags work for AI at all only because things average out over the massive number of images. They would work much better if someone did go through a billion of them and gave them all very long, detailed, and consistent tags.
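As far as I can tell, the extraction step is basically “grab the alt text off every img tag on every crawled page.” A toy sketch of that step, assuming BeautifulSoup (real pipelines grind through Common Crawl’s WARC archives at scale):

```python
# Toy version of the image/alt-text extraction described above.
from bs4 import BeautifulSoup

def image_text_pairs(html: str):
    """Yield (image URL, alt text) pairs from one crawled page."""
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img"):
        alt = (img.get("alt") or "").strip()
        src = img.get("src")
        if src and alt:  # most images have no alt text and get dropped
            yield src, alt
```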
The lack of consistent tags is, IME, a big part of the reason for bias in the AIs. The databases probably have millions of photographs of Asian women, for example, but people putting up their vacation photos aren’t going to tag themselves as “Asian”, while porn sites will, because of fetishes. And you get results like this:
If every single photo of one or more people in the databases had tags describing age, sex, race, clothing, position (full profile, 3/4 profile, etc.), and more, the models trained from them would have much more to work with and be much more powerful.
From my informal testing, SDXL seems better able to recognize artist styles than SD 2.0/2.1, but it’s still worse than 1.5. And SDXL seems worse at blending concepts and styles (like the human/animal hybrids that resulted in my aliens) than any of them.
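For anyone who wants to repeat the informal test, the recipe is just the same prompt and seed run across checkpoints and then eyeballing the outputs. Something like this sketch, where the checkpoint IDs are the usual Hugging Face ones and the prompt is a placeholder:

```python
# Same prompt + seed across checkpoints, then compare the results.
import torch
from diffusers import DiffusionPipeline

CHECKPOINTS = [
    "runwayml/stable-diffusion-v1-5",
    "stabilityai/stable-diffusion-2-1",
    "stabilityai/stable-diffusion-xl-base-1.0",
]

for ckpt in CHECKPOINTS:
    # DiffusionPipeline picks the right pipeline class per checkpoint.
    pipe = DiffusionPipeline.from_pretrained(
        ckpt, torch_dtype=torch.float16
    ).to("cuda")
    gen = torch.Generator("cuda").manual_seed(0)
    image = pipe("a quiet street, by Norman Rockwell", generator=gen).images[0]
    image.save(ckpt.split("/")[-1] + ".png")
```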
BTW, I’ve been keeping a casual eye on the numbers for the reddit Stable Diffusion group, and it has gained 100,000 members in around six weeks. I don’t remember the exact first date I noticed, but it has gone from 170,000 to 270,000.
There was a GQ thread about AI that digressed into rendering garbage trucks shaped like elephants. I won’t bump that one for this, but I was playing with “George Barris” (designer of the ’60s Batmobile and other awesome custom cars) as a prompt and tried “A Garbage Truck Shaped Like An Elephant by George Barris” in Midjourney…
I don’t think, from that article, that the author was using an “Asian” tag to describe herself: she said that her colleagues had all made recognizable pictures of themselves, and that she initially expected the same for herself, which is very hard to do with just text descriptions. I think she was just starting with a photograph of herself and asking the AI to depict her in various styles and contexts. So the millions of “normal” pictures of Asian people should have had the same weight as the porn pictures, at least on a per-picture basis. The large number of sexualized outputs could just mean that there’s a very large number of sexualized pictures of women, especially Asian women, on the Internet. Better tagging won’t cure that problem, but better curating of datasets could.
I suspect that porn sites are even more overrepresented in the data than on the Internet as a whole, if they’re really using only the tags on the sites themselves and not generating tags from the site or other context on the page (i.e., auto-appending a “porn” tag to all images from porn sites and a “cat” tag to all images from sites about cats). Most pictures on most websites aren’t tagged at all: tags only become relevant when you’re building a site with a lot of pictures, where someone might want to search for particular sorts of pictures. So most porn sites will have tags for their pictures, but most pictures of people on vacation, for instance, won’t.
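The kind of context-based tagging I mean would be cheap to bolt on, too. A hypothetical sketch, with the domain-to-topic table entirely invented for illustration:

```python
# Hypothetical context-tag augmentation: append a topic tag based on
# the source domain, since a site's own alt text takes its subject
# for granted. The domain-to-topic table is made up for this sketch.
from urllib.parse import urlparse

SITE_TOPICS = {"example-cats.com": "cat", "example-adult.com": "porn"}

def augment_tags(page_url: str, alt_text: str) -> str:
    # Python 3.9+ for str.removeprefix
    domain = urlparse(page_url).netloc.lower().removeprefix("www.")
    topic = SITE_TOPICS.get(domain)
    return f"{alt_text}, {topic}" if topic else alt_text
```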
Awesome. The Indian styling makes me think of a Juggernaut (bitch).
From the description of the project that provides the images:
be aware that this large-scale dataset is non-curated.
…
The image-text-pairs have been extracted from the Common Crawl web data dump and are from random web pages crawled between 2014 and 2021.
And I think you might be wrong about most sites not using image tags. They are important for search engine optimization and for providing data for screen-reading apps for the blind.
Which are things that companies that make their business on large sets of images care about. But when people are just posting their vacation pictures online somewhere, they usually don’t care about details like that.
Another in my series of imagining Bob Dylan’s lyrics.
The Commander-in-chief answers him while chasing a fly
Saying, “Death to all those who would whimper and cry”
And, dropping a barbell, he points to the sky
Saying, “The sun’s not yellow, it’s chicken”
Midjourney version 5
I like the 4-legged dove.
I don’t know if @Chronos still needs a leopard riding a bicycle, but I accidentally generated the perfect image yesterday.
(That was supposed to be a ~~cat~~ rat riding a ~~frog~~ hog.)
The bike-to-school event was almost two weeks ago. But yeah, that picture is definitely… um. It’s definitely a picture. Of something.
Today Playground added several powerful SD mods to their list. Worth checking out if you have lots of time to kill.
For those on Playground AI, they just added nine new filters to play with. It’s hard to tell exactly what they all do, but in my limited effing around with them, they all seem to lean toward realistic 3D images with lots of detail.
Ninja’d, but I provide screenshot evidence.
I was just wishing Playground would add some of the mods found at Dreamlike Art (with its very limited number of free daily images) when they did this. It isn’t Realism Engine, but it is pretty good.