Hello Everyone,
I was wondering how Google filters nude images out of its image searches. You can choose your level of filtering: Strict, Moderate, or No Filter (not sure if that is the exact phrasing of the choices). I have never once gotten a nude picture with filtering on, but turn it off and rule #34 takes over.
So, how do they do it? I'm sure the pics aren't labeled, so I suppose it must be software recognition. It's amazing how quickly it can be done. What does the software look for, nipples? And how does it tell a topless man from a woman, or from a woman wearing a skin-colored tight top? Or how about a couple having sex where you can't see any naughty bits? It really is quite amazing that it can be done, and I would like to know how.
I honestly have no idea, but my WAG would be that there is usually enough metadata (text on the page, links to that page, etc.) to indicate that the site is pornographic.
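Just to make that WAG concrete, here is a toy sketch of what metadata-based scoring could look like. The keyword list, the threshold, and the function names are all made up for illustration; I have no idea what Google actually checks.

NSFW_TERMS = {"porn", "xxx", "nude", "nsfw", "adult"}

def metadata_score(page_text, inbound_link_texts):
    """Fraction of NSFW terms found in the page text or in the anchor
    text of links pointing at the page."""
    haystack = (page_text + " " + " ".join(inbound_link_texts)).lower()
    hits = sum(1 for term in NSFW_TERMS if term in haystack)
    return hits / len(NSFW_TERMS)

def looks_pornographic(page_text, inbound_link_texts, threshold=0.4):
    """Flag the page once enough of the terms show up."""
    return metadata_score(page_text, inbound_link_texts) >= threshold

print(looks_pornographic("Free XXX nude pics", ["hot adult videos"]))  # True
print(looks_pornographic("Men's fashion: going without a shirt", []))  # False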
Either that, or an army of trained monkeys adapted to spot nipples.
Well, I just did a Google search for the term “no shirt”. With filtering removed I got mostly naked women. With it on I got almost all topless men. There were a few women, but something was covering their breasts, which makes the filter even more impressive.
However, for the first time ever the strict filter screwed up. The very first image shown was a painting of a topless woman, boobs and all. Upon closer inspection, though, it was a not-so-attractive woman who quite frankly could have passed for a man. Still, I found it interesting, since this is the first time that has happened to me.
Here is the SFW version of the Google search for “no shirt” that I received. Be warned: the first pic is not only NSFW, but probably not safe for the eyes either. (If it is someone's mom/wife, my apologies in advance. I am sure that she is a lovely woman!:D)
Update: While checking my link to the Google search I played with the filter. Now, when I am on the page and change to the “no filter” option I get mostly men as well. Weird.
You can use machine learning to train computers on images just like you could with text.
Machine learning algorithms work like this for binary classification (e.g., spam or not spam, porn or not porn):
Show the computer a lot of training examples, some with positive classification and some as negative. Tell it which ones are positive and which are negative.
From the training examples, it makes a guess and then learns from its mistakes if it guessed wrong. It figures out the similarities between the positive examples and the similarities between the negative examples. Keep doing this until it gets all the training examples right.
Then, unleash your algorithm onto the test set - in this case, real-world data. Monitor it and see how it does.
Combine this with a “report image” feature for real-time on-the-fly learning and you have a powerful algorithm that can automatically detect porn, just like a spam filter can detect spam.
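Here is a toy sketch of that train-and-correct loop, using a simple perceptron-style learner just to make the steps concrete. The features (a skin-tone ratio and a porn-domain flag) are made up for illustration; a real system would use far richer features and a much more sophisticated model.

def predict(weights, bias, features):
    """Guess 1 (porn) or 0 (not porn) from a feature vector."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0

def train(examples, labels, learning_rate=0.1, max_epochs=100):
    """Loop over the training set, nudging the weights on every mistake,
    until every training example is classified correctly (or we give up)."""
    weights, bias = [0.0] * len(examples[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in zip(examples, labels):
            error = label - predict(weights, bias, features)  # 0 if right
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # got all the training examples right
            break
    return weights, bias

# Toy training data: [skin_ratio, on_porn_domain] -> 1 = porn, 0 = not porn
train_x = [[0.9, 1.0], [0.8, 0.0], [0.2, 0.0], [0.1, 0.0]]
train_y = [1, 1, 0, 0]
weights, bias = train(train_x, train_y)

# "Test set" / real-world data: features of images the model has never seen.
print(predict(weights, bias, [0.85, 1.0]))  # 1 -> filtered out
print(predict(weights, bias, [0.15, 0.0]))  # 0 -> allowed through
# A "report image" button could feed the reported example back through the
# same weight update, which is the on-the-fly learning mentioned above.

The perceptron is just about the simplest thing that literally "makes a guess and learns from its mistakes"; a real filter would be trained on millions of labeled images with a far more powerful model.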
Have you heard of Amazon Mechanical Turk? It’s basically a site where you can pay people to (among other things) classify datasets to prep them for machine learning. There’s a whole category for paying people to classify images, and a whole category under that for paying people to classify NSFW images. So if a project is open there you too could get paid (not very much) to classify porn!
While the filter does work pretty damn well, it breaks down a bit if your search terms are explicitly pornographic. Try a search for “nude girl fucking” (I swear I did this for science). With filtering set to “Moderate”, there are still several nude images on the first page and one which is clearly pornographic in nature. Interestingly though, setting filtering to “Strict” takes care of it - when I did this there were only two images on the first page suggestive of nudity. One was an art shot with a clearly nude model but all the alleged naughty bits carefully concealed, and one that does show female nipples, but it’s an image of a tribal African woman. And none of the images on the first few pages were pornographic.
Now I will need to run some additional tests to see if I can come up with search terms that are obscene enough to break the “Strict” filter.
OK, now this is funny. I just tried the (NSFW) search terms:
cocksucker fucked hard gangbang orgy
And with “Strict” filtering on, I was told that all of the words in my search were specifically excluded from the search by the filter, except for the word “hard”. The results? A bunch of images of computer hard drives.
Also interesting is that the “Moderate”-filtered results for this same search do not tell me any words were excluded, but the results are entirely non-pornographic. Much less pornographic/nudity-laden than the “nude girl fucking” query above.
You can also report images. If you have the filter set to Strict or Moderate and what you consider an inappropriate image comes up, you can flag it for Google to consider.
Some words, like the “C” word, don't pull in anything on Strict. But I found three topless women with the phrase “no shirt”.
The word “gay” brings up two shirtless guys kissing and Justin Bieber.
Yes, domain, links, text, and captions are all probably taken into account, but I wouldn't be surprised if they did image scanning too (at least to break ties, or when the probability one way or the other isn't great enough).
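To illustrate what I mean by breaking ties, here is one way the signals might be combined. All of the thresholds and scores below are my own invention, not anything Google has published: trust the metadata when it is decisive, and only fall back on the image-content score when it isn't.

def should_filter(metadata_score, image_score, low=0.2, high=0.8):
    """Return True if the image should be hidden by the filter."""
    if metadata_score >= high:  # page is clearly pornographic
        return True
    if metadata_score <= low:   # page is clearly benign
        return False
    # Inconclusive metadata: let the (more expensive) image scan break the tie.
    return image_score >= 0.5

print(should_filter(0.9, 0.1))  # True  - metadata alone decides
print(should_filter(0.5, 0.7))  # True  - image scan breaks the tie
print(should_filter(0.1, 0.7))  # False - clean page, image scan never consulted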
obbn, please remember the “two-click” rule for NSFW links (and I think a page full of shirtless men might be looked at askance in a lot of workplaces, even without the first image). This can be done just by putting a spoiler box around the link. No warning issued.
Colibri, thanks I wasn’t aware of the rule. Quick question, how do you get the link to show as a spoiler? I don’t see it in the tool bar of the text box?
It probably should be in the graphical editor, but you can type out the tags. Put the text between a (spoiler) and (/spoiler) tag, but replace the parentheses with brackets.
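For example, a post with a hidden NSFW link would look like this once the parentheses are swapped for brackets (the link text here is just a placeholder):

[spoiler]NSFW: Google image search link goes here[/spoiler]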