How does Google filter out NSFW images in a GIS search?

Hello Everyone,
I was wondering how Google filters nude images out of its image searches. You can choose your level of filtering: Strict, Moderate, No Filter (not sure if that is the exact phrasing on the choices). I have never once gotten a nude picture with filtering on, but turn it off and Rule 34 takes over.

So, how do they do it? The pics aren’t labeled, I’m sure, so I suppose it must be software recognition. Amazing how quickly it can be done. What does the software look for, nipples? And how does it tell a topless man from a woman, or a woman wearing a skin-colored tight top? Or how about a couple having sex where you can’t see any naughty bits? It is really quite amazing that it can be done; I’d like to know how.

I honestly have no idea, but my WAG would be that there is usually enough metadata (text on the page, links to that page, etc.) pointing to the fact that the site is pornographic.

Either that, or an army of trained monkeys adapted to spot nipples.

Do those things show up during a search with NSFW images filtered out?

Unless they do, there’s no need to distinguish between them. They’re just all filtered out.

Well, I just did a Google search for the term “no shirt”. With filtering removed I got mostly naked women. With it on I got almost all topless men. There were a few women, but something was covering their breasts, which makes the filter even more impressive.

However, for the first time ever the strict filter screwed up. The very first image shown was a painting of a topless woman, boobs and all. Upon closer inspection, though, it was a not-so-attractive woman who quite frankly could have passed for a man. Still, I found it interesting that this is the first time that has happened to me.

Here is the SFW version of the Google search for “no shirt” that I received. Be warned: the first pic is not only NSFW, but probably not safe for the eyes either. (If it is someone’s mom/wife, my apologies in advance. I am sure that she is a lovely woman!:D)

Update: While checking my link to the Google search, I played with the filter. Now, when I am on the page and change to the “no filter” option, I get mostly men as well. Weird.

You can use machine learning to train computers on images just like you could with text.

Machine learning algorithms work like this for binary classification (e.g., spam or not spam, porn or not porn):

Show the computer a lot of training examples, some with positive classification and some as negative. Tell it which ones are positive and which are negative.

From the training examples, it makes a guess and then learns from its mistakes if it guessed wrong. It figures out the similarities between the positive examples and the similarities between the negative examples. Keep doing this until it gets all the training examples right.

Then, unleash your algorithm onto the test set - in this case, real-world data. Monitor it and see how it does.

Combine this with a “report image” feature for real-time on-the-fly learning and you have a powerful algorithm that can automatically detect porn, just like a spam filter can detect spam.
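To make that loop concrete, here’s a minimal sketch of binary classification with scikit-learn. Every snippet, label, and threshold in it is made up for illustration; it just shows the train/guess/correct idea described above, not anything Google has published:

```python
# Toy binary classifier: "safe" vs "NSFW" page text.
# Entirely made-up data; illustrates the training loop described above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Training examples with known labels (1 = NSFW, 0 = safe)
pages = [
    "nuns visit a local orphanage charity bake sale",
    "cnn news politics weather sports scores",
    "hot naked chicks xxx free pics",
    "nude girls hardcore adult video",
]
labels = [0, 0, 1, 1]

# Turn each page's text into a word-count feature vector
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(pages)

# Fit the classifier: it adjusts its weights until it gets
# the training examples right (or as close as it can)
clf = LogisticRegression()
clf.fit(X, labels)

# "Unleash it" on data it has never seen and monitor the result
new_page = ["free nude pics of hot girls"]
print(clf.predict(vectorizer.transform(new_page)))  # likely [1], i.e. filter it
```

The “report image” feedback mentioned above would just feed the flagged images back in as new labeled training examples.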

Someone at Google must still have a smile on their face from getting paid to test out that one.

Have you heard of Amazon Mechanical Turk? It’s basically a site where you can pay people to (among other things) classify datasets to prep them for machine learning. There’s a whole category for paying people to classify images, and a whole category under that for paying people to classify NSFW images. So if a project is open there you too could get paid (not very much) to classify porn!

I doubt it is doing very much image scanning. I bet it is mostly based on three factors:

(1) The domain. cnn.com = ok, hotnakedchicks.com = not ok

(2) Links. If the image/page is linked to by cnn.com that’s good. If it is linked to by hotnakedchicks.com, that’s bad.

(3) Text in the page. If the text says stuff like “nuns visit a local orphanage,” that’s good. If it’s “naughty nuns have fun,” that’s bad.
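If it really does work that way, a crude version of the scoring would be almost trivial to write. This is just my own toy sketch of those three factors; every domain, weight, and word in it is invented, and any real ranking signal would be far more sophisticated:

```python
# Toy "is this image probably NSFW?" score based on the three
# factors above: hosting domain, who links to it, and page text.
# All domains, weights, and words are invented for illustration.

NSFW_WORDS = {"nude", "naked", "xxx", "porn", "naughty"}
KNOWN_SAFE_DOMAINS = {"cnn.com", "wikipedia.org"}
KNOWN_NSFW_DOMAINS = {"hotnakedchicks.com"}

def nsfw_score(domain, linking_domains, page_text):
    score = 0.0
    # (1) The domain the image is hosted on
    if domain in KNOWN_NSFW_DOMAINS:
        score += 2.0
    elif domain in KNOWN_SAFE_DOMAINS:
        score -= 2.0
    # (2) Who links to the page
    for link in linking_domains:
        if link in KNOWN_NSFW_DOMAINS:
            score += 1.0
        elif link in KNOWN_SAFE_DOMAINS:
            score -= 1.0
    # (3) Words in the surrounding page text
    words = set(page_text.lower().split())
    score += len(words & NSFW_WORDS)
    return score

# Filter anything that scores above some threshold
print(nsfw_score("hotnakedchicks.com", ["hotnakedchicks.com"], "naughty nuns have fun"))  # 4.0
print(nsfw_score("cnn.com", ["wikipedia.org"], "nuns visit a local orphanage"))           # -3.0
```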

While the filter does work pretty damn well, it breaks down a bit if your search terms are explicitly pornographic. Try a search for “nude girl fucking” (I swear I did this for science). With filtering set to “Moderate”, there are still several nude images on the first page and one which is clearly pornographic in nature. Interestingly though, setting filtering to “Strict” takes care of it - when I did this there were only two images on the first page suggestive of nudity. One was an art shot with a clearly nude model but all the alleged naughty bits carefully concealed, and one that does show female nipples, but it’s an image of a tribal African woman. And none of the images on the first few pages were pornographic.

Now I will need to run some additional tests to see if I can come up with search terms that are obscene enough to break the “Strict” filter. :smiley:

Try doing a strict search on a porn star. I typed in, for example, “Michelle Wild,” and the first two image results I got had boobies.

OK, now this is funny. I just tried the (NSFW) search terms:

cocksucker fucked hard gangbang orgy

And with “Strict” filtering on, I was told that all of the words in my search were specifically excluded from the search by the filter, except for the word “hard”. The results? A bunch of images of computer hard drives. :smiley:

Also interesting is that the “Moderate”-filtered results for this same search do not tell me any words were excluded, but the results are entirely non-pornographic. Much less pornographic/nudity-laden than the “nude girl fucking” query above.

You can also report images. If you have the filter set to Strict or Moderate and an image you consider inappropriate comes up, you can flag it for Google to consider.

Some words, like the “C” word, don’t pull in anything on Strict. But I found three topless women with the phrase “no shirt”.

The word “gay” brings up two shirtless guys kissing and Justin Bieber :slight_smile:

Yes, domain, links, text, and captions are all probably taken into account, but I wouldn’t be surprised if they did image scanning too (at least to break ties or when the probability one way or the other isn’t great enough).
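Even a very crude image check would be cheap to bolt on as a tie-breaker. One classic (and easily fooled, per the OP’s “skin-colored tight top” question) trick is to measure how much of the image falls in a skin-tone color range. Here’s a toy sketch using Pillow (assuming it’s installed), with a rough rule-of-thumb color test; this is just my illustration, not anything Google has described:

```python
# Toy tie-breaker: fraction of "skin-toned" pixels in an image.
# The RGB thresholds are a rough rule of thumb, easily fooled by
# faces, beaches, and skin-colored clothing - illustration only.
from PIL import Image

def skin_fraction(path):
    img = Image.open(path).convert("RGB")
    pixels = list(img.getdata())

    def looks_like_skin(rgb):
        r, g, b = rgb
        # Very rough skin-tone test: reddish, not too dark, R > G > B
        return r > 95 and g > 40 and b > 20 and r > g > b and (r - g) > 15

    skin = sum(1 for p in pixels if looks_like_skin(p))
    return skin / len(pixels)

# Hypothetical usage: only consult this when the text/link signals are ambiguous
# if skin_fraction("thumbnail.jpg") > 0.4: treat the image as probably NSFW
```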

Moderator Note

obbn, please remember the “two-click” rule for NSFW links (and I think a page full of shirtless men might be looked at askance in a lot of workplaces even without the first image). This can be done just by putting a spoiler box around the link. No warning issued.

Colibri
General Questions Moderator

Aahh…that’s-a good filter. It’s not-a no shirt, it’s-a no blouse.

They probably had a lot of wear.

Someone once asked the AI running the algorithm how it knows something is porn. It replied, “I couldn’t tell you but I know it when I see it.”

Colibri, thanks, I wasn’t aware of the rule. Quick question: how do you get the link to show as a spoiler? I don’t see it in the toolbar of the text box.

It probably should be in the graphical editor, but you can type out the tags. Put the text between a (spoiler) and (/spoiler) tag, but replace the parentheses with brackets.