I’ve heard the offhand jokes, 98% porn, 2% crap… does anyone know what percentage of sites actually are porn? You can’t Google anything with the word porn in it to research this, because you get porn, or anti-porn crusaders. :dubious: However, it seems that no matter what I enter, I get at least a few porn hits. The reason I ask is that I’m trying to find a good site about Japan (hopefully with a message board not composed entirely of Asian freaks), and take a WAG at what comes up when you search for anything with “Japan” in it. Not that I have anything against porn, mind you; just a thought.
Well, just as much as you could possibly want… what’s your point?
labmonkey, you do know, don’t you, that if you go to Preferences in your browser’s menu, you can set your browser to ignore porn sites?
What kind of sites about Japan are you looking for?
If these aren’t enough, be specific, and I’ll see what I can link you to.
Googling on “and” in the text of web pages yields 3,520,000,000 hits.
Googling on “porn” in the text of web pages yields 94,700,000 hits.
Googling on “and” and “porn” in the text of web pages yields 94,700,000 hits.
So every web page that contains the word porn also contains the word and.
The pages that contain the word porn make up 2.7 percent of all the pages that contain the word and.
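If you want to check that arithmetic yourself, it only takes a few lines of Python; the hit counts below are just the ones quoted above:

```python
# Hit counts quoted above (Google, at the time of the search).
hits_and  = 3_520_000_000   # pages containing "and"
hits_porn = 94_700_000      # pages containing "porn"
hits_both = 94_700_000      # pages containing both "and" and "porn"

# The combined count equals the "porn" count, so every page with "porn" also has "and".
assert hits_both == hits_porn

print(f"porn pages as a share of 'and' pages: {hits_porn / hits_and:.1%}")  # -> 2.7%
```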
I wouldn’t know, but I’ve been told that the vast majority of porn is on pay sites and/or sites that require an explicit age agreement click-through. Google won’t index those sites, so the raw count statistics are off. On the other hand, there is an enormous amount of non-porn material hosted the same way. I don’t think this question can be answered by spiders.
Sorry for the schizophrenic post; looking at it now, it’s difficult to say whether this is a factual question or a very mild rant born of frustration. gluteus maximus, I’m looking specifically for a message board for international couples like the one my wife frequents, you know, where people discuss the good, the bad and the ugly. (BTW, I can’t use hers; I don’t read JP.) What I’ve found so far has been quite lacking, or maybe I’m just ruined for all other boards by the SDMB. Seriously.
Sure, Google doesn’t probe deep content, but nothing else does either. The OP’s question related directly to Google, so the data I provided related to the percentage of porn in readily available content.
If you try to go beyond that, you get into trouble defining exactly how private a database can be, and how tenuously it can be linked into the web and still count as a part of the internet. I figure it’s better to try to answer the question, than figure out why the question can’t be answered.
Instead of gauging the percentage with web pages, it’s more accurate to measure the total bytes of information that are pornography versus all other data: pictures, movies, audio and erotic text sitting on web servers, FTP servers, shared data on peer-to-peer programs, Usenet, etc. Damn, there must be a lot of porn out there.
I’d say porn was the single biggest chunk of the internet pie in total bytes back in the ancient ’90s, but I’d wager that pirated (non-porn) music and movies are perhaps bigger now.
I always wonder why the porn industry isn’t as vocal as the RIAA and the MPAA about combating piracy. It’s gotta be hurting them far more than the big guys.
Yeah, well, you have a good point there. I’ve looked at some of the English forums based over here, and there’s a lot of Kurdt Kobainishness. Whine, whine, whine.
Don’t give up yet, though. There is one association for non-Japanese women married to Japanese men,
AFWJ, which might be able to point you toward something for husbands.
If I see anything else, I’ll post it here.
You can also set Google to ignore porn sites.
Depends on whether you mean actual domain names, or terabytes of content, or discrete pages, or what. Personally, I’d think adult sites would win on all these counts, but I have never seen any research on it. I’d like to know.
I think it’s still one of the few industries that is making even vaguely legitimate money directly from the internet (as opposed to the scams like “we’ll list your business in 20,000 search engines”, viagra, etc).
I don’t think this is true - I run a porn site, and Google indexes many of my pages. Got a cite?
Yah, pirating does hurt. Most porn companies pursue it moderately aggressively - we know we’re not going to get any support from the media, as we’re all so obviously evil, unlike the RIAA, which is as pure as new-driven snow - but they use their own resources.
DMCA notices (Digital Millennium Copyright Act notices, a way of telling someone they appear to be infringing your trademark or copyright, listing the details of the breach) are easy to write and send to offenders and their hosting company (I send a few a week).
Most hosting companies know they are liable for what their clients put on their sites, and don’t waste any time issuing a cease-and-desist notice to the offender. Indeed, most offenders know it’s not worth the hassle either - they’ll just go steal someone else’s content, and wait for them to notice.
Headcoat, good point on the pirated movies and music - I did not think of them. It seems to be a massive “business” with those P2P people. It’s interesting that no money changes hands - kinda like open-source code, music and videos are released for the “greater good”, and no one pays to access the content. I guess the only people making money out of this are the ISPs and hosting companies that charge per GB of traffic transferred.
abby
That’s interesting, because as of today, Google is currently “Searching 3,307,998,701 web pages.”
So 106.4% of the indexed web pages have the word ‘and’ on them?
That’s impossible. No word can appear on more than 100% of the pages.
I’ll bet that the number in the notice is a few months out of date.
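For anyone puzzling over where the 106.4% figure comes from, it’s simply the “and” hit count quoted two days ago divided by the index size Google’s front page reports today:

```python
# The "and" hit count quoted earlier vs. the index size Google reports today.
hits_and = 3_520_000_000
reported_index_size = 3_307_998_701

print(f"{hits_and / reported_index_size:.1%}")  # prints 106.4%
```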
I respectfully disagree. Your estimates are useful as long as they are only part of the answer, but the statistics are wrong and it is very useful to understand why.
I don’t have a cite, but I have some experience. I don’t run any porn sites, but I run or have developed quite a few content sites that are either restricted access (subscriber or authorized user only) or simply block spiders. In most cases, dynamic sites block spiders to prevent unnecessary load on the server, but in many cases they’re protecting their content too. I know one person in the porn industry who used to work with me, and he’s instituted the same kinds of access controls on his sites. A site can block spiders in a number of ways including (1) requiring a login, (2) requiring a license/age agreement click-through, (3) using server-level redirects to block spiders. I know of porn sites and non-porn sites which use all three.
I have no way of estimating how much content is blocked, either porn or non-porn. I’m not surprised to hear you don’t block indexing on your site, but I know of many others which do. I know of some enormous databases (e.g. genetic info) which are fully browsable by any person without a login but which block all known spiders. There is simply no way to estimate the ratio of Googled content to non-Googled content.
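If anyone’s curious what method (3) looks like in practice, here’s a toy sketch using nothing but Python’s standard library. It’s illustrative only, not taken from any real site mentioned in this thread; the crawler names in the blocked list are just examples, and real sites generally do this in the web server configuration before a request ever reaches application code.

```python
# A minimal sketch of refusing known spiders at the server level by User-Agent.
from http.server import BaseHTTPRequestHandler, HTTPServer

BLOCKED_AGENTS = ("googlebot", "slurp", "msnbot")  # example crawler names only

class SpiderBlockingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "").lower()
        if any(bot in agent for bot in BLOCKED_AGENTS):
            # Spiders get a 403, so the content never shows up in an index.
            self.send_error(403, "Crawling not permitted")
            return
        # Ordinary visitors get the page as usual.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"Members-only content would be served here.\n")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), SpiderBlockingHandler).serve_forever()
```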
Why no, the statistics aren’t wrong; they give a perfectly good estimate of what percentage of Google-searchable pages relate to porn, and page counts are one of several perfectly good metrics for gauging the relative proportion of the internet devoted to different types of content. You’ll get no argument from me that page counts present only a partial description of content, but that’s not the same as page counts being wrong. If you prefer a different metric, use it, and we can argue about whether online bank records, part codes, or student grade lists constitute a part of the internet. I rather doubt that the OP was in search of a pedantic discussion of the relative weights that should be applied to text, images, MP3s and the like in an assessment of what proportion of the internet they take up, but if that’s all you’ll accept, go for it.
Easy there Sqink. It wasn’t a personal attack. The OP mentions Google, but his question regards the content of the Internet. If you want to discuss the Internet, Google page counts are, in fact, wrong. In addition, it would require massive unjustifiable assumptions to say that they are even proportional to the “right” answer. The Google counts are a useful data point, but it’s important that they be taken in context and the OP is better served by an understanding of why those counts are wrong than by simply citing counts.
Have you got a cite to back up that opinion?
Sorry, all I have is common sense and a basic understanding of the infrastructure. The sky is blue, water is wet, and there are huge sections of the Internet Google doesn’t index.
I just did my own search for “and,” which came back with 3,600,000,000 results, or 80 million more than the one done two days ago.
This is just a vague memory, but I read somewhere that Google uses several servers to process searches and their databases aren’t always synced up - hence, someone searching from one part of the world would get different results than someone from another, depending on which server they get assigned to.
Sure, JRR, the internet and Google are both dynamic systems, but I seriously doubt that short-term fluctuations in content are enough to shift the ratio of word frequencies by more than a few tenths of a percent. The raw numbers are in the millions and billions, after all; ratios can’t change very fast up there.
Another way of looking at the problem would be to grab a sample of packets and try to sort their contents into categories like MP3s, spam, porn, requests for Cecil’s columns, and the like. If you think of the internet as an information superhighway, that’d probably give you the “most correct” answer.
I did some digging for studies of that type, but came up dry on anything newer than the mid-nineties. Anyone know of a recent packet content study?
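For what it’s worth, the byte-counting half of that idea is simple enough to sketch; the hard part is the capture itself. Below is a rough illustration in Python that assumes the packet payloads have already been exported from a capture tool as raw bytes. The signatures and sample data are made up, and actually telling porn from non-porn would take far more than byte signatures; this only shows the “tally bytes per category” step.

```python
# Rough sketch: tally bytes per content category over a sample of payloads.
from collections import Counter

SIGNATURES = {
    "jpeg image": b"\xff\xd8\xff",   # JPEG file header
    "mp3 audio": b"ID3",             # ID3 tag that starts many MP3 files
    "html page": b"<html",           # crude marker for ordinary web pages
}

def categorize(payload: bytes) -> str:
    """Very rough classification: look for a known signature near the start."""
    head = payload[:64]
    for label, magic in SIGNATURES.items():
        if magic in head:
            return label
    return "other"

def byte_share(payloads):
    """Return each category's share of the total bytes in the sample."""
    totals = Counter()
    for p in payloads:
        totals[categorize(p)] += len(p)
    grand_total = sum(totals.values()) or 1
    return {label: n / grand_total for label, n in totals.items()}

# Made-up payloads standing in for a real capture:
sample = [
    b"\xff\xd8\xff\xe0" + b"\x00" * 1000,           # looks like a JPEG
    b"ID3" + b"\x00" * 5000,                        # looks like an MP3
    b"<html><body>Cecil's column</body></html>",    # looks like a web page
]
print(byte_share(sample))
```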
Interesting little thing.
Searching on Google for “the”, you get 5.4 billion results.
The same search with strict porn filtering turned on gets 27 million results.