As I mentioned in another post, the proxy server where I work blocks any site that is categorized as adult. My question is, what determines that? Do most of these programs actually scan the content looking for certain words, or do they go by some sort of rating, etc.? It misses some things that should be “adult,” and it blocks some things that don’t seem so bad. It even blocks the state lottery page, I assume because it is a “gambling” site. Any insight into the way this stuff works? I do most of my posting at work, as I work the midnight shift and it is very quiet here at night, and it can be startling to click on an innocent-seeming link and be told it is blocked for “adult content.”
There’s a great deal of criticism about programs like net-nanny and Cyberpatrol. This is from one of the two hackers who took Cyberpatrol apart and decrypted the blocked sites list. Entire faq here.
These programs generally suck ass and most have a strong right-wing bias. Their lists are a combination of automated searches and what sites parents tend to complain about.
There are three basic techniques:
Blocking bad stuff. There are people out there who get paid to look for sites with bad stuff on them. Cool work if you can get it, huh? They make a list of the bad sites and keep it updated. They sell that list to firewall vendors, whose products then block access to the listed sites. The list is typically categorized, so the administrator can selectively block stuff. Typical categories are sex, violence, drugs, hate groups, etc.
Allowing good stuff. The same companies make lists of good sites. “Good” meaning for children. Firewalls can then allow only stuff permitted by the good list. This technique isn’t typically used in corporations, since the web expands so quickly. It’s more typically used in consumer products so parents can allow their children access to disney-like stuff only.
Filtering bad words. A firewall can contain a list of bad words and drop or modify transmissions containing those words. This technique isn’t as widely used.
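To make the three techniques concrete, here is a rough Python sketch. Everything in it is made up for illustration: the site names, the categories, the word list, and the function names are my assumptions, not anything taken from a real product, and a commercial filter would work from far larger, regularly updated lists.

```python
import re
from urllib.parse import urlparse

# Hypothetical vendor-supplied lists (illustrative only).
BLOCKLIST = {
    "casino.example": "gambling",
    "adult.example": "sex",
}
ALLOWLIST = {"kids.example"}
BAD_WORDS = {"poker", "jackpot"}


def host_of(url):
    """Return the lowercased hostname of a URL ('' if there is none)."""
    return (urlparse(url).hostname or "").lower()


def blocked_by_blocklist(url, blocked_categories):
    """Technique 1: block a listed site if the admin blocks its category."""
    category = BLOCKLIST.get(host_of(url))
    return category is not None and category in blocked_categories


def blocked_by_allowlist(url):
    """Technique 2: block everything that is not on the list of good sites."""
    return host_of(url) not in ALLOWLIST


def blocked_by_words(page_text):
    """Technique 3: block a page whose text contains any listed word."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return not words.isdisjoint(BAD_WORDS)
```

With lists like these, a lottery page would be blocked the moment the administrator ticks the "gambling" category, no matter what is actually on the page. The word filter is just as blunt: it has no idea what context a word appears in.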
There are a few ways in which screening is done:
- Some services look at the keywords in the <meta> tags on web pages, via different 'bots.
To see what I mean (assuming you’re using Internet Explorer), go to the View menu and select Source. Near the top of the document you should see several lines of code starting with <meta>. One might be <meta name="keywords" content="…">
Most search engines use these to classify pages.
- There are also web rating companies, such as RTSC, where webmasters may submit their sites. After review, the site is given some HTML code (usually a <meta> tag) to insert in its page that indicates a rating. Some browsers (IE5, for example) allow users to block the ratings of their choosing.
In IE5, go to Tools --> Internet Options and play around with the Content tab.
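For the curious, here is a minimal sketch of how a 'bot might read the keywords <meta> tag and map it to a category, using Python's standard html.parser module. The category word lists and the function names are hypothetical; I am not claiming any particular screening service works exactly this way.

```python
from html.parser import HTMLParser

# Hypothetical category word lists (illustrative only).
CATEGORY_TERMS = {
    "gambling": {"lottery", "casino"},
    "sex": {"xxx", "porn"},
}


class MetaKeywordParser(HTMLParser):
    """Collect the comma-separated terms from <meta name="keywords" ...>."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(
                term.strip().lower()
                for term in content.split(",")
                if term.strip()
            )


def extract_keywords(html):
    """Return the list of keyword terms declared in the page's <meta> tags."""
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords


def categories_for(html):
    """Return the sorted categories whose terms appear among the keywords."""
    found = set(extract_keywords(html))
    return sorted(
        category
        for category, terms in CATEGORY_TERMS.items()
        if found & terms
    )
```

A page with no keywords tag at all comes back with no categories, which is one reason this approach misses things: the 'bot only knows what the page author chose to declare.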
Problems: With IE5’s Content Advisor, if a site has no ratings whatsoever, it may well be blocked out, however innocuous it may be.
There’s no standardization amongst the various screening services about how or what is blocked.
No one makes adult sites submit their content for rating. Most major sites do, just because it’s good for business and avoids legal hassles. Don’t expect such thoughtfulness from the fly-by-night or amateur sites.