Anyway, my workplace uses Websense to filter “objectionable” web content from our computers. The IT department’s definition of “objectionable” is quite different from mine, but that’s another thread.
Websense blocks certain pages based on categories. For example, I can’t check my lottery numbers on illinoislottery.com because I get a blank screen with the message “The category ‘Gambling’ is blocked by Websense.” Ditto for msn.com’s relationship advice column (“Dating and Personals”); fantasy football reports (“Games”); my PS2 games review site (“Games” again); the website for the movie Saw (“Tasteless”); and, strangely enough, a militant pro-Zionist (or anti-Zionist; I was never able to find out) website that I stumbled onto while searching for information about Chanukkah (“Militancy and Extremism”).
How does a web filter know a page’s content? Do ISPs put some kind of cookie or something on each page, to let the filters know what’s up? Like, some code that says “This page is about gambling”?
Or do they look for keywords? Although I’d have to say that if it were keywords my workplace web filter would definitely block the SDMB, as the word “Dope” in the title would flag it for “Drugs.”
Keyword and content filtering is the most basic step. Many filtering packages also have actual humans who surf around looking for stuff to add to the block list. Sounds like an interesting job.
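To make the keyword idea concrete, here is a rough Python sketch of how a naive keyword matcher might sort text into categories. The keyword lists and the function name are made up for illustration; this is not how Websense or any particular product is known to work.

```python
import re

# Hypothetical keyword lists, invented for this example.
# Real filtering products do not publish theirs.
CATEGORY_KEYWORDS = {
    "Drugs": {"dope", "cocaine"},
    "Gambling": {"lottery", "casino", "jackpot"},
}


def keyword_categories(text):
    """Return the sorted category names whose keywords appear as whole words in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, kws in CATEGORY_KEYWORDS.items() if words & kws)
```

With these lists, a title containing the word “Dope” lands in “Drugs” no matter what the page is really about, which is exactly the kind of false positive a pure keyword filter would produce.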
Look at the source HTML of a page and you’ll most likely find a block of META tags near the top. Those tags describe the page’s topic to help search engines (or mislead them).
They can be used for filtering purposes as well.
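For anyone curious what reading those tags looks like, here is a small sketch using Python’s standard html.parser module. It only pulls out the "keywords" meta tag; the class and function names are my own, and I’m not claiming any real filter does it this way.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated values of <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords += [k.strip().lower() for k in content.split(",") if k.strip()]


def meta_keywords(html):
    """Return the lowercased meta keywords found in an HTML string."""
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords
```

Since the page author writes these tags, a filter that trusted them alone would be easy to fool, so treat this as one possible signal rather than a full classifier.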
At my job, sports and gambling sites are filtered out.
According to various stories I’ve seen on tech sites, keyword filtering is almost always the only step. The companies that create this software want you to think that all of it has been verified by a person, but it would seem that only sites that are allowed through and that get complained about ever get viewed by a real person there. The last time I saw a story on the matter, it noted that (when they checked) the web-filtering software that many US public schools were using would not allow you to visit Dick Cheney’s website or visit the National Institutes of Health website at all, because it frequently contained keywords concerning reproductive organs and sex.
I do know of at least one filtering company that had a whitelist process. If something was, IYHO, inappropriately blocked, you could press a button and an actual human would review it and remove it from the block list if it was innocent.
They are not very clever at all. They are just a database of banned sites in various categories. An employer can choose which categories to block. When I was doing security work I convinced management to unblock things like banking sites because it is quicker for staff to do internet banking than phone banking. The SDMB usually comes up as entertainment.
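A database-plus-categories setup like that can be sketched in a few lines of Python. The table below is invented for the example (only the illinoislottery.com entry comes from this thread; bank.example is a placeholder domain), and the function names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical vendor database mapping domains to categories.
SITE_CATEGORIES = {
    "illinoislottery.com": "Gambling",
    "bank.example": "Banking",  # placeholder domain, not a real site
}


def lookup_category(url):
    """Return the category for the URL's host or any parent domain, else None."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # Try the full host first, then strip leading labels one at a time.
    for i in range(len(parts)):
        candidate = ".".join(parts[i:])
        if candidate in SITE_CATEGORIES:
            return SITE_CATEGORIES[candidate]
    return None


def is_blocked(url, blocked_categories):
    """True if the URL's category is one the employer chose to block."""
    return lookup_category(url) in blocked_categories
```

Unblocking banking, as described above, would then just mean leaving "Banking" out of the employer’s set of blocked categories; uncategorized sites pass through in this sketch, though a real product might treat them differently.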