How is the Internet Censored? How are Websites Rated and Prioritized?

Ever since Google decided to censor website searches in China, blocking out information on topics such as the Tiananmen Square massacre and China’s involvement in Tibet, I’ve been curious about how my own searches are affected by the people who created the search engine, and in turn by those in power.

There are quite a few videos on the topic on YouTube. I found the one below particularly interesting, since it deals directly with Google censorship in America. Google publishes its own version of the ‘Top Internet Searches’, which apparently isn’t QUITE accurate.

Warning! There are some naughty words in this video!

Do you know anything about how we are directed to certain sites when we do searches online? Is it fair? And are search engines censoring what they tell us about what people are searching for?

Censorship and search prioritization are two very different things. There are all sorts of algorithms that determine search results order, and you can buy your way to the top of the list, but most search engines make it clear that certain links are sponsored.

A common way to determine relevance is to count the number of sites that link to the site in question using that term. If lots of sites link to the Wikipedia article for a particular term, then that link should show up high on the list. Other algorithms weight the links in a variety of ways.
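To make the link-counting idea concrete, here is a minimal sketch, assuming we already have a crawl of `(source page, target page, anchor text)` triples. The function name and the sample link data are hypothetical, and real engines combine this signal with many others:

```python
from collections import Counter

def rank_by_inbound_links(links, term):
    """Rank target pages by how many distinct source pages link to them
    with anchor text containing `term` (case-insensitive)."""
    term = term.lower()
    # A set of (source, target) pairs, so one page linking twice counts once.
    seen = set()
    for source, target, anchor in links:
        if term in anchor.lower():
            seen.add((source, target))
    counts = Counter(target for _, target in seen)
    return counts.most_common()  # highest count first

# Hypothetical crawl data.
links = [
    ("blog-a.example", "en.wikipedia.org/wiki/Tibet", "Tibet"),
    ("blog-b.example", "en.wikipedia.org/wiki/Tibet", "history of Tibet"),
    ("blog-b.example", "en.wikipedia.org/wiki/Tibet", "Tibet again"),
    ("blog-c.example", "travel.example/tibet", "Tibet tours"),
]
print(rank_by_inbound_links(links, "tibet"))
# [('en.wikipedia.org/wiki/Tibet', 2), ('travel.example/tibet', 1)]
```

Counting distinct linking pages, rather than raw links, keeps a single page from inflating a result just by repeating the same link.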

But each search engine has its own algorithm and they don’t come up with the same results. Censorship only occurs when a government entity (or possibly a monopoly, but that’s not really accurate) prevents you from getting access to something. China can do that (although not completely) by funneling all traffic through government routes, blocking access to anything it doesn’t like. Search engine results sorting is really nothing like that.

As to whether it’s fair, there’s no real way to answer that until you tell us what fair means to you. You can always create your own search engine that implements your definition.

I just watched the first 3 minutes of that video, and it’s garbage. The guy doesn’t really have much of a point, and it was awful to listen to. I gave up.

True censorship is done through a variety of methods, like IP banning or simply whitelisting. Whitelisted sites are those that have been OK’d. In other words, EVERYTHING is blocked, and then sites are submitted and approved one by one. If a site is OK’d (is that a word?), then it’s allowed. These sites undergo periodic review to make sure their content hasn’t changed since they were OK’d (I guess it’s a word now :) ).
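The whitelist model described above, default deny plus periodic re-review, can be sketched roughly like this. It is a toy under stated assumptions: the `Whitelist` class and host names are hypothetical, and "review" is simplified to comparing a SHA-256 hash of the page, so any edit at all revokes approval, whereas a real review would presumably involve human judgment:

```python
import hashlib

class Whitelist:
    """Default-deny filter: everything is blocked unless explicitly approved."""

    def __init__(self):
        self.approved = {}  # hostname -> SHA-256 of the content at approval time

    @staticmethod
    def _digest(content):
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def approve(self, host, content):
        """A site is submitted, checked, and OK'd one by one."""
        self.approved[host] = self._digest(content)

    def allows(self, host):
        """Anything not on the list is blocked."""
        return host in self.approved

    def review(self, host, current_content):
        """Periodic review: revoke approval if the content has changed."""
        if self.approved.get(host) != self._digest(current_content):
            self.approved.pop(host, None)
            return False
        return True

wl = Whitelist()
wl.approve("news.example", "approved article text")
print(wl.allows("news.example"))    # True
print(wl.allows("other.example"))   # False: never approved
wl.review("news.example", "edited article text")
print(wl.allows("news.example"))    # False: approval revoked after the change
```

This is also why a proxy doesn’t help: the proxy’s own hostname isn’t on the list, so it is blocked like everything else.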

Proxy servers can’t get around a whitelist. Some nations ban all proxy servers, so even if your use is legitimate, you can’t use one. Some nations ban all sites that have a certain country’s top-level domain, such as Israel’s .il. Others will allow the .il domain so long as the site’s IP address doesn’t trace back to Israel. Of course, there are ways around that. Other nations, like Iran, make the ISP responsible for making sure content doesn’t slip through.
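The two policies described above, blocking a whole top-level domain versus blocking only servers that geolocate to the country, can be contrasted in a short sketch. The geolocation table is hypothetical (real filters would use a commercial GeoIP database), and the addresses come from the reserved documentation ranges:

```python
import ipaddress

# Hypothetical IP-to-country table using documentation address ranges.
GEOIP = {
    ipaddress.ip_network("192.0.2.0/24"): "IL",
    ipaddress.ip_network("198.51.100.0/24"): "US",
}

def country_of(ip):
    """Look up the country code for an IP, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for net, country in GEOIP.items():
        if addr in net:
            return country
    return None

def blocked_by_tld(host, banned_tld):
    """Policy 1: block every host under the banned top-level domain."""
    return host.lower().rstrip(".").endswith("." + banned_tld)

def blocked_by_geoip(ip, banned_country):
    """Policy 2: allow the domain, but block servers located in the country."""
    return country_of(ip) == banned_country

# A .il site hosted on a (hypothetical) US server:
print(blocked_by_tld("site.co.il", "il"))          # True
print(blocked_by_geoip("198.51.100.7", "IL"))      # False: passes the second policy
```

The second policy shows one of the "ways around that": moving the server to a host in another country changes its IP, so the geolocation check no longer matches.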

Mirrors can get around blocks. For instance, say I have a site, we’ll call it “Markxxx.Com” (not real), that mirrors to JoeBlow.Com on a different web host. When I update one site, the other is automatically updated. But if Markxxx.Com is blacklisted, my JoeBlow.Com site probably won’t be, because people won’t know about it unless I tell them.
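Why a mirror slips through can be shown with a blacklist that, like any blacklist, only matches names and addresses the censor already knows about. The `Site` type and the IP addresses are hypothetical; the host names are the ones from the example above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Site:
    host: str
    ip: str        # hypothetical address on the site's web host
    content: str

def is_blocked(site, blocked_hosts, blocked_ips):
    """A blacklist can only match hosts and IPs the censor has already listed."""
    return site.host.lower() in blocked_hosts or site.ip in blocked_ips

primary = Site("markxxx.com", "192.0.2.10", "page text")
# Same content, kept in sync, but on a different host name and web host.
mirror = Site("joeblow.com", "198.51.100.20", primary.content)

blocked_hosts = {"markxxx.com"}
blocked_ips = {"192.0.2.10"}
print(is_blocked(primary, blocked_hosts, blocked_ips))  # True
print(is_blocked(mirror, blocked_hosts, blocked_ips))   # False
```

Blocking by IP doesn’t help either, because the mirror lives on a different web host with a different address.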

There is “effective censorship” because most people never look beyond the second page of Google results. Google is an “effective” monopoly. (This means it functions like a monopoly but isn’t one, because the field is open; it’s just hard to break into. eBay is another example.)

Google bases its results on links, and links don’t all count equally. For instance, a link FROM a top-ranked site like USAToday might be worth 10,000 points, while 1,000,000 links from unranked or bottom-ranked sites might be worth only 100 points total.

So it’s not only the number of links, it’s the QUALITY of those links as well. It’s also how the links are marked up. If USAToday adds rel="nofollow" to its links, those links won’t count in Google, though of course they still take USA Today’s readers to the linked site.
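The "quality over quantity" and nofollow points can be illustrated with a simplified version of the classic PageRank idea. Google’s actual ranking is not public, and the point values in the post are illustrative; this sketch just shows the textbook mechanism, with the simplifying assumption that nofollow links are dropped before a page’s rank is divided among its links. All page names are hypothetical:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over (source, target, nofollow) edges.
    Edges marked nofollow are ignored when passing rank along."""
    pages = set()
    out = {}
    for src, dst, nofollow in links:
        pages.update((src, dst))
        if not nofollow:
            out.setdefault(src, []).append(dst)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p in pages:
            targets = out.get(p)
            if targets:
                # A page splits its rank evenly among its followed links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Page with no followed links: spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank

# Twenty ordinary sites link to a big news site, making it authoritative.
links = [(f"reader{i}.example", "usatoday.example", False) for i in range(20)]
links += [
    ("usatoday.example", "site-a.example", False),  # ordinary followed link
    ("usatoday.example", "site-b.example", True),   # rel="nofollow": passes no rank
]
# Three obscure pages that nobody links to, all pointing at site-c.
links += [(f"obscure{i}.example", "site-c.example", False) for i in range(3)]

rank = pagerank(links)
for page in ("site-a.example", "site-b.example", "site-c.example"):
    print(page, round(rank[page], 4))
```

Site A ends up well ahead of site C even though C has three inbound links to A’s one, because A’s single link comes from a highly ranked page. Site B gets nothing from the news site at all, since that link is nofollow.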

So you see, a powerful top-ranked site like USA Today can control how its outgoing links are counted, simply through the markup of its webpages.

SEO (Search Engine Optimization) is used to get your site to the top of the searches, but it can be abused and spammed. And Google often doesn’t follow its own stated rules. For instance, Google is supposed to filter out duplicate content, but as you can see from all the sites that scrape Wikipedia, that isn’t happening. I can look up a term and find 7 sites in the top 10 that are scraping Wikipedia and providing the same info.
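Google doesn’t publish how it detects duplicate content, but one well-known textbook approach is w-shingling with Jaccard similarity: break each page into overlapping word sequences and compare the sets. Here is a minimal sketch of how a results page could drop near-copies. The shingle size, the 0.8 threshold, the URLs, and the sample text are all assumptions for illustration:

```python
import re

def shingles(text, k=5):
    """Set of k-word shingles (overlapping word sequences) from the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Share of shingles two pages have in common (1.0 = identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def filter_near_duplicates(results, threshold=0.8):
    """Keep each result unless it is a near-copy of one already kept."""
    kept = []
    for url, text in results:
        s = shingles(text)
        if all(jaccard(s, ks) < threshold for _, ks in kept):
            kept.append((url, s))
    return [url for url, _ in kept]

article = "Tibet is a region on the Tibetan Plateau in Asia, covering much of the plateau."
results = [
    ("en.wikipedia.org/wiki/Tibet", article),
    ("scraper1.example/tibet", article),               # verbatim scrape
    ("scraper2.example/tibet", article + " Ads here."),  # scrape plus a little junk
    ("travel.example/tibet", "Plan your trip to Lhasa with our guided tours and local guides."),
]
print(filter_near_duplicates(results))
# ['en.wikipedia.org/wiki/Tibet', 'travel.example/tibet']
```

Comparing shingle sets rather than exact text is what lets a filter like this catch scrapers that tack on a few extra words.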

I don’t think that is so much a matter of censorship as of fairness.

For great reading on SEO try Search Engine Watch (dot) Com

China’s system works via keywords and the temporary banning of IP addresses. Essentially, the firewall is a big IDS (intrusion detection system) that listens for specific keywords, then bans the site for x amount of hours. It’s very dynamic. Google et al. don’t need to do anything; the system will see the word Tibet and just block you. It’s far from perfect, but it kind of works.
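The keyword-plus-temporary-ban behaviour described above can be modelled as a small toy. This is not how the real firewall is built (it reportedly inspects TCP traffic and injects connection resets); it only captures the logic the post describes. The class name, keyword list, ban length, and IP address are all hypothetical, and the clock is injectable so the example can fake the passage of time:

```python
import time

class KeywordBanFilter:
    """Toy keyword-triggered filter: a hit bans the destination IP for a while."""

    def __init__(self, keywords, ban_seconds, clock=time.monotonic):
        self.keywords = [k.lower() for k in keywords]
        self.ban_seconds = ban_seconds  # the post's "x amount of hours"
        self.clock = clock              # injectable so tests can fake time
        self.banned_until = {}          # server IP -> time the ban expires

    def allow(self, server_ip, payload):
        """Return True if a request to server_ip carrying payload may pass."""
        now = self.clock()
        expiry = self.banned_until.get(server_ip)
        if expiry is not None:
            if now < expiry:
                return False                    # still inside the ban window
            del self.banned_until[server_ip]    # ban has lapsed
        if any(k in payload.lower() for k in self.keywords):
            self.banned_until[server_ip] = now + self.ban_seconds
            return False
        return True

# Usage with a fake clock.
fake_now = [0.0]
f = KeywordBanFilter(["tibet"], ban_seconds=2 * 3600, clock=lambda: fake_now[0])
print(f.allow("203.0.113.5", "GET /search?q=weather"))  # True
print(f.allow("203.0.113.5", "GET /search?q=Tibet"))    # False: keyword hit, ban starts
print(f.allow("203.0.113.5", "GET /search?q=weather"))  # False: still banned
fake_now[0] += 2 * 3600
print(f.allow("203.0.113.5", "GET /search?q=weather"))  # True: ban expired
```

Note that the search engine itself does nothing here: the filter sits on the network path and reacts to whatever crosses it, which matches the point that Google et al. don’t need to cooperate.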

There was an article in 2600 magazine about this, with examples from someone in China.