The Google search engine is based on measuring the value of a web site by the number of other sites linking to it and their values. Google’s founders came up with an efficient algorithm for evaluating these values (it is complicated because each site’s value depends on all the other sites’ values, so none of them can be calculated first).
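As I understand it, the trick is to start every site with the same guess and then recompute the values over and over until they settle down. A toy sketch of that idea (the little four-page graph and the 0.85 damping factor are just illustrative; Google’s real version is vastly bigger and more elaborate):

```python
# Toy power-iteration sketch of the PageRank idea: every page's score depends
# on the scores of the pages linking to it, so start from a uniform guess and
# iterate until the values stop changing. Graph and damping factor are
# illustrative assumptions, not Google's actual data.
links = {              # page -> pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
pages = list(links)
N = len(pages)
d = 0.85               # damping factor

rank = {p: 1.0 / N for p in pages}        # uniform starting guess
for _ in range(50):                       # 50 rounds is plenty for 4 pages
    rank = {p: (1 - d) / N + d * sum(rank[q] / len(links[q])
                                     for q in pages if p in links[q])
            for p in pages}

print(sorted(rank.items(), key=lambda kv: -kv[1]))   # C ends up on top
```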
Trouble is, Google isn’t an innocent bystander in all this. So many people use Google that they MUST be influencing how successful sites are, and therefore indirectly influencing how many other sites link to them. Because they are influencing the human choices site managers and users make while putting content into their sites, the exact role Google played in helping a site get linked to isn’t explicitly quantifiable. Therefore they have become a somewhat inscrutable positive feedback node in this system they purport to measure.
Is there some mechanism by which they deal with this?
I never looked in depth at the PageRank algorithm, but just understood it at a general, talk-about-it-amongst-grad-students level. But you piqued my curiosity, so I looked into it a little. (How much experience do you have with algorithms and/or how much detail are you looking for?)
It looks like it’s mostly the damping factor (i.e., diminishing returns) of the ranking weight, in addition to some manipulation of link value (treating inbound/outbound links separately, the “nofollow” tag, penalties for certain links, etc.). Of course, they keep all the specifics secret, so it’s difficult to answer (or even get an answer).
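To give a feel for why the damping factor acts as diminishing returns: rank passed along a chain of links gets multiplied by the damping factor at every hop, so any single source’s influence decays geometrically. A toy calculation (0.85 is the commonly cited value; the real numbers are part of what they keep secret):

```python
# Toy illustration of the diminishing returns: rank forwarded along a chain of
# links is scaled by the damping factor d at every hop, so the influence of
# any one source falls off geometrically with distance.
d = 0.85
for hops in range(1, 9):
    print(f"{hops} hops away: influence scaled by {d ** hops:.3f}")
```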
It counts the number of other sites that link to a site.
It’s not going to count its own search results as a link to another site, so I’m not sure how that’s applicable.
People are able to find sites they like using Google, and this means that at some point on their own web page they will make a link to this page, thereby increasing its rank in Google.
So? The only effect this will have is to make sites that are already high in ranking be more so, and sites that are already low in rank be lower. But it will have no effect at all on rank order, which is how Google lists the sites. Actual rank values may change, but that doesn’t matter; what matters is rank order, which doesn’t change.
Are you talking about the fact that people try to figure out how they can design their site to make Google like it better regardless of how many links go to it? Things like making sure keywords are near the top of the page and not hidden by Flash, and proper metadata? In this case there is no Google counterbalance. If you don’t do what Google likes, you are screwed.
Except that positive feedback loops amplify everything, including random noise. Suppose, for instance, that some unexpected newsworthy event occurs (say, the assassination of Lindsay Lohan). Within minutes after the event, both ABC and NBC have published stories about it. ABC has better coverage of it, but by fluke, it happens that 7 people on blogs link to the NBC version of the story, but only 5 people on blogs link to the ABC version. Such flukes are pretty common, with such low numbers, and there’s guaranteed to be some point very shortly after the event when the numbers are that low.
Now, Google scans the blogs, and sees seven people linking to the NBC story with the words “Lindsay Lohan killed”, but only five people link to the ABC story with those words. So it gives the NBC story a higher rank for the search terms “Lindsay Lohan killed”, even though the ABC story is better.
Now suppose that I hear a rumor about this, and wonder if it’s true. I Google “Lindsay Lohan killed”, and see the first hit, from NBC. I could read the second hit, if I wanted to, but I don’t bother, since I’ve already read the first one, so I don’t realize the second one’s better. I then start a new thread on a message board about it, and link to the NBC story (since that’s the one I read). Now there are 8 links out there to the NBC story, and still only 5 to the ABC story. NBC is growing its lead, and for no reason other than random noise amplified by the positive feedback loop.
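If it helps, here’s a toy Pólya-urn-style simulation of that loop: each new searcher links to a story with probability proportional to how many links it already has (a stand-in for “it ranks higher, so more people read and cite it”). The 7-vs-5 starting split is the random fluke; everything else about the model is made up:

```python
import random

# Toy "rich get richer" simulation of the loop described above: each new
# searcher links to a story with probability proportional to its current link
# count. The 7-vs-5 split is the initial fluke; the rest is invented.
random.seed(0)
links = {"NBC": 7, "ABC": 5}
for _ in range(10_000):
    total = sum(links.values())
    pick = "NBC" if random.random() < links["NBC"] / total else "ABC"
    links[pick] += 1
print(links)   # the early random lead typically persists and keeps growing
```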
I don’t think this is quite right. That is, the formula for calculating the PageRank uses the PageRank of the linking pages. So, in some sense, Google does use their own search results to determine the final PageRank via iterative calculation (see the Wikipedia page I linked above).
Although I have to admit that I didn’t have an example off-hand where repeated iterations lead to a changed ranking. Perhaps you’re correct that it won’t make a difference in most cases, but a bit of tinkering with a toy graph does turn up a case where it does (sketch below), so I’d not rule it out.
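Here’s the small made-up graph (my own construction, nothing official): page A gets a single inbound link, but from a hub H that five other pages point at; page B gets two inbound links from obscure pages. After one iteration from a uniform start, B is ahead; after the calculation converges, A is ahead.

```python
# Made-up 10-page graph: A has one inbound link (from hub H, which five pages
# point at); B has two inbound links from pages nobody links to.
links = {
    "X1": ["H"], "X2": ["H"], "X3": ["H"], "X4": ["H"], "X5": ["H"],
    "H":  ["A"],
    "Y1": ["B"], "Y2": ["B"],
    "A":  ["X1"],
    "B":  ["X1"],
}
pages = list(links)
N, d = len(pages), 0.85

def step(rank):
    return {p: (1 - d) / N + d * sum(rank[q] / len(links[q])
                                     for q in pages if p in links[q])
            for p in pages}

rank = {p: 1.0 / N for p in pages}
after_one = step(rank)
for _ in range(100):
    rank = step(rank)

print("after 1 iteration: A =", round(after_one["A"], 3), " B =", round(after_one["B"], 3))  # B ahead
print("after convergence: A =", round(rank["A"], 3), " B =", round(rank["B"], 3))            # A ahead
```

So the recursion isn’t just cosmetic: letting the iteration run can reorder pages relative to a simple link count.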
Well, Google would argue that the fact that more people are linking to the NBC story (in addition to the fact that the people linking to NBC are themselves more linked to and therefore more reliable critics) means that the NBC story is better (or at least more likely to be better, since NBC is more popular). And, ultimately, what Google is aiming to give you is not strictly ‘the best source for your search term’ but rather ‘the most likely article that people using those search terms are looking for’.
With regards to random noise, I would argue that if the noise is able to be amplified so much that people are just linking to the first Google result they find and aren’t considering anything else, then a page lower in rank probably isn’t enough better for it to matter.
>Now, Google scans the blogs, and sees seven people linking to the NBC story…
This is the effect I’m wondering about. Chronos said it more clearly than I. And I didn’t mean to suggest Google’s own links (though that might be interesting too).
>Well, Google would argue that the fact that more people are linking to the NBC story (…) means that the NBC story is better…
And this would be the error associated with the effect I’m wondering about. I think by the construction of this example that, logically, the statement above is equivalent to “Google would argue that the fact that Google made more people aware of the NBC story means that the NBC story is better…”, which is obviously a circular argument.
>Are you talking about the fact that people try to figure out how they can design their site to make google like it better regardless of how many links go to it?
No. I mention the “human choice” issue because I think people looking for information use a variety of clues in their choices, and the details of that mix of clues are buried inside human users, where they aren’t independently measurable by the designers of the search engine. Therefore it looks to me like they don’t have an obvious way of testing how big Google’s feedback loop is.
Though, as I think about it, one method comes to mind. They could randomly inflate some rankings and deflate others, remembering what this random influence was, and then evaluate how the inflated and deflated sites compete in later link evaluations. This would only be a statistical trend, of course, but that is enough to gauge the magnitude of their feedback gain. In fact, they could also deliberately derate a site based on their own history of ranking it highly, so that they could null out their feedback.
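Something like this back-of-envelope version of the idea, in case it helps make it concrete. All of the numbers (the true gain of 0.6, the noise levels, the site count) are invented; the point is only that a remembered random perturbation lets you estimate the gain with a simple regression:

```python
import random

# Sketch of the proposed experiment: inject a remembered random boost into
# some rankings, then regress later link growth on that boost. Because the
# boost is random, the recovered slope estimates the feedback gain even
# though site quality is unobservable. TRUE_GAIN and all noise levels are
# invented for illustration.
random.seed(42)
TRUE_GAIN = 0.6
sites = range(2000)
boost = {s: random.choice([-1.0, 0.0, 1.0]) for s in sites}   # remembered injection
quality = {s: random.gauss(0, 1) for s in sites}              # hidden from us

# Later link growth: driven by quality, plus the artificially boosted visibility.
growth = {s: quality[s] + TRUE_GAIN * boost[s] + random.gauss(0, 1) for s in sites}

# Simple least-squares slope of growth on the injected boost.
n = len(sites)
mb = sum(boost.values()) / n
mg = sum(growth.values()) / n
cov = sum((boost[s] - mb) * (growth[s] - mg) for s in sites) / n
var = sum((boost[s] - mb) ** 2 for s in sites) / n
print("estimated feedback gain:", round(cov / var, 2))   # comes out near 0.6
```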
So, I think it’s an interesting question. Thanks for all who are pondering it!
As to the OP. Search Engine Optimization (SEO) companies regularly try to exploit any feedback holes in Google’s system to increase the pagerank of their customers. Google fights back. If there were an easily exploitable hole, the SEOs would find it, people would notice, Google would find out and fix it.
It is a far from perfect system, but it’s better than the other companies’, and that’s how they got where they are despite their late start.
Remember, links are not equal in value. For example, the New York Times has a high page rank, so one link FROM the New York Times can be worth as much as thousands of links from sites that have a lower page rank.
I’ve seen sites I’ve made jump from page 5 of Google to the first page simply by having a link in the Chicago Tribune, even though I only have two or three additional links TO my site.
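Roughly speaking (this is the textbook PageRank picture, not anything Google has confirmed), what a link passes along is the linking page’s own rank divided by its number of outbound links, which is why one high-rank link can beat a pile of low-rank ones. A back-of-envelope comparison with invented numbers:

```python
# Back-of-envelope version of "links are not equal": a link hands over roughly
# (linking page's rank) / (its number of outbound links). Every number below
# is invented purely for the comparison.
nyt_link   = 0.01 / 50            # one link from a high-rank page with 50 outlinks
tiny_links = 1000 * (1e-7 / 10)   # a thousand links from tiny pages
print(nyt_link, tiny_links)       # 0.0002 vs 0.00001 -- the single NYT link wins 20:1
```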
This is something Google is often criticized for: it favours the pages that are already established and high up in page rank.
Also remember that page rank doesn’t increase in value proportionally as you go up. Only Google knows HOW much difference there is between each level of page rank. One thing is for sure: it isn’t a simple linear increase.
For instance,
If we assign the value 100 to a page at page rank 9, the next level (rank 8) might have a value of 300, and the next (rank 7) a value of 900. By the time you get to page rank 1, you could be into the hundreds of thousands.
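To put numbers on that, here’s the same example run all the way out, assuming (purely for illustration) a factor of 3 between adjacent levels:

```python
# Running the example out, assuming a made-up 3x jump between adjacent levels.
value = 100
for level in range(9, 0, -1):
    print(f"page rank {level}: {value:,}")
    value *= 3
# ends in the hundreds of thousands by the last level
```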
This is true. Google’s algorithms are designed to maximize the relevance of links. By adding another link to the NBC story, you’ve provided more evidence to Google that its rankings were correct. It’s well known in the industry that you want to be on the first page of their search results. If you’re high enough, you tend to stay that high; there is a feedback loop.
SEM (the paid links) is interesting also. The SEM links cost different rates depending on competition and relevance. If you buy a link to the SDMB for the search term “Breaking News”, it will cost you more than it would cost CNN to buy the same term, because Google will soon realize from its users’ behavior that the SDMB is not as relevant as CNN for that term. If it’s severe enough, they’ll pull down the link no matter how much you’re willing to pay.