It was originally devised as a way to annotate the Internet: A technology that would allow people to leave comments about webpages. But its creators realized that some comments are worth more than others, and that they’d need some way to filter out the junk. What they came up with evolved into PageRank: A webpage’s rank depends on the number and the rank of the pages linking to it.
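For the curious, here is that idea in symbols. This is the standard textbook form of the formula from Brin and Page’s 1998 paper, not necessarily what Google runs today: N is the total number of pages, L(q) is the number of outbound links on page q, and d is a damping factor (commonly 0.85) that models a surfer occasionally jumping to a random page.

```latex
% A page's rank is a small baseline share plus a weighted vote
% from every page q that links to p: each linker passes along
% its own rank, split evenly among its outbound links.
PR(p) = \frac{1 - d}{N} + d \sum_{q \to p} \frac{PR(q)}{L(q)}
```

Note the circularity: a page’s rank is defined in terms of other pages’ ranks, so in practice the equation is iterated from a uniform starting guess until the values settle.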
With PageRank, Google beat out the other two paradigms of searching the Internet: Human-created directories and simple text matching.
Human-created directories would make a lot of sense if the Internet were as slow-growing as ARPANet was: A few new sites a day, maybe, each registering in a standard way with one of perhaps a dozen organizations. Humans cannot cope with the Internet as it stands now, with thousands of new pages added a day and very little in the way of standardization.
Simple text matching works as long as everyone is honest: If the Internet were limited to universities and large organizations, even if it were growing like kudzu, everyone could be counted on not to fill their pages with crap just to attract surfers. But any idiot can create a webpage, and many do. More than a few create webpages that are full of crap words and phrases designed to trick naive text-matching spiders.
PageRank matches the intelligence of humans with the speed of computers: It relies on people not to link to crap, and uses that as the basis of a fast algorithm that traces back-links (Google was originally called BackRub for exactly that reason) to see who links to whom. Voila: Something that naturally filters out crap while running at machine speed.
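Here is a minimal sketch of that algorithm in Python, using the standard power-iteration formulation on a toy link graph. The function name and graph representation are mine, and a real implementation would handle dangling pages properly and test for convergence rather than running a fixed number of rounds:

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    # Collect every page that appears anywhere in the graph.
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start everyone off equal

    for _ in range(iterations):
        # Baseline share: the "random surfer" landing anywhere at random.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if not targets:
                continue  # dangling page; a real implementation redistributes this
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share  # each link is a vote, weighted by rank
        rank = new_rank
    return rank

# Toy graph: "a" and "b" both link to "c", so "c" ends up ranked highest.
print(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}))
```

The whole trick is in that inner loop: no human ever reads the pages, yet the human judgment embedded in every link gets counted.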
All of this is based on a fundamental discovery: Human time is expensive and machine time is cheap, so spread the work that only humans can do (the filtering) across as many humans as you can, and rely on a few machines to do the rest.