The wonders of Google

You have to pay Google to ask it a question, so I’m being cheap and asking you guys.

Why does it take Google 1.3 seconds to search billions of web pages for the term “cecil”, while my brand-new 2 gigahertz computer takes 5 minutes to do the same thing through my puny hard drive? I understand Google somehow indexes these web pages, but even then it’s a wonder how it does it so fast. What gives?

Here’s Google’s explanation of their technology:

Thank you. I tried to find a page like that, but all I could come up with was a page explaining how Google uses a pigeon, and it apparently was a joke or something. I still don’t understand how it completes searches so fast, even with the thousands of PCs connected. Thanks again though.

In short, Google is not searching billions of web pages, but their index of billions of web pages. They keep track of which pages contain which terms, and which pages link to those pages (to judge popularity associated with a given term). They have multiple server farms with thousands of machines for searching the index.

On the other hand, when your computer is searching documents on your hard drive, it has to go through each and every document to see what’s in it. Newer operating systems have indexing features built in to speed up this sort of thing.

It’s basically like the difference between looking something up in the index of an encyclopedia and going right to that page vs. flipping through the entire encyclopedia page by page and looking at every word.
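To make the encyclopedia comparison concrete, here is a minimal sketch of the two approaches in Python. The file names and text are made up for illustration, and this is only a toy version of the idea, not how Google actually stores its index.

```python
from collections import defaultdict

def build_index(docs):
    """Build the index once: map each lowercase word to the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def scan(docs, term):
    """Brute force: read through every document on every single query."""
    return {doc_id for doc_id, text in docs.items()
            if term in text.lower().split()}

# Hypothetical documents standing in for web pages or files on a hard drive.
docs = {
    "a.txt": "Cecil answers questions",
    "b.txt": "pigeons rank pages",
    "c.txt": "ask Cecil about pigeons",
}

index = build_index(docs)              # the slow part, done ahead of time
hits = index.get("cecil", set())       # the fast part, done per query
```

Both `scan(docs, "cecil")` and the index lookup return the same documents. The difference is that the scan reads everything each time, while the lookup goes straight to the entry for the word.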

One kind of “index” that a computer can use is called a “search tree.” A search tree has “branches” which then end in “leaves” where the data is contained. Google searches on words, so a simple word search tree might be constructed just using each letter of the alphabet (this letter-by-letter variety is usually called a trie). You start off with 26 branches, one for each letter of the alphabet. Then each branch has 26 branches under it, and so forth. Branches that don’t end in any leaves (for example the ZQ branch probably doesn’t end in any words) are just eliminated from the tree structure so they don’t take up space.

In your example, this type of search tree would first go to the C branch, then the E branch under that, then the C branch under that, etc. until it got to the L and it would see if a leaf was present there. If so, it would list out everything in that leaf. If there was no leaf there it would return not found. This means that it can find the word with 5 simple comparisons, instead of doing a few million comparisons by searching the entire disk drive.
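The letter-by-letter tree described above can be sketched in a few lines of Python. The names `TrieNode`, `insert`, and `lookup` and the page labels are made up for illustration. This is a toy under those assumptions, not whatever structure Google really uses.

```python
class TrieNode:
    def __init__(self):
        # letter -> child node; branches that are never used are never created
        self.children = {}
        # the "leaf" data: pages containing the word that ends at this node
        self.pages = set()

def insert(root, word, page):
    """Walk down one branch per letter, creating branches as needed."""
    node = root
    for letter in word:
        node = node.children.setdefault(letter, TrieNode())
    node.pages.add(page)

def lookup(root, word):
    """Follow one branch per letter; give up as soon as a branch is missing."""
    node = root
    for letter in word:
        node = node.children.get(letter)
        if node is None:
            return set()   # not found
    return node.pages

root = TrieNode()
insert(root, "cecil", "page1")
insert(root, "cecil", "page7")
insert(root, "cell", "page2")
```

Looking up “cecil” follows C, E, C, I, L and returns the two pages stored there. Looking up “zq” stops at the first letter because no Z branch was ever created.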

This type of search tree probably wouldn’t be all that practical for an internet search engine, but it should give you the basic idea of how they work so fast. The one thing Google isn’t going to tell you is exactly what type of search tree they set up, because that is basically the technology that they do in fact sell. Their internet search engine is just a free demo of their technology.

A Google search (heh) on “search tree” should give you a bit of reading material. The data structures and search algorithms can get fairly complex.

Um, you pay to use Google?

You have to pay to use Google Answers.

Google Answers allows you to ask an expert a specific question, and then they… well, they answer you. :) It appears to be a manual operation, rather than an automated search.

dantheman, that’s the service I referred to in a subtly sarcastic way.

BTW, if you’re running Windows 2000 or XP, you can run the Indexing Service and get similar performance in searching for documents on your machine.

I know, aeropl. I was answering Bongmaster.

You might be interested in this page. It explains Google’s system of PageRankings and the problems associated with it.

Quoth engineer_comp_geek:

You’d be surprised at what you can get. I rather doubt that there are any three-letter strings which don’t appear somewhere in Google’s database: There are a lot of TLAs out there, after all.

Thanks Starbury. That was interesting.