The wonders of Google

aeropl · November 5, 2002, 3:54am

You have to pay Google to ask it a question, so I’m being cheap and asking you guys.

Why does it take Google 1.3 seconds to search billions of web pages for the term “cecil”, while my brand-new 2 gigahertz computer takes 5 minutes to do the same thing through my puny hard drive? I understand Google somehow indexes these web pages, but even then it’s a wonder how it does it so fast. What gives?

DarrenS · November 5, 2002, 4:29am

Here’s Google’s explanation of their technology:

aeropl · November 5, 2002, 4:34am

Thank you. I tried to find a page like that, but all I could come up with was a page explaining how Google uses a pigeon, and it apparently was a joke or something. I still don’t understand how it completes searches so fast, even with the thousands of PCs connected. Thanks again though.

friedo · November 5, 2002, 5:35am

In short, Google is not searching billions of web pages, but their index of billions of web pages. They keep track of which pages contain which terms, and which pages link to those pages (to judge popularity associated with a given term.) They have multiple server farms with thousands of machines for searching the index.

On the other hand, when your computer is searching documents on your hard drive, it has to go through each and every document to see what’s in it. Newer operating systems have indexing features built in to speed up this sort of thing.
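To make the difference concrete, here is a minimal Python sketch of the two approaches described above: scanning every document on each query versus looking the term up in a prebuilt index. The page names, page texts, and function names are invented for illustration, and this is only a toy model of the idea, not a description of how Google actually stores its index.

```python
from collections import defaultdict

# Hypothetical toy "web": page name -> page text (made up for this example).
pages = {
    "page1": "cecil adams answers questions",
    "page2": "google indexes web pages",
    "page3": "ask cecil about google",
}

def scan_search(term):
    """Slow way: read every document again on every query."""
    return sorted(name for name, text in pages.items() if term in text.split())

def build_index(docs):
    """Done once, ahead of time: map each term to the pages that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.split():
            index[word].add(name)
    return index

index = build_index(pages)

def index_search(term):
    """Fast way: one dictionary lookup; no document is read at query time."""
    return sorted(index.get(term, set()))

print(index_search("cecil"))
```

Both functions return the same pages; the difference is that the scan does work proportional to the total amount of text on every query, while the index pays that cost once when it is built.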

engineer_comp_geek · November 5, 2002, 2:23pm

It’s basically like the difference between looking something up in the index of an encyclopedia and going right to that page vs. flipping through the entire encyclopedia page by page and looking at every word.

The “index” that a computer uses is called a “search tree.” A search tree has “branches” which then end in “leaves” where the data is contained. Google searches on words, so a simple word search tree might be constructed just using each letter of the alphabet. You start off with 26 branches, one for each letter of the alphabet. Then each branch has 26 branches under it, and so forth. Branches that don’t end in any leaves (for example the ZQ branch probably doesn’t end in any words) are just eliminated from the tree structure so they don’t take up space.

In your example, this type of search tree would first go to the C branch, then the E branch under that, then the C branch under that, etc. until it got to the L and it would see if a leaf was present there. If so, it would list out everything in that leaf. If there was no leaf there it would return not found. This means that it can find the word with 5 simple comparisons, instead of doing a few million comparisons by searching the entire disk drive.
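The letter-by-letter tree described above is usually called a trie. Below is a minimal Python sketch of it, assuming each leaf simply holds a list of page names; the class name, function names, and sample data are hypothetical and chosen only to mirror the description.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # letter -> TrieNode; unused branches are never created
        self.leaf = None    # list of pages if a word ends here, otherwise None

def insert(root, word, page):
    """Walk (and create as needed) one branch per letter, then record the page."""
    node = root
    for letter in word:
        node = node.children.setdefault(letter, TrieNode())
    if node.leaf is None:
        node.leaf = []
    node.leaf.append(page)

def lookup(root, word):
    """Follow one branch per letter; return the leaf's pages or None if not found."""
    node = root
    for letter in word:
        node = node.children.get(letter)
        if node is None:
            return None  # branch missing, e.g. nothing starts with "zq"
    return node.leaf

root = TrieNode()
insert(root, "cecil", "page1")
insert(root, "cecil", "page3")
insert(root, "cell", "page2")

print(lookup(root, "cecil"))
```

Looking up "cecil" follows exactly five branches (C, E, C, I, L) no matter how many other words are stored, which is the point of the paragraph above.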

This type of search tree probably wouldn’t be all that practical for an internet search engine, but it should give you the basic idea of how they work so fast. The one thing Google isn’t going to tell you is exactly what type of search tree they set up, because that is basically the technology that they sell. Their internet search engine is just a free demo of their technology.

A Google search (heh) on “search tree” should give you a bit of reading material. The data structures and search algorithms can get fairly complex.

Bongmaster · November 5, 2002, 4:35pm

Um, you pay to use Google?

dantheman · November 5, 2002, 5:17pm

You have to pay to use Google Answers.

Google Answers allows you to ask an expert a specific question, and then they… well, they answer you. It appears to be a manual operation, rather than an automated search.

aeropl · November 5, 2002, 8:19pm

dantheman, That’s the service I referred to in a subtly sarcastic way.

Cerowyn · November 5, 2002, 8:39pm

BTW, if you’re running Windows 2000 or XP, you can run the Indexing Service and get similar performance in searching for documents on your machine.

dantheman · November 6, 2002, 1:30pm

I know, aeropl. I was answering Bongmaster.

Starbury · November 6, 2002, 5:33pm

You might be interested in this page. It explains Google’s system of PageRankings and the problems associated with it.

Chronos · November 6, 2002, 6:04pm

Quoth engineer_comp_geek:

You’d be surprised at what you can get. I rather doubt that there are any three-letter strings which don’t appear somewhere in Google’s database: There are a lot of TLAs out there, after all.

shelbo · November 6, 2002, 6:23pm

Thanks Starbury. That was interesting.
