How is a search engine programmed?

I am interested in knowing how a search engine works. How does it sift through millions of web pages to find the key words you enter?

blade

The searching has really already been done, and all the resulting data is catalogued in a database. So when you search on “cheese” and “burger”, the engine just looks in its own local database for the sites it has already crawled that match those keywords.
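To make the "look it up in a database" idea concrete, here is a minimal sketch of a word-to-pages lookup table (usually called an inverted index). The page URLs and text are made up for illustration, and `build_index` and `search` are hypothetical names; a real engine's database is far more elaborate than this.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text (made up for illustration).
PAGES = {
    "example.com/a": "cheese burger recipes",
    "example.com/b": "cheese and wine",
    "example.com/c": "veggie burger with cheese",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the pages for the first word, then narrow down.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "cheese burger")))
# prints ['example.com/a', 'example.com/c']
```

The point is that the expensive work (reading every page) happens once, when the index is built. Answering a query is then just a few dictionary lookups and set intersections.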

The crawling is a little trickier, and I’m not certain how that works.
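As a rough guess at the general shape, though: a crawler starts from some known page, reads it, and follows its links to find more pages, keeping track of what it has already visited. The sketch below does that over a tiny made-up "web" held in memory. The `WEB` dictionary and the `fetch` function are stand-ins (assumptions for illustration) for real HTTP requests and HTML link extraction, which are where the tricky parts live.

```python
from collections import deque

# Hypothetical miniature "web": URL -> (page text, outgoing links).
WEB = {
    "a.example": ("cheese burger", ["b.example", "c.example"]),
    "b.example": ("cheese wine", ["a.example"]),
    "c.example": ("veggie burger", ["d.example"]),
    "d.example": ("salad", []),
}

def fetch(url):
    """Stand-in for downloading a page and extracting its links."""
    return WEB.get(url, ("", []))

def crawl(seed, max_pages=100):
    """Breadth-first crawl from seed; returns {url: text} for visited pages."""
    seen = {seed}            # URLs already queued, so nothing is visited twice
    queue = deque([seed])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        text, links = fetch(url)
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

CRAWLED = crawl("a.example")
print(list(CRAWLED))
# prints ['a.example', 'b.example', 'c.example', 'd.example']
```

The text collected this way is what would then be fed into the keyword database.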

The database portion makes sense and I can see very easily how this is done, but I want to know how they even MAKE a database like this. How do they actually search for the words in the pages on the web?

blade

This site is packed with search engine information. At a very simple level:

A site like http://www.google.com/ uses more than 6,000 computers to store its data, so that it can return the results of a search in under 1 second.

I didn’t look, but there might be some more info at howstuffworks.com

The most important thing in making search engines capable of indexing billions of web pages is the data structure used for the index. I don’t remember the details, but the index is (AFAIK) typically stored as a specialized type of tree that allows searches to run in something like O(log n) time.
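To illustrate what O(log n) lookup means in practice, here is a small sketch. It is not a tree: it swaps in binary search over a sorted list of terms (Python's standard `bisect` module), which gives the same logarithmic lookup cost and is easier to show in a few lines. The terms and document ids are made up, and `lookup` is a hypothetical name.

```python
import bisect

# Hypothetical index entries: (term, ids of documents containing it).
ENTRIES = [
    ("burger", [1, 3]),
    ("cheese", [1, 2, 3]),
    ("veggie", [3]),
    ("wine", [2]),
]
# The term list must be kept sorted for binary search to work.
TERMS = [term for term, _ in ENTRIES]

def lookup(term):
    """Find a term's document ids in O(log n) comparisons."""
    i = bisect.bisect_left(TERMS, term)
    if i < len(TERMS) and TERMS[i] == term:
        return ENTRIES[i][1]
    return []  # term is not in the index

print(lookup("cheese"))  # prints [1, 2, 3]
print(lookup("pizza"))   # prints []
```

Each comparison halves the remaining range, so doubling the number of terms adds only one extra step. A tree-based index gets the same growth rate while also being cheaper to update than a flat sorted list.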

Re Google: The paper linked below was of great interest to the search engine optimisation community until about a year ago. Around then, Brin and Page started to change the configuration of the original algorithm, so things are pretty different now. However, it still offers more of an overview of the goals associated with building a search engine than any other document in the public domain (to my knowledge).

Bear in mind the paper was written and, later, the whole concept of Google itself was brought to fruition while Sergey Brin and Lawrence Page were still students at Stanford. Pretty awesome stuff:

http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm

I believe handy is quite right. Google still works out of bank upon bank of almost regular PCs, and lots and lots of cabling.