How is a search engine programmed?
02-05-2001, 04:16 PM
I am interested in knowing how a search engine works. How does it sift through millions of web pages to find the key words you enter?
02-05-2001, 05:01 PM
They've really already done the searching: all the resulting data is catalogued in a database. So when you search on "cheese" and "burger", it just looks in its own local database for the sites it has crawled that match those keywords.
The crawling is a little trickier, and I'm not certain how that works.
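To make the "local database" idea concrete, here is a toy Python sketch of an inverted index: a table mapping each word to the set of pages that contain it. All the page names and words are made up for illustration; a real engine's index is vastly larger and lives on disk.

```python
# Toy "local database": an inverted index that maps each word to the
# set of pages containing it (all data here is made up).
index = {
    "cheese": {"pageA", "pageB"},
    "burger": {"pageB", "pageC"},
    "pizza":  {"pageA"},
}

def search(*words):
    """Return the pages that contain ALL of the given words."""
    sets = [index.get(w, set()) for w in words]
    return sorted(set.intersection(*sets)) if sets else []
```

Searching on "cheese" and "burger" then just intersects two precomputed sets, which is why the query itself is fast: the expensive work happened when the index was built.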
02-05-2001, 05:13 PM
The database portion makes sense and I can see very easily how this is done, but I want to know how they even MAKE a database like this. How do they actually search for the words in the pages on the web?
02-05-2001, 05:15 PM
This site (http://www.searchenginewatch.com/webmasters/work.html) is packed with search engine information. At a very simple level:
Search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled." The spider returns to the site on a regular basis, such as every month or two, to look for changes.
Everything the spider finds goes into the second part of a search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with the new information.
Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been "spidered" but not yet "indexed." Until it is indexed -- added to the index -- it is not available to those searching with the search engine.
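Building that "giant book" from the spider's output can be sketched in a few lines: for every page, record which words appear on it. The page contents here are invented; real indexers also store word positions, frequencies, and much more.

```python
# Build the index from what the spider returned: for every page,
# record which words appear on it (toy data for illustration).
pages = {
    "/home": "cheese burger menu",
    "/menu": "cheese pizza burger",
}

def build_index(pages):
    """Invert page -> text into word -> set of pages."""
    index = {}
    for url, text in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index

index = build_index(pages)
```

Re-spidering a changed page then just means re-running this step over its new text, which is exactly the lag the post above describes: a page can have been crawled but not yet folded into the index.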
Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant.
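A very crude version of that third part, assuming the simplest possible relevance signal (how often the query words occur on a page); real engines of the era weighed many more factors, so this is only a sketch of the idea.

```python
# Toy ranking: score each matching page by how many times the query
# words appear on it, then sort best-first (illustrative data only).
pages = {
    "/a": "cheese cheese burger",
    "/b": "cheese pizza",
    "/c": "salad menu",
}

def rank(query_words, pages):
    """Return matching pages ordered from most to least relevant."""
    scores = {}
    for url, text in pages.items():
        words = text.split()
        score = sum(words.count(q) for q in query_words)
        if score > 0:
            scores[url] = score
    return sorted(scores, key=scores.get, reverse=True)
```

Here `rank(["cheese", "burger"], pages)` puts `/a` ahead of `/b` because the query words occur on it more often, and drops `/c` entirely since it matches nothing.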
02-05-2001, 05:44 PM
A site like http://www.google.com/ uses more than 6,000 computers to store its info, so it can find what you need in under a second.
I didn't look, but there might be some more info at howthingwork.com
02-05-2001, 07:12 PM
The thing that's most important to making search engines capable of indexing billions of web pages is the data structure used in the index. I don't remember the details, but the index is (AFAIK) typically stored as a specialized type of tree that allows searches to run in something like O(log n) time.
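The O(log n) claim is easy to illustrate with a sorted list and binary search, which gives the same logarithmic bound as the tree structures mentioned above (on-disk indexes typically use B-tree-like structures instead, since they update more cheaply). The terms here are made up.

```python
import bisect

# Keep the index terms sorted so any term can be located with binary
# search in O(log n) comparisons (toy data for illustration).
terms = sorted(["apple", "burger", "cheese", "pizza", "salad"])

def contains(term):
    """Binary-search the sorted term list for an exact match."""
    i = bisect.bisect_left(terms, term)
    return i < len(terms) and terms[i] == term
```

With a billion terms, `contains` needs only around 30 comparisons per lookup, which is what makes sub-second search over a huge index plausible.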
02-05-2001, 07:22 PM
Re Google: the following link was of great interest to the search engine optimisation community until about a year ago. Around then, Brin and Page started changing the original algorithm configuration, so things are pretty different now. However, it still offers more of an overview of the goals of building a search engine than any other document in the public domain (to my knowledge).
Bear in mind the paper was written, and the whole concept of Google itself later brought to fruition, while Sergey Brin and Lawrence Page were still students at Stanford. Pretty awesome stuff:
I believe handy is quite right: Google still runs on bank upon bank of almost-ordinary PCs. And lots and lots of cabling.
vBulletin® v3.7.3, Copyright ©2000-2013, Jelsoft Enterprises Ltd.