Robots and search engines

This is a follow-up question to the thread about which pages get hit by a web search.

I think I now understand that a robot typically scours the web for pages and compiles a ‘list’. This ‘list’ is then scanned by a search engine whenever a query is submitted. If I am not mistaken, this second step doesn’t take too long, often less than a second.
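To make sure I've got the two-step picture right, here's a toy sketch of how I imagine it (the page contents and URLs are made up, and real engines obviously use far more elaborate data structures and ranking):

```python
# Toy sketch: the robot builds an inverted index ahead of time,
# and the search engine answers a query with a fast lookup.
from collections import defaultdict

# Hypothetical pages the robot has already fetched.
pages = {
    "page1.html": "robots crawl the web",
    "page2.html": "search engines answer queries",
    "page3.html": "robots and search engines",
}

# Step 1 (the robot, done ahead of time): map each word to the
# set of pages that contain it -- the 'list' described above.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

# Step 2 (the search engine, done per query): just a dictionary
# lookup, which is why a query can return in under a second.
def search(word):
    return sorted(index.get(word, set()))

print(search("robots"))
```

The expensive work happens once, up front; each query only touches the prebuilt index.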

My question, then, is how long does it take the robot to tour the web? Especially if the servers it's visiting are on slow networks, wouldn't this take the robot ages? I guess not, given that they're out there. But how long does it take?

Thanks.

No robot has looked at the entire web because it would take a really, really, really, really, really long time. Also, the web is constantly changing. New pages are put up, old ones are taken down or moved. The content on an existing page may change every minute.

Generally a robot starts with a list of several thousand known sites and starts up a process for each site. When it finds a link, it starts a separate process for each link, until a maximum number of concurrent processes (say 500,000) is running. Since all these processes run in parallel, it doesn't matter if some networks are slow; some process somewhere is always fetching and indexing pages.
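The strategy above can be sketched roughly like this. This is a simulation, not a real crawler: the link graph is made up, `fetch` just sleeps instead of making HTTP requests, and `MAX_WORKERS` stands in for the concurrent-process cap (tiny here, not 500,000):

```python
# Sketch of the crawl strategy: start from seed sites, follow
# links, and fetch in parallel up to a fixed worker cap so slow
# servers don't stall the whole crawl.
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical link graph standing in for the web.
LINKS = {
    "siteA": ["siteA/page1", "siteB"],
    "siteB": ["siteB/page1", "siteC"],
    "siteC": [],
    "siteA/page1": [],
    "siteB/page1": [],
}

MAX_WORKERS = 4   # the concurrency cap (500,000 in the post)
visited = set()
index = []        # pages "indexed" so far

def fetch(url):
    time.sleep(0.01)           # simulate a slow network
    return LINKS.get(url, [])  # return the links found on the page

def crawl(seeds):
    frontier = list(seeds)
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        while frontier:
            batch = [u for u in frontier if u not in visited]
            visited.update(batch)
            frontier = []
            # Fetch the whole batch in parallel: while one slow
            # server stalls, other workers keep indexing pages.
            for url, links in zip(batch, pool.map(fetch, batch)):
                index.append(url)
                frontier.extend(links)

crawl(["siteA"])
print(sorted(index))
```

Because the workers overlap their waiting, total crawl time is dominated by how many pages there are, not by any one slow server; that's why the answer to "how long" is measured in days or weeks of continuous crawling rather than a serial sum of every site's response time.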