Web Search Coverage?

I’ve read that typical search engines cover maybe 20% of the web. If that’s true, what is the missing 80%, and how can you get at it?

Perhaps it’s mostly small independent pages or private stuff? And maybe when we follow links to links to links we’re often off the beaten path?

Lastly, any good suggestions for finding sites that are away from the mainstream? I’ve been really enjoying weblogs/blogs recently, and that’s one way.

I don’t have exact figures, but most search engines only index pages that are submitted to them. I believe there are projects intended to manually index as many pages as possible, submitted or not, but that’s a Herculean task and perhaps impossible given the ever-increasing number of sites and pages. As for how to find these other pages… well, you’re right that following links from other pages is the best way. Randomly entering URLs would be another, but you’re not likely to get much from an awful lot of typing!

For non-mainstream pages, weblogs are probably a good choice (I often use Robot Wisdom). I also check out Yahoo’s picks of the week every Monday – I would post an address, but I only have the England one to hand.

From Search Engine Watch:

(from this article)

They also quote a study by a company called BrightPlanet that estimates that there are around 500 billion web pages, with only about 1 in 500 of them accessible to search engines.