What's The Dope On The Deep Web?

Bearing in mind the board’s on/off problems with the search facility, I thought I’d just ask if Dopers have any links to relevant threads on this topic. Although I’d appreciate any explanations, I mainly wanted to know if it has any practical applications for the ‘average Joe’ computer user. Are we missing out on anything, or is its existence more for the technically savvy PC user?

Its relevance to you has more to do with your usage needs than your computer ability. Much of the deep web consists of academic, trade, and otherwise niche publications and papers that are relevant to very few people. The vast libraries are invaluable for people doing research and looking for extremely obscure and exact information for a thesis, cutting-edge research, or truly something-never-assembled-before applications, with proper format, attributions, and review information.

But for everybody else (99.9%) the shallow web contains most of the information they will ever imagine needing.

From your post, I assume that by the Deep Web you mean the large number of pages and data that are not indexed by major search engines. I have heard it jokingly called the Dark Matter of the Webiverse.

This information is often encrypted, secured, not linked to, or made up of dynamically created pages. There are large parts of it that are potentially useful, but even the search engine giants are still trying to figure out this puzzle. I believe the Deep Web includes any information that is password protected. I know I would love access to Elias Sports.

There is also an ungodly amount of semi-hidden message board information that, as I know from searching on PC terms, is so hard to sift through that it is nearly useless. A common occurrence is finding a likely hit and getting a page that says you can sign up to see the full post and responses.

I used to have complete access to the Elias databases (one of the perks of working here), and you’re not missing nearly as much as you think you are.

There is a mind-boggling amount of data but the search systems are kind of useless. Well, they were in 2002.

This page is pretty useful as an introduction to the Deep Web and finding one’s way about it.

There is no single person for whom the entire Deep Web (or even a reasonable fraction of it) is relevant, but bits and pieces of the Deep Web are relevant to almost everyone. The inside of your web-based email account is part of the Deep Web, for instance: It’s on the web, but only people with your password (hopefully, only you) have access to it.

Then there are also things that are publicly accessible, but still not indexed by the major search engines (because the site has politely asked them not to). Until the recent changes, this board fell into that category.
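
For anyone wondering what "politely requesting" looks like in practice, it is usually a robots.txt file at the root of the site. Here is a rough sketch using Python's standard `urllib.robotparser`. The rules, the `/threads/` path, the `ExampleBot` name, and the example.com URLs are all made up for illustration; they are not this board's actual settings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a message board that asks every crawler
# to stay out of its thread pages (the path is invented for this example).
ROBOTS_TXT = """\
User-agent: *
Disallow: /threads/
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/threads/12345", "https://example.com/about"):
        print(url, crawler_may_fetch(ROBOTS_TXT, "ExampleBot", url))
```

Note that this is only a request: the pages stay publicly reachable, and a well-behaved crawler simply chooses not to fetch them.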

Don’t forget the “Deep Web” also consists of pages not yet indexed by Google. Google does a good job of quickly indexing main pages, but let’s say you have great information on “Michael Phelps,” and it’s on page 20 of your website. Google has only indexed your site through page 13.

I have found that each time Google passes over my sites it indexes just a bit more, even though I do have a complete Google sitemap and an index page for users.

The more popular your site is, the more of it Google indexes. So you get the problem that a popular blog whose information is repeated on other blogs will be indexed and updated more often and more deeply, even though it provides the same information as another site.

This keeps Google busy. Look up some keywords and you will often see the same information repeated across Google’s top 10 matches. Even though Google says it filters out duplicate content, it clearly hasn’t come close to perfecting this. By inserting randomly generated words, sites can present duplicate information and still appear to be different sites.
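
To make the random-words trick a bit more concrete, here is a small sketch of one textbook way to spot near-duplicate text: break each page into overlapping word "shingles" and compare the sets with Jaccard similarity. This is an assumption for illustration only, not a description of whatever Google actually runs, and the sample sentences and function names are invented.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word sequences in text."""
    words = text.lower().split()
    # max(..., 1) keeps very short texts from producing an empty range.
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    if not union:
        return 1.0
    return len(sa & sb) / len(union)


if __name__ == "__main__":
    original = "the deep web is the part of the web that search engines do not index"
    # Same text with one random word dropped into the middle.
    padded = "the deep web is the part of the web zebra that search engines do not index"
    unrelated = "michael phelps won eight gold medals in beijing"

    print("original vs padded:   ", similarity(original, padded))
    print("original vs unrelated:", similarity(original, unrelated))
```

Each inserted word only breaks the few shingles that span it, so a lightly padded copy still scores high. A site would have to scatter a lot of random words to look truly different, which is presumably where the cat-and-mouse game comes in.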

Also, Google relies more on links than keywords for ranking. So even if your page on Michael Phelps is in Google’s index, it may be on page 30 of Google’s listings. More popular sites will link under the phrase “Michael Phelps” to pages that advertise Speedos (for example). A link to an ad whose anchor text is “Michael Phelps” carries more weight than an article about Michael Phelps.
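
Here is a toy illustration of why links can outweigh content. It is a simplified version of the published PageRank idea (repeatedly passing each page's score along its outgoing links), and it deliberately ignores anchor text, keywords, and everything else a real search engine weighs. The page names and the link graph are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration over a small link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: three blogs link to an ad page, while only
    # one small fan page links to the actual article.
    graph = {
        "blog_a": ["ad"],
        "blog_b": ["ad"],
        "blog_c": ["ad"],
        "fan_page": ["article"],
        "ad": [],
        "article": [],
    }
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(f"{page:10s} {score:.3f}")
```

In this made-up graph the ad page comes out ahead of the article purely because more pages point at it, whatever the two pages actually say.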

It’s like anything: there’s a lot of information out there, but if no one knows about it, how can they find it? This is why, when you make webpages, especially for a business, it pays to use a VERY VERY unique name to get your pages shown.

That’s interesting and disappointing. Google used to be very good about crawling everything listed in your sitemap.xml (as long as you didn’t have more than 10,000 URLs) and adding it to their index. I guess they got overwhelmed.
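
For anyone who hasn't seen one, a sitemap.xml is just a flat list of URLs you would like crawled. Below is a minimal sketch that builds one with Python's standard library, assuming the basic sitemaps.org format with only `<loc>` entries. The `build_sitemap` helper and the example.com URLs are hypothetical, and listing a page this way does not guarantee a search engine will index it.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return sitemap XML listing each URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the text for us.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical site with the deep page from the example above.
    pages = [f"https://example.com/page/{n}" for n in (1, 13, 20)]
    print(build_sitemap(pages))
```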