Deep Web Search Engines

I tried googlin’ for one, and unsurprisingly enough, came up with nothing. I have heard about the “wonders” of the deep web, and want to try it out. Any recommendations, or is it not worth the trouble?

I considered putting this in IMHO, but I do want factual answers and statistics. Our engineering lecturer said something along the lines of “Google only contains 3% of the available content on the web.” Verification?

I have a link to a deep-web website in the references sticky. A book was written on the subject, and the authors published a website alongside it. It hasn’t really been updated since 2001, so some of the search links might not be accurate; however, they accounted for this by providing links to the websites themselves as well as to the websites’ search engines. IMHO, I don’t recommend the book.

I also want to point out that by its nature the deep web has no search engines. See this quote from Wikipedia:

While you can be pointed towards websites that are members of the deep web, once they have been spidered by a search engine (search engines run by the very site notwithstanding) they are no longer part of the deep web!

In reference to your sig, “Choke a fish or kill a tree?” :cool:

Sorry - the website is http://www.invisible-web.net/

That’s what googling turned up, but it didn’t help me much.

But your post does make so much sense. I feel really dumb now. :smack:

In response to my sig, I don’t use paper bags either? It’s all backpack stuffing for me :slight_smile:

I take that back. I was at http://invisibleweb.com :smack:

That sucks. Damn those squatters!

There’s also the matter that most of the deep internet is specialized information. For instance, a researcher might have his computer hooked up to the Internet so he can run his simulations remotely, using a password. That computer is then part of the deep internet. But only people who have a reason to use it can access it.

From everything I have read, including the book above, the term only applies to publicly accessible web sites that a search engine could not possibly crawl because the pages are generated dynamically and simply do not exist in static form - thereby making themselves invisible or “deep”.

To requote Wikipedia:

A definition given by Berkeley seems to imply the same - pages that are searchable by the end user:

Although it could be said that your researcher’s password protected databases are searchable via the web, if one has certain credentials, it seems to go against the general “spirit” of the deep web. It should also be said that the entire reason the concept was named was not that there were password protected databases out there, but that search engines simply didn’t have the capability of crawling them, whereas if the end user was at the site they certainly could.
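To make the distinction concrete, here is a rough Python sketch of why a link-following spider misses this kind of page. Everything in it is made up for illustration: the toy site, its URLs, the two database records, and the fetch and crawl helpers are assumptions, not how any real search engine works. The spider only follows links it finds on pages, so the result pages behind the search form never get indexed, even though a user sitting at the site can pull them up.

```python
from collections import deque

# Hypothetical toy site: each static page maps to the links found on it.
STATIC_PAGES = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # holds only a search form, no links to any results
}

# Records that live only in the site's database. A result page exists
# only while it is being generated for a submitted query.
DATABASE = {"trout": "Record about trout", "oak": "Record about oak"}


def fetch(url):
    """Return (body, links) for a URL, or None if there is no such page."""
    if url in STATIC_PAGES:
        return f"static page {url}", STATIC_PAGES[url]
    if url.startswith("/search?q="):
        term = url[len("/search?q="):]
        if term in DATABASE:
            return DATABASE[term], []
    return None


def crawl(start):
    """Breadth-first crawl that follows links and never fills in forms."""
    seen, queue = {start}, deque([start])
    while queue:
        page = fetch(queue.popleft())
        if page is None:
            continue
        for link in page[1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


indexed = crawl("/")
```

With this toy site, indexed holds only the three static pages. Calling fetch("/search?q=trout") directly still returns the record, which is the whole point: the content is public, but nothing links to it, so the spider never sees it.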

So the specialized databases that you generally have to have a membership to use (usually through a university or business), Lexis-Nexis being the best known (IME), wouldn’t count?

That’s what I immediately thought of when I read alterego’s first reply.

I personally don’t think they apply. I have access to hundreds of databases online through my university but they aren’t public, they are private. Yes they are ‘invisible’ or ‘deep’ but not for the same reasons that the term “Deep Web” was coined.

Maybe someone else has some ideas about it.

OK, then, a misunderstanding of terminology. Is there some term for the restricted portion of the Internet? Because it seems to me that that would be even larger than the public deep web.

Also, the quote

is perhaps a bit misleading, since the search results pages themselves are dynamically generated pages built from a database, and would thus be part of the deep web. So search engines can, in fact, create some of the deep web pages.

That’s an interesting idea. Technically speaking the exact pages that Google spits out do not exist in static form. On the other hand they are just dynamic representations of static data that we already have access to.

It still sounds like they apply, which probably makes places like archive.org with their petabyte boxes and Google some of the biggest sources of the deep web themselves.

“I am my greatest problem” :slight_smile:

I don’t know of a term for what you ask. Probably “private web” versus “public web” would work, though that could get confused with a VPN.

FYI, there’s an article (free registration required) about this issue in today’s New York Times. It discusses how even academics tend to use only online sources to do research and how librarians are responding to this.