Now I am reading about the Deep Web, how it is accessed, and whether the Surface Web has places with some characteristics of the Deep Web. The author of the blog characterizes some sites as borderline surface web, that is, Google can index them only with difficulty and anonymity is prevalent. He names reddit and 4chan as examples. Are there any sites that are officially called borderline websites? Can we compare some places on the surface web to the deep web? I know reddit is a well-known website. Is it considered deep?
What blog are you reading?
Deep, surface, and borderline mean whatever the blog author wants them to mean. They’re buzzwords, not internet engineering terms.
reddit and 4chan are more analogous to gateway drugs. They are where you would gain the right knowledge, meet the right people, and get into the wrong crowd, all of which could end in an invitation to try some site you’d never find on your own.
Or you could just use a Deep Web or Dark Web search engine . . .
You already use parts of the Deep Web, every day. When you check your bank account balance, or read your e-mail, or edit your Google Docs, you’re viewing information over the Web which is not available to the general public. And yet, your bank makes no secret of the fact that they have accounts, and has information prominently available on their public web page about how to get one of your own.
“Deep” does not in any way mean that there’s anything shady going on. There will be some shady stuff in the Deep Web, of course, but then, there will be some in the public Web, too. It might even be safer on the public web, because not requiring login credentials might make it easier to maintain anonymity.
Yes, according to a recent survey much of the content on the Deep Web is quite innocuous.
Maybe we’re thinking of the Dark Web … bitcoins, hackers, those Guy Fawkes people … all those nasty places on the web.
Deep Web is stuff that’s just buried in layer after layer of directories … think of a specific latitude and longitude on Mars and we’d have to be digging through NASA’s web site to find a photo of that specific place … it’s there … but it’s deep in the site … many government agencies post the scientific data they’ve collected, it’s just difficult to find in some cases.
This would never have happened with Gopher
Ok, I conflated deep web and dark web, as usual. I specifically meant the Tor network, and whether some surface sites are somewhat similar to that. The specific blog article I read is below:
Over 14,000 words and not a single paragraph break.
I got as far as “1EarthUnited” and dismissed the entire thing as rubbish.
Right, I confused the Deep Web and the Dark Web in my post above. Here’s a definition from PC Advisor:
The survey I linked covered the Dark Web.
Cmon man, no tables, frames or inline image support? Mosaic kicked Gopher’s ass fair and square.
I don’t know if this is a helpful thought, but the SDMB is a perfect example of surface and deep content side by side. This post is now public information and searchable by the likes of Google. But if I sent you a private message, that is part of the deep web, only accessible with your username and password. And if I use the Tor browser to read the SDMB… well, I haven’t changed the SDMB at all, but I am using dark web technology to encrypt data and mask IPs.
And I read it whole!
Characterizing Reddit and even 4Chan as “borderline” speaks of a misunderstanding of the nature of those forums. If the primary characteristic of borderline is that Google can’t index them, then the SDMB would have been borderline before Google was allowed to index us. It’s really easy to block Google and other legitimate webcrawlers, as that just requires a couple of lines in the robots.txt file.
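For reference, the “couple of lines” in question would look something like this — a minimal robots.txt asking every compliant crawler to stay out of the whole site:

```
User-agent: *
Disallow: /
```

Of course, as noted below, nothing forces a crawler to honor this; it’s a request, not an access control.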
[Bracketing] inserted by me to clarify.
Agree with all you’ve said. Which makes me think of a question …
Ref the snippet above, compliance with robots.txt is 100% voluntary. An interesting question is whether there are any publicly available search engines that advertise that they don’t abide by robots.txt.
Sure, any given webmaster could try to IP-block such an unfriendly search engine. But that’s a futile game of whack-a-mole versus any good-sized crawler infrastructure.
Eh, you can serve HTML over Gopher just as easily as anything else. Images, too.
Wow … Mosaic … seeing that word makes me feel very very old … do you Alta Vista ???
No. That would be not only blatantly antisocial, but monumentally stupid from the perspective of the robot’s operator.
A lot of what robots.txt does these days is protect website backends from robots and, therefore, robots from themselves. It notifies robots about dynamically-generated content which can be effectively infinite, generated programmatically from whatever internal database the website draws from. Unless the robot’s owner wants to be on the wrong end of a combinatorial explosion, it programs the robot to respect robots.txt and avoid the infinite tarpits that machines really cannot navigate.
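As a sketch of what a well-behaved robot does, Python’s standard library ships a robots.txt parser. Here it’s fed a hypothetical robots.txt (the site and paths are made up) fencing off a dynamically-generated search path — exactly the kind of effectively-infinite tarpit described above:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt warning crawlers away from a
# dynamically-generated (effectively infinite) search path.
robots_txt = """\
User-agent: *
Disallow: /search/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A polite crawler checks each URL before fetching it.
print(rp.can_fetch("MyBot", "https://example.com/search/?q=deep+web"))  # False
print(rp.can_fetch("MyBot", "https://example.com/articles/deep-web"))   # True
```

The crawler simply skips any URL for which `can_fetch` returns False, and never falls into the tarpit.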
Well, here you get into the difference between what semi-legitimate but assholish people do and what spammers with hordes of zombies do. Sure, a good-sized search engine company might own a lot of different computers sitting behind a lot of different IP addresses. But since it will have leased those computers legally from one or two other companies, or will own them itself, all of those IP addresses will be in a few specific netblocks, owned by the relevant companies, as recorded in the information associated with the Autonomous Systems which advertise those netblocks as their own. In short, all of those IP addresses will be coming from “the same place”, in a networking sense, and it will be easy to block all of them with a few commands.
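To make the “same place” point concrete, here’s a sketch using Python’s `ipaddress` module. One CIDR prefix covers every address in a crawler operator’s netblock, so one rule blocks them all. (The netblock here is TEST-NET-2, a documentation range standing in for a real operator’s allocation.)

```python
import ipaddress

# Hypothetical netblock advertised by a single crawler operator.
crawler_netblock = ipaddress.ip_network("198.51.100.0/24")

def is_blocked(addr: str) -> bool:
    """One CIDR membership check covers every address the operator owns."""
    return ipaddress.ip_address(addr) in crawler_netblock

print(is_blocked("198.51.100.37"))  # True: inside the operator's netblock
print(is_blocked("203.0.113.9"))    # False: some unrelated network
```

A firewall rule works the same way: deny the /24 and you’ve dealt with the operator’s entire fleet in one line.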
The spammers don’t do that. They own zombies, created through foul magicks involving unpatched Windows XP machines sitting behind cable modems, and therefore their IP addresses could come from anywhere on Earth. Blocking them is more of a game of whack-a-mole, but programming your server software to rate-limit any specific IP address which tries to go too fast, or tries to grab the wrong things, is a lot easier.