Where is the data on the dark web stored?

Motivated by Cecil’s recent column:
Link

Specifically, which servers store the data for the Hidden Wiki, Silk Road, etc.?
Someone must be providing a facility to store this stuff.
The Tor browser knows how to get there, but how does it do it?
I get that the user’s connection enters the network, is bounced around for a bit, and then exits at the ultimate destination, but surely the browser has to know what that destination is. Can’t the authorities work out where these websites are coming from?

Nobody knows. It could be in Bangladesh, it could be next door.

True.

By asking a computer which doesn’t know, but can ask another computer, which can ask another, and so on, until, finally, it reaches the right one.

No. As I alluded to above, the whole point of Tor is onion routing, which works like this:

It’s like going from Chicago to New York City via Dallas, Los Angeles, Seattle, and Miami to cover your tracks, picking a different identity and a new credit card for each flight. In addition, the data itself (the web page, in this case) is encrypted, so someone monitoring your connection can’t tell what you’re doing.
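For anyone who prefers code to airports, here is a rough sketch of the layering idea. It is a toy under stated assumptions: the relay names, the fixed 8-byte "next hop" label, and the SHA-256 keystream are all made up for illustration and are not how Tor actually encrypts anything. The only point is that each relay can peel one layer and learns just the next hop.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, not secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer; XOR with the same keystream undoes itself."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(message: bytes, path: list[tuple[str, bytes]]) -> bytes:
    """Sender builds the onion from the inside out, one layer per relay."""
    blob = message
    next_hop = "DEST"  # hypothetical marker meaning "deliver the payload here"
    for name, key in reversed(path):
        header = next_hop.encode().ljust(8)  # fixed 8-byte next-hop label
        blob = xor_layer(key, header + blob)
        next_hop = name
    return blob


def peel(key: bytes, blob: bytes) -> tuple[str, bytes]:
    """A relay removes its own layer and sees only the next hop's name."""
    plain = xor_layer(key, blob)
    return plain[:8].decode().strip(), plain[8:]


def route(blob: bytes, path: list[tuple[str, bytes]]) -> tuple[list[str], bytes]:
    """Walk the onion through the relays, peeling one layer at each stop."""
    keys = dict(path)
    hop = path[0][0]
    visited = []
    while hop != "DEST":
        visited.append(hop)
        hop, blob = peel(keys[hop], blob)
    return visited, blob


# Hypothetical three-hop path; each relay has its own secret key.
path = [(name, secrets.token_bytes(32)) for name in ("relay1", "relay2", "relay3")]
request = b"GET /index.html"
onion = wrap(request, path)
visited, delivered = route(onion, path)
print(visited, delivered)
```

Someone watching the first hop sees only the wrapped blob, and each relay in the middle sees a name and another blob it cannot read.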

If they could, they’d be shut down by now.

I’ve heard that some of the sites are distributed, so the whole doesn’t come from any single server (and if one server is taken down another can pick up the slack).

ETA: Not that this is particularly uncommon in the web world. Hell, YouTube is distributed across multiple servers; it’s just that for YouTube it’s load management more than obfuscation.

Silk Road is hosted on a normal web server somewhere, just like any other website. The difference is that it does not serve pages over plain HTTP on port 80 like normal websites; it is reachable only through Tor, as a hidden service with a .onion address. That obfuscates both ends of the transaction, so not only can you not learn the server’s IP address, the server cannot see yours either.

There is a weakness in this, though: if enough nodes on the Tor network were set up by the DEA, they might be able to use traffic analysis to start getting an idea of where Silk Road is hosted. One also wonders whether the data center where it is hosted could figure it out: they would have a web server with an incredible amount of traffic, mostly to Tor nodes. Even if the disk is totally encrypted, that much traffic going only to Tor nodes could be suspicious.
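To put a rough number on "enough nodes", here is a back-of-the-envelope sketch. It assumes relays are picked uniformly at random without replacement, which is a simplification (the real network weights relays by bandwidth and keeps the same entry relays for long periods), and the relay counts are invented for illustration. It computes the chance that both the first and last relay of a single path belong to the watcher, which is the position needed to match traffic going in against traffic coming out.

```python
from fractions import Fraction


def both_ends_watched(total_relays: int, hostile_relays: int) -> Fraction:
    """Chance that the first and last relay of one path are both hostile,
    assuming uniform random selection without replacement (a simplification)."""
    first = Fraction(hostile_relays, total_relays)
    last = Fraction(hostile_relays - 1, total_relays - 1)
    return first * last


# Hypothetical network of 3000 relays with a growing number of hostile ones.
for hostile in (10, 100, 500):
    p = both_ends_watched(3000, hostile)
    print(f"{hostile:>4} hostile of 3000: {float(p):.4%}")
```

The chance grows roughly with the square of the hostile share, so a watcher needs a sizeable fraction of the network before this becomes likely for any one path.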

EDIT: This assumes Silk Road is one server; it very well could be several.

Let’s continue this analogy if possible. In the above I would still need to know that I am going to NYC, and I need to know where NYC is. Can you explain how I obtain this information?

[small hijack]

Is this really a new column by Cecil? Cause I know I’d read all about this (Dark Web) before and I could have sworn it was his column. Maybe it was a post here a while back? :confused:

I vaguely recall reading about it on this site (or from a link I got here?), but don’t recall this specific column. It must have been something else.

Not a computer expert, just continuing the analogy. Why would you need to know where NYC is? You just need to know how to find an airplane going to your next destination. The pilot needs to know where the destination is, but he doesn’t need to know that you’re on the plane or what your business is when you arrive at the destination. Ultimately, you board a flight bound for NYC. The pilot knows where NYC is, but he doesn’t know why you want to go there.

Broadly speaking, .onion sites run their services bound to localhost, and proxy those services through a plain old Tor stack that’s remarkably similar to the one you use as a client. There are a few other steps related to registering the service “address” with the shared directory service, but since everybody’s speaking Tor, nobody knows for sure which request is coming from an endpoint and which is coming from a relay.
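A minimal sketch of the "bound to localhost" half, assuming a plain Python web server stands in for the real site. The handler name and page text are made up. The torrc lines in the comment use Tor's `HiddenServiceDir` and `HiddenServicePort` options, but the path shown is a placeholder and none of the Tor side is exercised here.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Hello(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site's real web application."""

    def do_GET(self):
        body = b"hello from a service bound to localhost\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Bind to 127.0.0.1 only, so nothing outside this machine can connect directly.
# Port 0 asks the OS for any free port; a real setup would pick a fixed one.
server = HTTPServer(("127.0.0.1", 0), Hello)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

# The Tor side would be configured separately in torrc, roughly:
#   HiddenServiceDir /path/to/hidden_service/     (placeholder path)
#   HiddenServicePort 80 127.0.0.1:<port>
# Tor then forwards requests for the .onion address to this local port.

# Local check that the service answers on the loopback interface.
conn = http.client.HTTPConnection(host, port, timeout=5)
conn.request("GET", "/")
reply = conn.getresponse().read()
conn.close()
server.shutdown()
server.server_close()
print(reply.decode(), end="")
```

Because the server never listens on a public interface, the only way in from outside is whatever the local Tor process forwards to it.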

You may be thinking of this thread:

Would there really be “an incredible amount of traffic”? Most people have never heard of Silk Road, and I should imagine that most people who have heard of it (such as me - admittedly I only learned of it from another thread on this board, last year) don’t use it (and probably don’t know how to reach it). Even most of those who do use it probably only do so occasionally, as is the case with legitimate e-commerce sites.

I think that was what I recall.

I believe that to the data centre, a Tor hidden service would look just like a regular Tor node.

With some further work the operators could distribute parts of the service amongst several data centres.

There’s a link in my first post.

I don’t think there is a pilot in this analogy. Unless you consider each node along the way to be a pilot, but personally I think that stretches the analogy.

Let me hazard a guess:
I need to get to destination A.
I randomly pick a node B and ask are you A or do you know A. No, try C.
Ask C the same questions, then D, E, F … until some node says yes.

This is clearly wrong, because it does not guarantee that I will find A, and even if I do, I will then know where A is, which defeats the objective.

Well, I was speaking relatively; obviously it is minuscule compared to the traffic ebay.com generates. They could never know for sure; they could only suspect.

How many Tor hidden service sites are out there? What is the average amount of traffic they generate? If Silk Road is the most popular by far…

If you’re a Tor hidden service, please speak up so you can be counted.

Yeah, I think it was one of the linked articles in it that reminded me of the current Cecil column…

Yeah, it’s hard to say how much traffic there would be, but I guess it would be so small that it would be hard to track, given that from what I have read there was $22m in sales last year. That’s nothing in the scheme of things, really…

How about a different analogy? You have a group of people, some looking for information and some offering it. The people offering information don’t want to make it clear to everyone what they’re offering, because it’s illegal information, and the same goes for the people looking for it. So you write a message with a certain request for information and give it to a random person. Unless that person happens to be the person with the information, they will then give the message to another random person, and so on. Eventually the message makes its way to the person with the information, so they write a response and give it back to the person who handed them the message, as does everyone else in turn until it gets back to you. No one who passes the message has any way of knowing whether the person they received it from is actually the person who wrote it or just another person passing it along, and they can’t know whether the person they give it to is the intended recipient or just someone who is going to pass it further along. Still, you asked for certain information and it came back from the person who was supposed to respond, even though you don’t know who that person is and he doesn’t know who you are.
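The note-passing analogy can be played out in a few lines. This is a sketch of the analogy as stated, not of Tor's actual protocol, and the names and group size are invented. Each person on the chain records only who handed them the note and who they handed it to, and the reply retraces the same hands in reverse.

```python
import random


def pass_request(people, holder, asker, rng):
    """Hand the note to random people until it reaches the holder.
    Returns the chain of hands it passed through, asker first."""
    chain = [asker]
    while chain[-1] != holder:
        candidates = [p for p in people if p != chain[-1]]
        chain.append(rng.choice(candidates))
    return chain


def what_each_passer_sees(chain):
    """Each intermediate person knows only (who gave it to me, who I gave it to)."""
    view = {}
    for i, person in enumerate(chain[1:-1], start=1):
        view.setdefault(person, []).append((chain[i - 1], chain[i + 1]))
    return view


def reply_route(chain):
    """The answer goes back through the same hands in reverse order."""
    return list(reversed(chain))


# Hypothetical group; "you" ask, "vendor" holds the information.
people = ["you", "ann", "bob", "cat", "dan", "eve", "vendor"]
rng = random.Random(1)  # fixed seed so the demo is repeatable
chain = pass_request(people, holder="vendor", asker="you", rng=rng)
views = what_each_passer_sees(chain)
back = reply_route(chain)
print("request path:", " -> ".join(chain))
print("reply path:  ", " -> ".join(back))
```

Nobody in `views` can tell from their own two neighbours whether the hand before them wrote the note or the hand after them is the one who can answer it.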