How Does BitTorrent Work?

I haven’t been able to find a site that explains how BitTorrent works in layman’s terms, so I’ll ask here.

How is BitTorrent different from Gnutella? Why isn’t there some type of search function built in? Why does it require .torrent files? What does making a .torrent actually create?

Just to make it clear, I’m not asking how to use BitTorrent or for any information that could promote illegal activity.

Gnutella is a peer-to-peer file-sharing system, where anybody can host and publish files. The primary function of the Gnutella network is to allow people to find each other’s files. When multiple sources offer the exact same file, it is possible to speed up the downloading process by downloading from several sources at the same time, but that load-sharing is not the primary purpose of the system.

Bittorrent is a more centralized system: you need a .torrent file which points your client to a central server, which not only contains the “master copy” of the file to be downloaded but also coordinates the clients. Distributing/finding the initial .torrent file falls outside the scope of the basic bittorrent protocol. The bittorrent protocol, however, is much more sophisticated when it comes to sharing the bandwidth required evenly over all the clients: as soon as you have downloaded a part of the file, your client tells this to the central server, which tells the other clients, and then you start sharing that data with the other clients in small chunks, thus reducing the load on the server and efficiently using the total upstream and downstream capacity of all the clients.

What this means is that Bittorrent is optimized for situations where a very large file suddenly becomes very popular and lots of people want to download it at the same time. The protocol is no more decentralized than it needs to be, and is decidedly not designed to facilitate anonymity of either the client or the server, nor does it contain any other features specifically intended for illegal purposes.

Bittorrent is very popular for distributing Linux installation CDs, for example.

As I understand it:

  1. A person uses a BT client to create a .Torrent file, and places it on a computer with a fast connection.

  2. He places an ad for this file on one of many BT-oriented message boards or websites.

  3. You (I) download the file and open it in a BT client.

  4. The DL starts, and the host computer also supplies your computer with the IP addresses of others who are sharing this file. Almost immediately your computer starts participating in sharing the file, uploading pieces to others in the “swarm”.

  5. There is no way to turn off the uploading feature, so everyone’s DL is faster because everyone is sharing. It is considered good manners to continue sharing this file in the swarm even after your DL has finished. When you have only part of the file you are a “leecher”; when you have the whole file, and are much more valuable to the swarm, you are a “seed”.
    Just a note: a lot of sound files shared through BT are in very big “non-lossy” formats. You might want to use a program to make smaller but good sounding mp3 files, but it’s bad form to share them as mp3s. My computer is in the shop, so I can’t look up the program that converts .shn (“shorten”) and .flac files, but just type “convert .shn to mp3” into Google, and you’ll find it. There are plugins for the different formats that let you convert between them. Winamp also has plugins for the non-lossy formats. That’s iTunes’ limitation as I see it. It only plays the more popular formats.

Oh, in case the answer to these questions was not sufficiently clear from my previous post:

The typical scenario for which Bittorrent was designed is when you want to download something for which there exists an official publisher and you already know where to find that publisher. For example, if you want a .torrent for the latest distribution of Red Hat Linux, you would obviously start looking at www.redhat.com. If you want the trailer for the latest Hollywood movie, just look at the website for that movie, etc. There’s no need for a separate search feature because Google will do the job just fine.

Of course, if you are looking for a CD or movie which the official distributor does not want you to download, then locating a .torrent could be a bit more difficult. But that’s not what Bittorrent was designed for.

Basically when you think of Kazaa or Gnutella, think “people sharing files with each other”. When you think of Bittorrent, think “publishing files in such a way that the file server does not get overloaded”.

Unlike P2P systems such as Gnutella, with Bittorrent there is a clear difference between the server, which publishes the data, and the clients, which download it. The clients also send out data to other clients, which is how they take some of the load off the server, but they only relay data from servers; they do not publish files themselves. In order to publish a file you need to set up a bittorrent server, just as you would need to set up an HTTP or FTP server for publishing files in the more conventional way. The .torrent file tells the clients where to find your server and how to connect to it.

I just found a wonderfully “layman’s terms” explanation of the protocol here:

I don’t believe that you are correct. That “central server” does NOT contain the actual shared file (indeed the disclaimers at many of the widely used BT sites say just that).

To follow up my last post…the link from Martin Wolf says…

You are right, I simplified the server side of the story a bit too much, because I wanted to emphasize how the initial design goals of Bittorrent were different from those of P2P networks such as Kazaa and Gnutella.

Publishing a file on Bittorrent consists of three parts:
  1. The .torrent file, which can be distributed any which way;

  2. The ‘tracker’, which is a piece of server software at a static address;

  3. At least one ‘seed’, which is basically just a client which already contains the entire file.

The .torrent points to the tracker, and the tracker knows where to find the initial seed. Originally, all three would typically be hosted by the same party and often even on the same server. As far as I know, this is still the case when BT is used for e.g. distributing Linux install CDs.
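For anyone wondering what making a .torrent actually creates, here is a rough Python sketch. As far as I know the file is a small "bencoded" dictionary holding the tracker's address plus a SHA-1 hash for every piece of the content, and nothing that says where the seed is. The tracker URL, the file name, and the helper names (`bencode`, `make_metainfo`, `info_hash`) are made up for illustration; real clients add more fields and handle multi-file torrents.

```python
import hashlib


def bencode(value):
    """Encode ints, strings, lists and dicts in bencode, the format .torrent files use."""
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and are written in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_metainfo(data, name, tracker_url, piece_length=16384):
    """Build the dictionary stored in a single-file .torrent (illustrative helper)."""
    # One 20-byte SHA-1 digest per piece, concatenated into a single byte string.
    pieces = b"".join(
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    )
    return {
        "announce": tracker_url,  # where clients find the tracker
        "info": {
            "name": name,
            "length": len(data),
            "piece length": piece_length,
            "pieces": pieces,
        },
    }


def info_hash(metainfo):
    """SHA-1 of the bencoded 'info' dictionary, which identifies the torrent to the tracker."""
    return hashlib.sha1(bencode(metainfo["info"])).hexdigest()


# Placeholder content and a made-up tracker address, purely for demonstration.
meta = make_metainfo(b"x" * 40000, "example.iso", "http://tracker.example.org/announce")
torrent_bytes = bencode(meta)  # this is what would be saved as example.iso.torrent
print(len(meta["info"]["pieces"]) // 20, "piece hashes,", len(torrent_bytes), "bytes of .torrent")
```

The point of the hashes is that a client can check every piece it receives from a stranger against the .torrent before passing it on.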

Nowadays there are also “bittorrent communities” where somebody maintains a dedicated tracker server which all members can use, and where the server operator deliberately does not concern himself with the content being exchanged through that tracker. However, I fear that this thread is already rather close to the edge of what the Mods will allow, so I’m not going any further in that direction…

Bram Cohen, the creator of BT, has always maintained that his intent was to create a more efficient method for the legitimate publishing of large files by their publisher, not to facilitate piracy. Of course that’s what he would say in public anyway, but let’s be civil and allow him to remain innocent until proven guilty. Whatever illegitimate purposes the product might also be (ab)used for today, there can be no doubt that plenty of people are happily using it to publish their own data without going broke on bandwidth costs.

Here are a few examples of copyright owners voluntarily publishing their own content through bittorrent:

http://fedora.redhat.com/download/#download
http://www.knopper.net/knoppix-mirrors/index-en.html
http://www.ubuntulinux.org/download/

These are all Linux installations, because those are the most obvious examples of very large files being made legally available. However, there’s no reason why BT could not also be used for things like movie trailers and game demos.

Given that Bittorrent’s creator is so emphatic that he does not want to facilitate piracy, I find the above “layman’s terms explanation” a little disturbing. Although it’s a good analogy for the process, it involves a bunch of kids cheating at homework, with apparently no moral qualms!

I’m not sure that the “kids cheating on homework” example is from Bram himself, actually.

And I’m sure that all of the kids actually did the homework themselves, and only resorted to copying from each other because their dogs ate it… :smiley:

This seems pretty redundant. The .torrent file links to the tracker and the tracker links the seed. Why doesn’t the .torrent file link directly to the seed? Does the tracker start to link to new seeds as they become available so if the original goes offline it can still point to another copy? If not, what purpose does the tracker serve? Also, what constitutes a seed? If one were to download an entire file and keep the BitTorrent client open, would that become a new seed? Or does one need to download the file and then report to the tracker that a new copy of the seed has been posted at such-and-such a location?

I don’t think these questions infringe on this message board’s terms of service agreement. My original question’s been answered, but it raised more questions.

Yes. In fact, the tracker links to everyone who has any part of the file (as long as they are still connected to the tracker, at least). The purpose of the tracker is merely to coordinate the transfer of data between peers (or leechers) and seeds.

Yep. A seed is simply someone who has a complete copy of the file.

There’s no need to report anything like that to the tracker (at least not manually). The tracker’s job is to know who has which pieces of the file, and the protocol is designed so that the client constantly keeps the tracker informed of which pieces it has.

A seed is simply a computer connected to the tracker that has an entire copy of the file in question. As soon as you’ve downloaded all of the file, you become a seed, and remain so until you disconnect from the tracker.
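To make the tracker's role a bit more concrete, here is a toy in-memory version in Python. It is only a sketch under my own assumptions: the class and method names are invented, there is no networking, and it only records how many bytes each peer still needs, which is enough to tell seeds from leechers.

```python
class ToyTracker:
    """Keeps a list of connected peers for one torrent (illustrative only)."""

    def __init__(self):
        self.peers = {}  # peer address -> bytes that peer still has to download

    def announce(self, address, left):
        """Register or update a peer and hand back the addresses of everyone else."""
        others = [a for a in self.peers if a != address]
        self.peers[address] = left
        return others

    def disconnect(self, address):
        """Forget a peer that has closed its client."""
        self.peers.pop(address, None)

    def seeds(self):
        """A seed is simply a connected peer with nothing left to download."""
        return [a for a, left in self.peers.items() if left == 0]


tracker = ToyTracker()
tracker.announce("10.0.0.1:6881", left=0)             # the original seed
known = tracker.announce("10.0.0.2:6881", left=1000)  # a new downloader joins
print(known)            # the newcomer learns about the seed
tracker.announce("10.0.0.2:6881", left=0)             # download finished: now a seed too
tracker.disconnect("10.0.0.1:6881")                   # original seed goes offline
print(tracker.seeds())  # the torrent is still alive
```

Note that nobody had to tell the tracker by hand that a new seed exists; the downloader's ordinary status update did it.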

It’s possible for a torrent to ‘die’, i.e. there is no seed left and the remaining peers do not hold a complete copy between them. Then the torrent needs to be reseeded by someone who has the entire file.

Does the tracker also keep track of how fast a leech is uploading?
I know that one of the reasons BitTorrent is so successful is that people who are trying to cheat by limiting their upload speed get punished with a low download speed. I was wondering if the tracker is responsible for policing the leeches.

I may be assuming, but moderated file sharing is also why Direct Connect has risen in popularity a lot. Cheating is what destroys communism. :smiley:


I wish more places would use BT for legit downloads. It’s terribly convenient.

Most clients report their upload and download statistics to the tracker. With these stats, some BitTorrent communities maintain a “share ratio” (upload/download) for each user. They usually restrict or ban users who have low ratios. Of course, since the original BT client and its derivatives are open source, some people have produced clients that report skewed stats or no stats at all.

I don’t know if there are any trackers that favor those with fast upload speed. Some clients, however, have an ability for “super seeding.” Super seeding works when there is only one seed. This seed gives out a piece of the file to each peer and then it sees how fast these pieces are spread to the rest of the swarm. The peers that propagate their pieces the fastest are assumed to have the fastest upload speeds. The seed will then give priority to these peers.

BTW, my favorite client is Azureus.

I did a talk on BitTorrent at my office last month… pretty much everything about it has been said here.
I’ll just mention that in the basic protocol, your upload rate to any other node is directly related to that node’s upload rate to you. I.e., if someone gives you lots of data, you give them lots of data.
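A minimal Python sketch of that give-and-take, assuming the client simply uploads to whichever peers have recently sent it the most data. The function name, the peer names, and the four upload slots are my own illustrative choices; real clients have extra rules (for example, occasionally trying out a new peer) that are left out here.

```python
def choose_unchoked(download_rates, slots=4):
    """Pick the peers we will upload to: those currently giving us data the fastest.

    download_rates maps a peer name to the rate (e.g. KB/s) at which that peer
    has recently been sending data to us.
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    return set(ranked[:slots])


# Hypothetical swarm: the numbers are KB/s received from each peer.
rates = {"alice": 120, "bob": 5, "carol": 80, "dave": 0, "erin": 45, "frank": 60}
print(sorted(choose_unchoked(rates)))
```

In this made-up swarm, "dave" sends nothing and "bob" sends very little, so neither gets an upload slot. That is the punishment for throttling your upload.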

You can see how this differs greatly from your standard p2p community sharing mp3s and such. In that case, if someone is downloading from you, unless you download something from them (and you may hate their stuff, and so not bother), how can you tell if they’re giving back and helping the network or not? Since with bittorrent everyone wants the same file(s), leechers (in the standard use of the word) get really bad download rates.
If you want to read some whitepapers on BitTorrent (it’s only 6 pages or something, I recommend it), check out Bittorrent.com.

By the way, I believe that the client doesn’t just download the files from people that have the entire file. It downloads the file in random segments from people that have the entire file as well as from people that have only partially downloaded parts. AFAIK, each person that has a partially downloaded file has different parts because it’s all downloaded randomly. So, when everyone that has the entire file quits sharing, no single person is supplying the entire file, but because the entire file is (at least partially) split around all of the people who are in the process of downloading they can still continue getting the parts they need. Bittorrent is faster because you aren’t only downloading from people that have the entire file, but also from people that have partial files. Freaking brilliant, right? Now I assume that the entire file isn’t always available, but this encourages everyone to keep their clients open longer.
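Here is a small Python simulation of that idea, under the simplifying assumption that each peer is just a set of piece numbers and that pieces are requested in random order from a random holder. The peer names and the function name are invented for the example, and real clients choose pieces more cleverly than this.

```python
import random


def download_from_swarm(peer_pieces, num_pieces, rng):
    """Fetch every piece from some peer that has it, in random order.

    peer_pieces maps a peer name to the set of piece indexes it holds.
    Returns a dict: piece index -> the peer it was fetched from.
    Pieces that nobody holds are simply missing from the result.
    """
    wanted = list(range(num_pieces))
    rng.shuffle(wanted)  # pieces are not requested front to back
    sources = {}
    for piece in wanted:
        holders = [peer for peer, have in peer_pieces.items() if piece in have]
        if holders:
            sources[piece] = rng.choice(holders)
    return sources


# No peer here has the whole 5-piece file, i.e. there is no seed at all.
swarm = {"A": {0, 1}, "B": {2, 3}, "C": {1, 3, 4}}
got = download_from_swarm(swarm, num_pieces=5, rng=random.Random(0))
print(len(got) == 5)  # the newcomer still ends up with every piece
```

If peer C left this swarm, piece 4 would be gone and nobody could finish, which is why it helps when people keep their clients open.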

One thing about bittorrent that is particularly of relevance to people with less powerful machines:

Because the file arrives at your computer in thousands of little packets from a whole bunch of different places, your computer then has to put all these bits back together again in order to assemble the complete file. This can be quite a drain on system resources, even for newer, more powerful systems.
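Roughly what that reassembly work looks like, as a Python sketch. I am assuming the client checks each incoming piece against a SHA-1 hash it already knows and drops anything that does not match; the function name and the tiny 4-byte pieces are only for illustration.

```python
import hashlib


def assemble(received, piece_hashes):
    """Rebuild a file from (index, data) pieces that arrive in any order.

    piece_hashes[i] is the expected SHA-1 digest of piece i. Pieces that fail
    the check are ignored, so a bad copy from one peer cannot corrupt the file.
    """
    slots = [None] * len(piece_hashes)
    for index, data in received:
        if hashlib.sha1(data).digest() == piece_hashes[index]:
            slots[index] = data
    if any(slot is None for slot in slots):
        raise ValueError("some pieces are still missing or failed verification")
    return b"".join(slots)


original = b"abcdefghij"
parts = [original[i:i + 4] for i in range(0, len(original), 4)]
hashes = [hashlib.sha1(part).digest() for part in parts]

# Pieces arrive out of order, and one copy of piece 0 is corrupt.
arrivals = [(2, b"ij"), (0, b"XXXX"), (1, b"efgh"), (0, b"abcd")]
print(assemble(arrivals, hashes) == original)
```

Hashing every piece is where much of the processing cost comes from in this sketch, on top of writing the pieces into the right place on disk.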

Some bittorrent clients have a reputation of using up more computer processing power than others. Check some forums to find out which clients hog the fewest resources.