
Understanding BitTorrent

Iqbal Mohomed

What is the Problem Being Solved Here?


Sharing a fairly large file
Involves making a replica
The problem is somewhat similar to, but not the same as, replication in a distributed file system, a Content Delivery Network, or a Distributed Hash Table overlay network

Simple Solution: One Big Server


Make the file available on a central server
Each client downloads the file from this server
Problems:
The solution does not scale very well
With a large number of clients, the server's resources get overwhelmed
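
A rough back-of-the-envelope sketch of why the single server does not scale. It assumes the server's upload link is the only bottleneck; the function name and the numbers in the note are illustrative and do not come from the slides.

```python
def central_server_time(file_size_mb: float, clients: int,
                        server_upload_mb_per_s: float) -> float:
    """Seconds until every client has the file when one server uploads everything.

    Assumption (hypothetical): the server's upload link is the only bottleneck.
    Units are megabytes and megabytes per second.
    """
    total_to_send = file_size_mb * clients  # the server sends one full copy per client
    return total_to_send / server_upload_mb_per_s
```

With a 100 MB file and a 10 MB/s server, one client finishes in 10 seconds, while 1000 clients need about 10000 seconds in total, which is nearly three hours.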

The Brilliance of Napster: P2P


In the original Napster, nodes connected to a central server and gave it a listing of all the files they had
Nodes relayed searches to the central server, which performed them locally
The actual file transfer occurred peer-to-peer
The big weakness of this approach was that the directory server was a single point of failure

The Gnutella Solution


In Gnutella, all nodes are true peers
This gets rid of the single point of failure
The new problem is efficiency and scalability
Specifically, searches go across a large number of nodes, generating a massive amount of traffic
There is a larger compromise in privacy, as peers see each other's search queries

The FastTrack Network


Also known as Kazaa
Combines the Napster and Gnutella approaches
Nodes connect to super-peers, which act like the directory server in Napster
The super-peers connect to each other, similar to Gnutella
This solution works very well in practice

Enter BitTorrent
Released in the summer of 2001
Uses basic ideas from game theory to largely eliminate the free-rider problem
None of the previous systems could deal with this problem well

Makes no strong guarantees, unlike DHTs
Works extremely well in practice, unlike DHTs

Basic Idea
Chop the file into many pieces
Replicate DIFFERENT pieces on different peers as soon as possible
As soon as a peer has a complete piece, it can trade it with other peers
Hopefully, we will be able to assemble the entire file at the end

Basic Components
Seed
Peer that has the entire file

Leecher
Peer that has an incomplete copy of the file

A Torrent file
Passive component
Files are typically fragmented into 256 KB pieces
The torrent file lists the SHA-1 hashes of all the pieces so that peers can verify integrity
Typically hosted on a web server
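
A minimal sketch of how the per-piece hashes could be computed and checked, using Python's standard `hashlib`. The names `PIECE_SIZE`, `split_into_pieces`, `piece_hashes`, and `verify_piece` are illustrative assumptions; the actual on-disk layout of a torrent file is not shown here.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # 256 KB, the typical piece size mentioned above


def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list:
    """Chop the file contents into fixed-size pieces (the last may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list:
    """Return the 20-byte SHA-1 digest of every piece, in order."""
    return [hashlib.sha1(p).digest() for p in split_into_pieces(data, piece_size)]


def verify_piece(index: int, piece: bytes, hashes: list) -> bool:
    """Check a downloaded piece against the hash listed for that index."""
    return hashlib.sha1(piece).digest() == hashes[index]
```

A downloader would call something like `verify_piece` on each completed piece and discard it if the check fails, so a corrupt piece is never passed on to other peers.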

A Tracker
Active component
Allows peers to find each other
Returns a random list of peers
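
The tracker's role can be sketched as a toy in-memory class. This is only an illustration under stated assumptions: the class name `Tracker`, the `announce` method, and the `max_peers` cap are hypothetical, and a real tracker is a network service rather than a local object.

```python
import random


class Tracker:
    """Toy in-memory tracker: remembers which peers are in which swarm."""

    def __init__(self, max_peers: int = 50):
        self.max_peers = max_peers  # assumed cap on the size of the returned list
        self.swarms = {}            # torrent id -> set of peer addresses

    def announce(self, torrent_id: str, peer: str) -> list:
        """Register `peer` and return a random sample of the other known peers."""
        swarm = self.swarms.setdefault(torrent_id, set())
        others = sorted(swarm - {peer})
        swarm.add(peer)
        k = min(self.max_peers, len(others))
        return random.sample(others, k)
```

Returning a random subset, rather than the same peers to everyone, keeps the connections between peers spread out across the swarm.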

Operation

Pieces and Sub-Pieces


A piece is broken into sub-pieces, typically 16 KB in size
Policy: Until a piece is assembled, only download sub-pieces for that piece
This policy lets complete pieces assemble quickly
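
A small sketch of the sub-piece policy, assuming requests are represented as `(piece_index, offset, length)` tuples. The helper names and the `start_new_piece` callback are hypothetical and stand in for whatever piece-selection logic the client uses.

```python
SUB_PIECE_SIZE = 16 * 1024  # 16 KB, the typical sub-piece size


def sub_piece_requests(piece_index: int, piece_length: int,
                       sub_piece_size: int = SUB_PIECE_SIZE) -> list:
    """List the (piece_index, offset, length) requests that cover one piece."""
    return [(piece_index, off, min(sub_piece_size, piece_length - off))
            for off in range(0, piece_length, sub_piece_size)]


def next_request(pending: dict, start_new_piece):
    """Strict priority: finish a piece that was started before starting another.

    `pending` maps a piece index to its sub-piece requests not yet sent.
    `start_new_piece` is a hypothetical callback returning (index, length).
    """
    for index in sorted(pending):
        if pending[index]:
            return pending[index].pop(0)
    index, length = start_new_piece()
    pending[index] = sub_piece_requests(index, length)
    return pending[index].pop(0)
```

A 256 KB piece yields 16 requests of 16 KB each; a new piece is started only when no started piece has sub-pieces left to request.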

Pipelining
When transferring data over TCP, it is critical to always have several requests pending at once, to avoid a delay between pieces being sent
BitTorrent breaks pieces into sub-pieces
At any point in time, some number of sub-pieces, typically 5, are requested simultaneously
Every time a sub-piece arrives, a new request is sent
This scheme has been found to saturate most connections in practice
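
The pipelining rule can be sketched as a small bookkeeping class. The class and method names are assumptions made for illustration; actual network I/O is left out, and `fill` simply returns the requests that would be sent.

```python
from collections import deque

PIPELINE_DEPTH = 5  # typical number of simultaneous requests


class Pipeline:
    """Keeps up to `depth` sub-piece requests outstanding on one connection."""

    def __init__(self, requests, depth: int = PIPELINE_DEPTH):
        self.queue = deque(requests)  # requests not yet sent
        self.in_flight = set()        # requests sent but not yet answered
        self.depth = depth

    def fill(self) -> list:
        """Send requests until `depth` are outstanding; return those just sent."""
        sent = []
        while self.queue and len(self.in_flight) < self.depth:
            req = self.queue.popleft()
            self.in_flight.add(req)
            sent.append(req)
        return sent

    def on_arrival(self, req) -> list:
        """A sub-piece arrived: retire it and immediately top up the pipeline."""
        self.in_flight.discard(req)
        return self.fill()
```

Because a new request goes out as soon as a sub-piece arrives, the sender always has something queued and the link does not sit idle for a round trip between sub-pieces.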

Piece Selection
The order in which pieces are selected by different peers is critical for good performance
If a bad algorithm is used, we could end up in a situation where every peer has all the pieces that are currently available and none of the missing ones
If the original seed is then taken down, the file cannot be completely downloaded!

Random First Piece


Initially, a peer has nothing to trade
Important to get a complete piece ASAP
Rare pieces are typically available at fewer peers and so download more slowly; starting with a rare piece is not a good idea
Policy: Select a random piece of the file and download it

Rarest Piece First


Policy: Determine the pieces that are rarest among your peers and download those first
This ensures that the most common pieces are left until the end of the download
Rarest first also ensures that a large variety of pieces is downloaded from the seed
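
The two selection policies, random first piece and rarest piece first, can be combined in one sketch. The function name, the representation of each peer's holdings as a set of piece indices, and the random tie-breaking among equally rare pieces are assumptions made for this example.

```python
import random
from collections import Counter


def select_piece(have: set, peer_pieces: dict, rng=random):
    """Pick the next piece index to download, or None if nothing useful is on offer.

    `have` is the set of piece indices already complete.
    `peer_pieces` maps a peer id to the set of piece indices that peer holds.
    """
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces - have)  # count only pieces we still need
    if not availability:
        return None
    if not have:
        # Random first piece: nothing to trade yet, so any piece will do.
        return rng.choice(sorted(availability))
    # Rarest first: choose among the pieces held by the fewest peers.
    rarest = min(availability.values())
    candidates = sorted(i for i, n in availability.items() if n == rarest)
    return rng.choice(candidates)
```

With peers holding `{0, 1, 2}`, `{0, 1}`, and `{0, 3}`, a downloader that already has piece 0 would pick piece 2 or 3, since each is held by a single peer, before piece 1.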

Endgame Mode
Policy: When all the sub-pieces that a peer doesn't have are actively being requested, request them from EVERY peer
When a sub-piece arrives, the duplicate requests for it are cancelled
This ensures that a download isn't prevented from completing by a single peer with a slow transfer rate
Some bandwidth is wasted, but in practice not too much
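
A minimal sketch of the endgame bookkeeping, assuming outstanding requests are tracked as a set of `(peer, sub_piece)` pairs. The function names are hypothetical, and for simplicity every peer is assumed to hold every missing sub-piece.

```python
def in_endgame(missing: set, requested: set) -> bool:
    """True once every sub-piece still missing has already been requested."""
    return bool(missing) and missing <= requested


def broadcast_requests(missing: set, peers: list) -> set:
    """Endgame: ask every peer for every missing sub-piece."""
    return {(peer, sp) for peer in peers for sp in missing}


def on_sub_piece_arrival(sp, from_peer, outstanding: set) -> set:
    """Drop all requests for `sp` and return the duplicates that need a cancel."""
    duplicates = {(peer, s) for (peer, s) in outstanding if s == sp}
    outstanding.difference_update(duplicates)
    return {(peer, s) for (peer, s) in duplicates if peer != from_peer}
```

The set returned by `on_sub_piece_arrival` is the list of peers that should receive a cancel message, which is what keeps the wasted bandwidth small.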

Choking
One of BitTorrent's most powerful ideas is the choking mechanism
It ensures that nodes cooperate and largely eliminates the free-rider problem
Cooperation means uploading sub-pieces that you have to your peers
Choking is a temporary refusal to upload; downloading continues as normal
The connection is kept open so that setup costs are not borne again and again
Based on game-theoretic concepts:
Tit-for-tat strategy in Repeated Games

Prisoner's Dilemma

Repeated Games
Over time, more complex strategies can evolve
For instance, Tit-for-tat:
Do unto others as they do unto you
If someone cheats, you must retaliate
Have a recovery mechanism to ensure eventual cooperation
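
A short simulation may make tit-for-tat concrete. The payoff values below (5, 3, 1, 0) are the conventional textbook numbers for the Prisoner's Dilemma and are an assumption, not taken from the slides; the function names are likewise illustrative.

```python
C, D = "C", "D"  # cooperate, defect

# Assumed textbook payoffs: (my move, their move) -> (my score, their score)
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}


def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy whatever the other player did last round."""
    return their_history[-1] if their_history else C


def always_defect(my_history, their_history):
    """A free rider: never cooperates."""
    return D


def play(strategy_a, strategy_b, rounds: int):
    """Play a repeated game and return the total scores (a, b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b
```

Over 10 rounds, two tit-for-tat players score 30 each, while a free rider facing tit-for-tat gains only in the first round and ends with 14 against 9. Plain tit-for-tat as written here has no recovery step; it only mirrors the last move.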

Choking Algorithm
Goal is to have several bidirectional connections running continuously
Upload to peers who have uploaded to you recently
Unutilized connections are uploaded to on a trial basis to see whether better transfer rates could be found using them

Choking Specifics
A peer always unchokes a fixed number of its peers (default of 4)
The decision to choke/unchoke is based on current download rates, which are evaluated as a rolling 20-second average
The evaluation of whom to choke/unchoke is performed every 10 seconds
This avoids wasting resources by rapidly choking and unchoking peers
Supposedly long enough for TCP to ramp up transfers to their full capacity

The optimistic unchoke (the peer uploaded to on a trial basis, regardless of its download rate) is rotated to a different peer every 30 seconds
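
The choking decision can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, download samples are kept as `(timestamp, byte_count)` pairs, and one of the four unchoke slots is assumed to be reserved for the optimistic unchoke.

```python
import random

UNCHOKE_SLOTS = 4           # fixed number of unchoked peers (default)
RATE_WINDOW = 20.0          # seconds covered by the rolling average
RECHOKE_INTERVAL = 10.0     # a caller would run rechoke() this often
OPTIMISTIC_INTERVAL = 30.0  # a caller would run rotate_optimistic() this often


def rolling_rate(samples, now: float, window: float = RATE_WINDOW) -> float:
    """Average bytes per second received from one peer over the last `window` seconds."""
    recent = sum(n for t, n in samples if now - window < t <= now)
    return recent / window


def rechoke(samples_by_peer: dict, now: float, optimistic=None,
            slots: int = UNCHOKE_SLOTS) -> set:
    """Return the peers to unchoke: the best downloaders plus the optimistic one."""
    rates = {p: rolling_rate(s, now) for p, s in samples_by_peer.items()}
    ranked = sorted((p for p in rates if p != optimistic),
                    key=lambda p: rates[p], reverse=True)
    regular = slots - 1 if optimistic is not None else slots
    unchoked = set(ranked[:regular])
    if optimistic is not None:
        unchoked.add(optimistic)
    return unchoked


def rotate_optimistic(peers, unchoked, rng=random):
    """Pick a new optimistic unchoke among the peers that are currently choked."""
    choked = sorted(set(peers) - set(unchoked))
    return rng.choice(choked) if choked else None
```

The timing constants are not used inside the functions; they document how often a surrounding event loop would be expected to call each one.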

Anti-Snubbing
Policy: When over a minute has gone by without receiving a single sub-piece from a particular peer, do not upload to it except as an optimistic unchoke
A peer might find itself simultaneously choked by all the peers that it was just downloading from
The download will lag until the optimistic unchoke finds better peers
Policy: If choked by everyone, increase the number of simultaneous optimistic unchokes to more than one
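
Both anti-snubbing policies can be sketched in a few lines. The names, the per-peer timestamp dictionary, and the choice of exactly one extra optimistic slot are assumptions for illustration only.

```python
SNUB_TIMEOUT = 60.0  # "over a minute" without a single sub-piece


def snubbed_peers(last_received: dict, now: float,
                  timeout: float = SNUB_TIMEOUT) -> set:
    """Peers from which no sub-piece has arrived for more than `timeout` seconds.

    `last_received` maps a peer id to the time its last sub-piece arrived.
    """
    return {p for p, t in last_received.items() if now - t > timeout}


def optimistic_slots(choking_us: set, downloading_from: set,
                     base: int = 1, extra: int = 1) -> int:
    """Allow more than one optimistic unchoke when every source is choking us."""
    all_choked = bool(downloading_from) and downloading_from <= choking_us
    return base + extra if all_choked else base
```

Peers returned by `snubbed_peers` would be left out of the regular unchoke set and considered only for an optimistic slot.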

Upload-Only Mode
Once the download is complete, a peer has no download rates to use for comparison, nor any need to use them
The question is: which nodes to upload to?
Policy: Upload to those with the best upload rate
This ensures that pieces get replicated faster
Also, peers that have good upload rates are probably not being served by others
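
In upload-only mode the ranking key changes from download rate to upload rate. A minimal sketch, assuming upload rates are tracked per peer in a dictionary and that the same four slots apply; the function name is hypothetical.

```python
def seed_unchoke(upload_rates: dict, slots: int = 4) -> set:
    """Unchoke the `slots` peers with the best upload rates.

    `upload_rates` maps a peer id to the measured upload rate for that peer.
    """
    ranked = sorted(upload_rates, key=upload_rates.get, reverse=True)
    return set(ranked[:slots])
```

With rates `{"a": 50, "b": 10, "c": 80, "d": 5, "e": 30}`, the four unchoked peers would be a, b, c, and e.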

References
"BitTorrent Economics Paper", Bram Cohen
"BitTorrent protocol specification", Bram Cohen
"BitTorrent Resource Availability Analysis", Brian Greinke and James Hsia (Rice)
"Dissecting BitTorrent: Five Months in a Torrent's Lifetime", M. Izal, G. Urvoy-Keller, E.W. Biersack, P.A. Felber, A. Al Hamra, and L. Garcés-Erice (Institut Eurecom, France)
