27 September 2011

Trackers and DHT, Torrents and Hash Trees

I have been spending quite some time reading about P2P file transfers and networks such as BitTorrent so that I can better understand the technology and the engineering details behind them. One aspect I am particularly interested in is how to use magnet links to get torrentless file transfers.

It is interesting to note that this change from the original BitTorrent design to a trackerless and torrentless one amounts to a paradigm shift: from a distribution tool to a distributed content network.

As stated by bittorrent.org:

"BitTorrent gives you the same freedom to publish previously enjoyed by only a select few with special equipment and lots of money."

In this sense, a publisher can run a tracker that keeps a list of the consumers downloading a file. By sharing that list among the consumers, they can cooperate, using the bandwidth between them to share and download the file faster. To ensure that end users get the original content and that no peer hands out corrupted parts, a validation mechanism is required. The validation data, together with the addresses of the trackers, is packaged into a .torrent file.

Trackerless (actually: distributed public tracker)

BitTorrent then evolved with enhancements such as PEX (Peer Exchange, which allows peers to exchange information about other peers downloading the same file) that reduce the dependency on trackers. Additionally, DHTs (Distributed Hash Tables) came into use to implement a distributed public tracker.
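To make the "distributed public tracker" idea concrete, here is a toy sketch in Python. It assumes a Kademlia-style XOR distance, which is what BitTorrent's Mainline DHT is based on, but everything else is a simplification of mine: the ToyDHT class, the node names and the replication factor k are hypothetical, and there is no networking or routing table at all, just one process that can see every node.

```python
import hashlib


def node_id(name):
    """Map an arbitrary name to a 160-bit integer ID (SHA-1, as an assumption)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def closest_nodes(target, nodes, k):
    """Return the k node IDs closest to target under the XOR metric."""
    return sorted(nodes, key=lambda n: n ^ target)[:k]


class ToyDHT:
    """Hypothetical in-memory DHT: every node is visible, nothing goes over the wire."""

    def __init__(self, node_names, k=3):
        self.k = k
        # Each node keeps its own mapping: info_hash -> set of peers.
        self.storage = {node_id(name): {} for name in node_names}

    def announce(self, info_hash, peer):
        """Store the peer on the k nodes closest to the info hash."""
        for n in closest_nodes(info_hash, self.storage, self.k):
            self.storage[n].setdefault(info_hash, set()).add(peer)

    def get_peers(self, info_hash):
        """Ask the k closest nodes which peers announced this info hash."""
        peers = set()
        for n in closest_nodes(info_hash, self.storage, self.k):
            peers |= self.storage[n].get(info_hash, set())
        return peers


dht = ToyDHT(["node%d" % i for i in range(20)])
info_hash = node_id("some content")  # stand-in for a real torrent info hash
dht.announce(info_hash, ("10.0.0.1", 6881))
print(dht.get_peers(info_hash))
```

The point of the sketch is only that nobody owns the list of peers: it lives on whichever nodes happen to sit closest to the key, so there is no single tracker to take down.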

This is a key step in removing trackers as a single point of failure: content persists as long as users who have the file exist and keep announcing themselves on the public tracker (the DHT). One important caveat: DHTs are vulnerable to Sybil attacks. Because creating DHT nodes is cheap, a single machine can fake many DHT nodes and disrupt the normal functioning of the DHT.

Torrentless (Improved validation schemes: Hash Trees)

The most basic way to verify that a file is the requested one is to compare its hash with the published one. Since it is computationally infeasible to produce a different file with the same hash, this works as a validation scheme, but at a cost: a file can only be verified once all of its content is available. That is unacceptable for P2P, where peers exchange parts of files and malicious peers can exist. For that reason, techniques such as hash lists are used: in the .torrent file, the publisher includes a list of the hashes of all the pieces that compose the content. This is why torrent files grow so large, and why it is impossible to fit the full hash list in a single URL referring to the content.
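A minimal sketch of the hash list idea in Python, assuming SHA-1 piece hashes (which is what .torrent files use) and a ridiculously small piece length so the example stays readable. The function names are my own, not from any BitTorrent library.

```python
import hashlib

PIECE_LENGTH = 4  # illustrative only; real torrents use much larger pieces


def split_pieces(data, piece_length=PIECE_LENGTH):
    """Cut the content into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def build_hash_list(data, piece_length=PIECE_LENGTH):
    """What the publisher does: one SHA-1 digest per piece."""
    return [hashlib.sha1(p).digest() for p in split_pieces(data, piece_length)]


def verify_piece(index, piece, hash_list):
    """What a downloader does: check a single piece as soon as it arrives."""
    if not 0 <= index < len(hash_list):
        return False
    return hashlib.sha1(piece).digest() == hash_list[index]


content = b"hello torrent world"
hash_list = build_hash_list(content)
pieces = split_pieces(content)

print(verify_piece(0, pieces[0], hash_list))  # a genuine piece
print(verify_piece(0, b"evil", hash_list))    # a forged piece
print(len(hash_list) * 20, "bytes of hashes for", len(content), "bytes of content")
```

Each piece can be checked on its own, but the metadata grows by 20 bytes per piece, which is exactly the size problem described above.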

One approach to solving this was a Metadata Exchange mechanism, which allows peers to download the .torrent from each other. It looks like an ugly hack to me, especially when compared to the alternative of using a hash (Merkle) tree. In the Merkle tree approach, each part of a file is accompanied by the hashes of its sibling and uncle nodes, which allow the part to be verified against the root hash alone (note that forging a part of the file is as hard as creating a fake part with the same hash, which means this is as secure as the hash list approach).
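Here is a rough Python sketch of that verification, under a few assumptions of mine: SHA-1 as the hash, leaves padded with zero bytes up to a power of two, and a parent computed as the hash of its two children concatenated. The helper names are hypothetical; a real client would follow whatever the protocol extension specifies.

```python
import hashlib


def h(data):
    return hashlib.sha1(data).digest()


def build_tree(pieces):
    """Return the tree as a list of levels: leaf hashes first, root level last."""
    level = [h(p) for p in pieces]
    size = 1
    while size < len(level):
        size *= 2
    level += [bytes(20)] * (size - len(level))  # zero padding (an assumption)
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof_for(levels, index):
    """Hashes sent along with a piece: its sibling, then its uncle, and so on up."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # the neighbour at this level
        index //= 2
    return proof


def verify(piece, index, proof, root):
    """Recompute the path from the piece up to the root using the proof hashes."""
    node = h(piece)
    for other in proof:
        node = h(node + other) if index % 2 == 0 else h(other + node)
        index //= 2
    return node == root


pieces = [b"aaaa", b"bbbb", b"cccc"]
levels = build_tree(pieces)
root = levels[-1][0]  # the only value that has to be published

print(verify(pieces[1], 1, proof_for(levels, 1), root))  # genuine piece
print(verify(b"evil", 1, proof_for(levels, 1), root))    # forged piece
```

The publisher only needs to hand out the root hash, which fits comfortably in a URL, and each piece travels with a number of extra hashes that grows with the logarithm of the piece count.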

Summary

Sybil attacks on the DHT aside, trackerless and torrentless distributed content networks are possible, and BitTorrent seems to be moving towards them. But, somehow, most of the torrent community still seems oriented towards the publishing/distribution-tool paradigm. I guess this may be related to the fact that most traffic on P2P networks is movie-release oriented and, most of the time, associated with activities considered illegal by those with an interest in removing the freedom to share.

At the moment I wonder how DHTs can be extended and improved to provide a better distributed tool, and why people are having trouble picking the torrentless way. Additionally, I hope that P2P clients start to take a proactive role and begin converting old torrent files into torrentless ones, by giving users the tools to recalculate a hash tree and announce themselves as peers for that key!
