Kademlia (The Protocol)

Kademlia is a protocol for implementing a [Distributed Hash Table](https://en.wikipedia.org/wiki/Distributed\_hash\_table) (DHT), described in [these](http://www.scs.stanford.edu/~dm/home/papers/kpos.pdf) two [papers](https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf). It is a widely used and widely extended protocol that enables or improves many other systems, so it's worth taking some time to dive into it.

# Some Background

There are many DHTs that came before it, and there will likely be many that come after. I want to first describe [Chord](https://en.wikipedia.org/wiki/Chord\_(peer-to-peer\)) as background, both to show why Kademlia was a useful idea and to build some intuition for how DHTs work.

## Chord

Chord is a precursor to Kademlia. It pictures its network as a circle.

Each node in this circle is assigned an ID by some agreed-upon process, and every node is required to hold a connection to the node immediately before it and the node immediately after it.

To make a request, whether that is to get or set an entry, you pass the request from one node to the next along that circle until you arrive at the node "nearest" to that entry's key by some agreed-upon method. As an example, let's look at a network that supports a maximum of 8 nodes and uses the last 3 bits of MD5<sup id=pointer-1>[1](#footnote-1)</sup> to decide who owns what. All of these distances follow "Price is Right" rules, regardless of method: distances are *only* ever counted going clockwise around the circle, so a key belongs to the first node you reach going clockwise from it.

![What this toy model might look like](./img/chord-example-1.png)
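
To make the "Price is Right" rule concrete, here is a minimal sketch (in Python, with illustrative function names) of how the toy model above might decide ownership:

```python
import hashlib

RING_SIZE = 8  # our toy ring supports a maximum of 8 nodes (3-bit IDs)

def key_id(key: str) -> int:
    """Map a key onto the ring using the last 3 bits of its MD5 hash."""
    digest = hashlib.md5(key.encode()).digest()
    return digest[-1] % RING_SIZE

def clockwise_distance(a: int, b: int) -> int:
    """Distance from a to b, only ever counted going clockwise."""
    return (b - a) % RING_SIZE

def owner(key: str, nodes: list[int]) -> int:
    """The owner is the first node reached going clockwise from the key's ID."""
    k = key_id(key)
    return min(nodes, key=lambda n: clockwise_distance(k, n))

print(clockwise_distance(1, 2))   # 1
print(clockwise_distance(2, 1))   # 7 (clockwise only, so not symmetric)
print(owner("example-key", [0, 3, 6]))
```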

This model is missing a lot, though. For instance, this setup yields $O(n)$ lookup time, which is not at all ideal. Chord therefore imposes one further rule: each node must keep a "finger table" of connections to the nodes responsible for the IDs one, two, four, and so on (doubling) steps ahead of it. So if you're Node 0, you have a connection to the nodes responsible for IDs 1, 2, and 4.

![What this toy model might look like from the perspective of Node 0](./img/chord-example-2.png)

Note that in the above image, the green arrow corresponds to a connection required by Node 6. All the other connections are required by Node 0.

Doing it this way, you end up with an $O(\log_2(n))$ lookup time. The worst case is when you request something owned by your direct predecessor: Node 0 forwards the request to Node 4, who forwards it to Node 6, who forwards it to Node 7.
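
Here is a rough sketch of that worst case. The routing rule shown (forward to the finger closest to the target without overshooting it) is a simplification of Chord's real algorithm, and the helper names are just for illustration:

```python
RING_SIZE = 8

def clockwise_distance(a: int, b: int) -> int:
    return (b - a) % RING_SIZE

def fingers(node: int, nodes: list[int]) -> list[int]:
    """Node `node` keeps pointers to the nodes responsible for
    node + 1, node + 2, node + 4, ... around the ring."""
    table = []
    step = 1
    while step < RING_SIZE:
        target = (node + step) % RING_SIZE
        # the responsible node is the first one reached clockwise from `target`
        table.append(min(nodes, key=lambda n: clockwise_distance(target, n)))
        step *= 2
    return table

def route(start: int, target: int, nodes: list[int]) -> list[int]:
    """Greedy forwarding: hop to the finger closest to (but not past) the target."""
    path, current = [start], start
    while clockwise_distance(current, target) != 0:
        closer = [f for f in fingers(current, nodes)
                  if clockwise_distance(f, target) < clockwise_distance(current, target)]
        if not closer:
            break
        current = min(closer, key=lambda f: clockwise_distance(f, target))
        path.append(current)
    return path

# With every slot 0..7 occupied, Node 0 reaches its predecessor in 3 hops.
print(route(0, 7, nodes=list(range(8))))  # [0, 4, 6, 7]
```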

There are some additional rules for adding nodes, but most of them just follow from trying to maintain the above properties, so I won't cover node joins here for either Chord or Kademlia.

## BitTorrent

Most BitTorrent clients have [used Kademlia](https://en.wikipedia.org/wiki/BitTorrent#Distributed\_trackers) to find peers since about 2005.

## Ethereum

The Ethereum network [uses Kademlia](https://github.com/ethereum/wiki/wiki/Kademlia-Peer-Selection) for peer selection rather than as the backbone of its whole system. This is useful because it lets you make assumptions in performance analysis that you could not make in an unstructured network, and because it provides some assurance of Denial of Service resistance.

# The Main Ideas

Kademlia largely brought together innovations present in other projects. Exponential hopping came from Chord, and XOR distance came in part from Pastry.<sup id=pointer-2>[2](#footnote-2)</sup> The main benefits came from putting a much larger emphasis on parallelism and from making a logical leap that the Pastry authors did not.

## XOR Distance

At first glance, Kademlia looks a lot like Chord: all nodes are assigned an ID using an agreed-upon method, and all requests are routed towards nodes that are closer to the target.

The biggest difference here is that instead of using a circular distance metric, Kademlia uses the XOR function. This seems like a strange decision at first, or even an invalid one, but XOR is a perfectly valid distance metric. In fact, it has more useful properties than Chord's circular one.

A general distance metric must satisfy the following four properties:

1. $d(x, x) = 0$
2. $d(x, y) > 0$ where $x \ne y$
3. $d(x, y) = d(y, x)$
4. It satisfies the triangle inequality, so $d(x, z) \le d(x, y) + d(y, z)$

The Chord metric violates rule 3 (in our example $d(1, 2) = 1$ but $d(2, 1) = 7$), whereas XOR satisfies all of them.
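
To make the comparison concrete, here is a small sketch that applies both distance functions to the 3-bit IDs from the Chord example and spot-checks the four properties for XOR:

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia's metric: bitwise XOR of the two IDs, read as an integer."""
    return a ^ b

def chord_distance(a: int, b: int, ring_size: int = 8) -> int:
    """Chord's clockwise-only distance on our 8-slot toy ring."""
    return (b - a) % ring_size

print(chord_distance(1, 2), chord_distance(2, 1))  # 1 7  -> not symmetric
print(xor_distance(1, 2), xor_distance(2, 1))      # 3 3  -> symmetric

# Spot-check all four properties for XOR over every 3-bit ID.
ids = range(8)
assert all(xor_distance(x, x) == 0 for x in ids)
assert all(xor_distance(x, y) > 0 for x in ids for y in ids if x != y)
assert all(xor_distance(x, y) == xor_distance(y, x) for x in ids for y in ids)
assert all(xor_distance(x, z) <= xor_distance(x, y) + xor_distance(y, z)
           for x in ids for y in ids for z in ids)
```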

## Exponential Hopping

Using XOR as a metric allows exponential hopping in a much more intuitive way. This is where Kademlia looks most like Pastry, with its prefix-based routing scheme.

All Kademlia nodes maintain a routing table organized into "k-buckets" of $\le k$ nodes each, where bucket $x$ holds nodes whose IDs share the first $x$ bits of your ID (or, put another way, whose first disagreeing bit is bit $x$, counting from zero). When deciding which nodes to send your request to, you pick from the bucket matching the prefix of the key you're requesting. This leads to exponential hopping in a way that is both more flexible than Chord's structure and feels less artificial.
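
Here is a minimal sketch of how that bucketing might be computed, using 8-bit IDs for readability (real Kademlia IDs are 160 bits); indexing buckets by shared-prefix length is one of a couple of equivalent conventions implementations use:

```python
ID_BITS = 8  # real Kademlia IDs are 160 bits; 8 keeps the example readable

def bucket_index(my_id: int, other_id: int) -> int:
    """Bucket x holds nodes whose IDs share exactly the first x bits with ours."""
    distance = my_id ^ other_id
    if distance == 0:
        return ID_BITS  # it's us; not stored in any bucket
    return ID_BITS - distance.bit_length()

# 0b0110_0000 and 0b0110_1111 agree on their first 4 bits -> bucket 4
print(bucket_index(0b01100000, 0b01101111))  # 4
# 0b0110_0000 and 0b1110_0000 disagree on the very first bit -> bucket 0
print(bucket_index(0b01100000, 0b11100000))  # 0
```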

## Emphasis on Parallelism and Replication

Because Kademlia's routing table is organized into these buckets, implementations can put that grouping to productive use. Many essentially say "screw trying to find the optimal node, let's send the request to everyone in that bucket." Indeed, this strategy is pointed out by the paper itself.

> Kademlia [...] can send a query to any node within an interval, allowing it to select routes based on latency or even send parallel, asynchronous queries to equally appropriate nodes.

They even give a system-wide concurrency parameter, $\alpha$, describing how many nodes get queried in parallel during a lookup.
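
As a sketch of what that might look like in practice, here is one way an implementation could fire off a round of parallel queries with asyncio; `send_find_node` and the `Peer` type are stand-ins for whatever transport and contact representation an implementation actually uses, and $\alpha = 3$ is the example value the paper gives:

```python
import asyncio
from dataclasses import dataclass

ALPHA = 3  # concurrency parameter; the paper suggests a value like 3

@dataclass(frozen=True)
class Peer:
    id: int
    address: str

async def send_find_node(peer: Peer, target_id: int):
    """Stand-in for the real RPC over the wire."""
    raise NotImplementedError("transport goes here")

async def query_round(candidates: list[Peer], target_id: int):
    """Query the ALPHA closest candidates in parallel rather than one at a time."""
    closest = sorted(candidates, key=lambda peer: peer.id ^ target_id)[:ALPHA]
    replies = await asyncio.gather(
        *(send_find_node(peer, target_id) for peer in closest),
        return_exceptions=True,  # one slow or dead peer shouldn't stall the round
    )
    return [reply for reply in replies if not isinstance(reply, Exception)]
```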

# The Protocol

Kademlia consists entirely of 4 RPCs, all of which share some base properties. For instance, if the sender of any of these messages doesn't receive a reply, that is noted in the routing table; a node that misses too many replies gets replaced once another suitable peer is found.
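
A sketch of that bookkeeping might look like the following; the failure threshold and the exact replacement policy here are illustrative choices, not values taken from the paper:

```python
from dataclasses import dataclass

FAILURE_THRESHOLD = 5  # illustrative; the exact cutoff is an implementation choice

@dataclass
class Contact:
    node_id: int
    address: str
    failed_replies: int = 0  # consecutive RPCs that went unanswered

def note_reply(contact: Contact) -> None:
    contact.failed_replies = 0

def note_timeout(contact: Contact) -> None:
    contact.failed_replies += 1

def maybe_replace(bucket: list[Contact], newcomer: Contact) -> None:
    """A flagged contact is only dropped once a suitable replacement shows up."""
    for i, contact in enumerate(bucket):
        if contact.failed_replies >= FAILURE_THRESHOLD:
            bucket[i] = newcomer
            return
```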

## PING

This probes a node to see if it's online. Fairly self-explanatory.

## FIND\_NODE

This RPC is the heart of the Kademlia protocol. It takes a node ID as an argument, and the node which receives it sends back contact information for the $k$ closest nodes it knows of. For a well-saturated node, these will usually come from a single bucket in its routing table. If a node doesn't know $k$ nodes yet, it sends back as many as it does know.
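
A sketch of the receiving side, assuming `routing_table` is simply an iterable of contacts with integer IDs ($k = 20$ is the bucket size the paper suggests):

```python
K = 20  # bucket size; the value the paper suggests

def handle_find_node(routing_table, target_id, k=K):
    """Return the k contacts we know of that are closest (by XOR) to target_id.

    If we know fewer than k contacts, this simply returns all of them.
    """
    return sorted(routing_table, key=lambda c: c.node_id ^ target_id)[:k]
```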

Using this, Kademlia nodes can implement what they call a *node lookup*. The gist of it is as follows (with a code sketch after the list):

1. Pick the $\alpha$ nodes closest to your target
2. Send each of them a FIND\_NODE for your target
3. If you receive a reply that includes your target, break
4. If this round doesn't get you closer to your target, break
5. Otherwise, go to 1
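
Sketched as plain Python (ignoring timeouts, parallel dispatch, and the bookkeeping a real implementation needs), that loop might look something like this, where `find_node` stands in for the actual RPC:

```python
def node_lookup(my_contacts, target_id, find_node, alpha=3):
    """Iteratively query closer and closer nodes until progress stops.

    `find_node(peer, target_id)` stands in for the FIND_NODE RPC and should
    return the contacts that peer knows of closest to the target. Assumes we
    start with at least one contact.
    """
    def distance(peer):
        return peer.id ^ target_id

    shortlist = sorted(my_contacts, key=distance)
    queried = set()

    while True:
        best_before = distance(shortlist[0])

        # 1-2. send FIND_NODE to the alpha closest nodes we haven't asked yet
        to_query = [p for p in shortlist if p.id not in queried][:alpha]
        if not to_query:
            break
        for peer in to_query:
            queried.add(peer.id)
            for contact in find_node(peer, target_id):
                if all(contact.id != p.id for p in shortlist):
                    shortlist.append(contact)
        shortlist.sort(key=distance)

        # 3. a reply included the target itself
        if shortlist[0].id == target_id:
            break
        # 4. this round didn't get us any closer
        if distance(shortlist[0]) >= best_before:
            break
        # 5. otherwise, loop again

    return shortlist
```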

## FIND\_VALUE

Retrieving a key is very similar to finding a node. When receiving a FIND\_VALUE message, Kademlia nodes respond in one of two ways: if they have that key in their local storage, they simply return the value. Otherwise, they respond exactly as if they had gotten a FIND\_NODE message, and the requester uses this to send further FIND\_VALUE requests.
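
Continuing the same sketch, the FIND\_VALUE handler just adds a local-storage check in front of the FIND\_NODE behavior; `local_store` is assumed to be a plain dict keyed by IDs in the same space as node IDs:

```python
def handle_find_value(local_store, routing_table, key, k=20):
    """Return the stored value if we have it; otherwise behave like FIND_NODE.

    Keys live in the same ID space as nodes, so XOR distance applies directly.
    """
    if key in local_store:
        return {"value": local_store[key]}
    closest = sorted(routing_table, key=lambda c: c.node_id ^ key)[:k]
    return {"nodes": closest}
```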

Kademlia calls out one additional step that I want to highlight verbatim:

> For caching purposes, once a lookup succeeds, the requesting node stores the (key, value) pair at the closest node it observed to the key that did not return the value.

## STORE

To store a value, you first look up the closest $k$ nodes to that key, then send each of them a STORE message with the (key, value) pair. It is important that you hit the closest $k$ nodes, because any other node holding the pair will expire it from its local storage sooner the farther it is from the key.

Those nodes that deem themselves responsible for a key will republish it as needed to keep it alive. Usually this means the original publisher and the $k$ closest nodes, but there could be other interested parties as well.
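
Putting those pieces together, publishing and republishing might look roughly like this; `node_lookup` and `send_store` stand in for the lookup procedure above and the actual RPC, and the hourly interval is the republish figure the paper suggests:

```python
import time

K = 20
REPUBLISH_INTERVAL = 60 * 60  # the paper suggests republishing roughly hourly

def store(key, value, node_lookup, send_store):
    """Publish (key, value) to the k nodes closest to the key."""
    closest = node_lookup(key)[:K]
    for peer in closest:
        send_store(peer, key, value)
    return closest

def republish_loop(my_entries, node_lookup, send_store):
    """Entries we consider ourselves responsible for get pushed out again
    periodically, so that churn doesn't silently kill them."""
    while True:
        for key, value in my_entries.items():
            store(key, value, node_lookup, send_store)
        time.sleep(REPUBLISH_INTERVAL)
```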

# Why it works

Structurally, this works for a lot of the same reasons Chord does. It can guarantee logarithmic lookup times with its hopping mechanism. In fact, it can do even better because it implements this in parallel.

Persistence of keys relies on some assumptions, though, so I want to outline them here. The only way for a key to die is for the $k$ closest nodes to it to all die or leave the network with no lookups or republishes in between, and for the value to never have been cached on any other node along the search path. The assumption, then, is that $k$ is tuned to how many nodes you expect to leave during the longest expected refresh period.

The longer version of the paper includes some analysis of Gnutella networks to justify the belief that this is a reasonable assumption. The gist of it is that the longer any particular node has been online, the more likely it is to still be online an hour from now. The authors also use this observation to provide some Denial of Service resistance by requiring that nodes prefer older peers over younger ones when filling their routing tables.

# My Thoughts

This protocol is super interesting to me, especially considering how small it is. It seems to offer a *lot* of room for extension into more general-use protocols, which I'll be trying to do this summer. In the next week, expect an update on an implementation of Kademlia that I've been working on.

## Footnotes

<span id=footnote-1>1: Yes, this is bad practice, but it's a toy model. [Back](#pointer-1)</span>

<span id=footnote-2>2: The biggest difference being that Pastry uses two routing functions, falling back to numeric distance for last-mile deliveries. [Back](#pointer-2)</span>

tags: research-review, research-progress, hobby-horse, retrion