• This post is an explanation of the Chord protocol and a discussion of my implementation of it in Golang.

Chord is one of the original Distributed Hash Table projects from the PDOS group at MIT’s Computer Science and Artificial Intelligence Laboratory. Here is a link to the original research for your reading pleasure. I was introduced to Chord by the book Distributed Systems by Maarten van Steen and Andrew S. Tanenbaum (p. 246, p. 248) as a solution for implementing a decentralized naming system or a structured decentralized overlay network.

What’s a Distributed Hash Table?

  • A Hash Table is a data structure that stores values associated with keys and allows direct access to a value using its key. A hash table is called a Dictionary in Python and a Map in Golang.
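
As a quick, non-distributed illustration, this is all an ordinary hash table does in Go (the key and value here are placeholders):

    // mapExample shows an ordinary, single-machine hash table: a Go map.
    // (Requires the "fmt" package.)
    func mapExample() {
        table := map[string]string{"UserData": "some value"}
        fmt.Println(table["UserData"]) // direct access to the value via its key
    }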

A distributed hash table (DHT) is a hash table distributed across various nodes in a network. The system allows any node that is part of the network to look up the location (i.e. the IP address, in most cases) of the node responsible for a key (not the value of the key itself).

P.S. A node is just a system running the DHT program and participating in the DHT network.

How does Chord create a DHT?

Chord was introduced in 2001 as one of the four original DHT protocols (CAN, Tapestry, Pastry, and Chord). This post will only focus on Chord (mostly because I haven’t read about the other protocols, yet).

Chord suggests creating a ring-shaped overlay network in which nodes are arranged based on a unique numeric ID computed for each node. This ID is generated by a hash function that computes an integer from a bytestream, in this case the hostname of the node. Since every node in the system has an integer ID, the IDs can be compared with each other, and the Chord ring is created by arranging the nodes in increasing order of their IDs. A node’s ID stays consistent throughout the lifetime of the system unless its network interface is changed.

Let there be 4 nodes: n1, n2, n3, n4 with IDs 100, 200, 300, 400 respectively, arranged in a ring overlay. n2 is the successor of n1, n3 is the successor of n2, and so on.

    n1 ---------- n2 
  (100)          (200)
    |              |
    |              |
    |              |
    n4 ---------- n3
  (400)          (300)

The hash function computes the SHA-1 hash of a bytestream; from this 160-bit hash, a fixed number of bits is extracted to create a numeric value. In our case, 64 bits (8 bytes) are extracted to create a 64-bit unsigned integer.

    // Hash computes the SHA-1 hash of the data and extracts a 64-bit
    // (8-byte) unsigned integer out of it.
    // (Uses the standard library packages crypto/sha1 and encoding/binary.)
    func Hash(data []byte) uint64 {
        sum := sha1.Sum(data)
        return binary.BigEndian.Uint64(sum[:8])
    }

A key whose location in the DHT is to be resolved is hashed using the same hash function described above. The key can be any bytestream, for example Key = “UserData”. The function Hash(Key) returns a numerical value which will fall between two of the node IDs on the Chord ring.

        Hash(Key) = 125
           ↓
    n1 ---------- n2 
  (100)          (200)
    |              |
    |              |
    |              |
    n4 ---------- n3
  (400)          (300)

The most important aim of the system is to implement the function FindSuccessor(HashedKey) to look up the location of a key in the DHT. If we execute FindSuccessor(Hash(Key)) on any node that is part of the network, it will return the location of the node whose ID is the first one on the ring that follows the hashed key value. In our case, FindSuccessor(Hash(Key)) will return the location of node n2.

    Hash(Key) FindSuccessor(Hash(Key))
        ↓         ↓
    n1 ---------- n2 
  (100)          (200)
    |              |
    |              |
    |              |
    n4 ---------- n3
  (400)          (300)
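
To make the successor relationship concrete, here is a small, purely illustrative sketch of the lookup on the 4-node toy ring above; a real Chord node never has a full, sorted view of the ring like this.

    // toySuccessor returns the ID of the node responsible for hashedKey,
    // given the sorted IDs of every node on a toy ring: the first node ID
    // greater than or equal to the key, wrapping around to the smallest ID.
    func toySuccessor(sortedNodeIDs []uint64, hashedKey uint64) uint64 {
        for _, id := range sortedNodeIDs {
            if id >= hashedKey {
                return id
            }
        }
        return sortedNodeIDs[0] // wrap around the ring
    }

For the ring above, toySuccessor([]uint64{100, 200, 300, 400}, 125) returns 200, the ID of n2.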

Functions and Structures

The paper provides pseudocode for the functions required to implement the DHT, and my implementation follows the pseudocode as presented in the paper. The concepts of a finger table and a successor table are used to improve the efficiency of the protocol.

To increase the number of keys that converge on a single physical node after lookup, multiple virtual nodes can be deployed on it at once. Each virtual node can be contacted independently by other nodes in the network. Every node in the network stores a reference to its successor and predecessor and supports the FindSuccessor function for looking up key locations.

FindSuccessor

The paper’s pseudocode for FindSuccessor first checks whether the current node’s successor is the successor of the key being looked up. If it is not, the finger table, which stores references to other nodes in the network, is used to find the node whose ID most closely precedes the key; that node’s successor is the successor of the key, so that node is asked to continue the lookup.

n.find_successor(id)
  if (id is between n and successor, inclusive of successor)
    return successor;
  else
    n0 = closest_preceding_node(id);
    return n0.find_successor(id);

A naive approach would be to keep asking the node’s successor to look up the key, which would trigger it to ask its own successor, and so on. This is rather inefficient; the finger table solves the problem by keeping a list of nodes whose IDs are the successors of (n + 2^(i-1)) mod 2^m, where m is the number of entries in the finger table (equal to the number of bits in an ID, 64 in our case) and i runs from 1 to m.
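
As a small illustration of this formula, assuming the 64-bit IDs used above so that the mod 2^m is handled by the natural uint64 overflow (fingerStart is a hypothetical helper, not part of my implementation):

// fingerStart returns the ID that the i-th finger table entry should track,
// i.e. (n + 2^(i-1)) mod 2^64, for 1 <= i <= 64. The modulo is implicit in
// the uint64 overflow.
func fingerStart(n uint64, i int) uint64 {
	return n + (uint64(1) << uint(i-1))
}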

The paper proves that, using the finger table, every lookup request can with high probability be resolved by contacting O(log N) nodes, where N is the number of nodes in the network.

n.closest_preceding_node(id)
  for i = m downto 1
    if (finger[i] is between n and id)
      return finger[i];
  return n;
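
Both pseudocode fragments above depend on an “is this ID between a and b on the ring” test, which has to handle the wrap-around at the top of the identifier space. A minimal sketch of that check, with hypothetical names not taken from my implementation:

// inOpenInterval reports whether id lies in the open ring interval (a, b),
// taking the wrap-around of the identifier space into account.
func inOpenInterval(id, a, b uint64) bool {
	if a < b {
		return id > a && id < b
	}
	// The interval wraps past the maximum ID back to zero (or a == b, in
	// which case the interval covers the whole ring except a itself).
	return id > a || id < b
}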

A node in the Chord ring contains a reference to its successor, a reference to its predecessor, and a finger table, which is a list of other nodes in the network.
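
Put together, the per-node state looks roughly like the following sketch (the field names are illustrative; the types my implementation actually uses are shown later):

// ringNode is a rough sketch of the state every Chord node maintains.
type ringNode struct {
	id          uint64      // hashed ID of the node
	predecessor *ringNode   // previous node on the ring
	successor   *ringNode   // next node on the ring
	finger      []*ringNode // shortcuts used for O(log N) lookups
}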

Fixing Fingers

The finger table is created and updated over the lifetime of the node. The paper suggests the method FixFingers to periodically update the entries in the finger table; the suggested update interval is 15 seconds.

On every execution of FixFingers, one entry in the finger table is updated.

n.fix_fingers()
  next = next + 1;
  if (next > m)
    next = 1;
  finger[next] = n.find_successor(n + 2^(next - 1) );

Creating a Network

If a node wants to create a Chord ring and be the first node in the network, it sets its predecessor to nil and keeps the reference to its successor pointing to itself.

n.create()
  predecessor = nil;
  successor = n;

Joining a Network

If a node wants to join a Chord ring, it needs to know the network location of at least one node already in the network and ask that node to resolve the successor for itself.

n.join(n1)
  predecessor = nil
  successor = n1.find_successor(n)

Dynamic Operations and Failures

The network has to be updated dynamically so that nodes can join at will, stale nodes can be removed, and the ring can be stabilized again. The paper suggests a stabilization routine that regularly verifies that the current successor is still the node’s successor and that no new node has joined between them on the ring.

The Stabilize routine fetches the predecessor of the node’s successor (each node stores references to its predecessor and successor) and checks whether that predecessor is the node itself. If it is not, the routine checks whether this new node lies between the node and its successor; if so, the new node becomes the node’s successor, replacing the previously believed one. The new node is also notified of its new predecessor. This stabilizes the three-node arc of the ring around a newly joined node and informs the neighbouring nodes of its existence.

n.stabilize()
  x = successor.predecessor
  if (x is in between n and successor)
    successor = x
  successor.notify(n)

n.notify(n1)
  if (predecessor is nil or n1 is in between predecessor and n)
    predecessor = n1

Another function, CheckPredecessor, also runs periodically to verify the liveness of the predecessor. The paper suggests a 15-second interval for CheckPredecessor as well.

n.check_predecessor()
  if(predecessor has failed)
    predecessor = nil
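
Using the Ping method from the VNodeProtocol interface shown later, the liveness check can be sketched roughly as follows; this is a hedged rendering of the pseudocode, not my exact implementation:

// checkPredecessor clears the predecessor reference if it no longer
// responds to a Ping. (Sketch only; the real method lives on LocalVNode.)
func checkPredecessor(predecessor VNodeProtocol) VNodeProtocol {
	if predecessor != nil && predecessor.Ping() != nil {
		return nil // the predecessor has failed, so forget it
	}
	return predecessor
}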

Implementation in Golang

Source Code on GitHub

As explained above, a Chord ring (or network) consists of nodes constantly updating their finger tables, stabilizing their positions in the ring to let new nodes join, verifying the liveness of their predecessors, and looking up the locations of keys using the finger tables, thus creating a dynamically updating, decentralized network for looking up keys, i.e. a Distributed Hash Table.

I have implemented Chord in Golang, recreating the pseudocode in the paper. The concept of a virtual node is implemented in the form of independent threads running on different ports on the same system, which enables every local node to be accessed via the network. The nodes are created in a manner that allows local nodes to be contacted through direct method calls rather than unnecessarily sending RPCs over the network. RPC is implemented using Go’s built-in gob-encoding-based RPC mechanism (the net/rpc package).

To enable the intermingling of local and remote nodes, a common interface, VNodeProtocol, is defined and implemented by two different types of VNodes: LocalVNode and RemoteVNode. While LocalVNode contains the logic for the Chord protocol, RemoteVNode is just an RPC wrapper used to contact LocalVNodes running on remote systems. VNodeProtocol allows the protocol to transparently handle both remotely and locally available nodes: locally running worker threads contact each other using direct method calls, while the same methods are invoked via RPC for RemoteVNodes.

      ---> LocalVNode: Local Implementation of a VNode.
      |                Contains implementation of the Chord Protocol
      |
VNode.VNodeProtocol ---> Generic Interface to the Chord Protocol.
      |
      |
      ---> RemoteVNode: RPC Backed VNode which communicates to the RPC Server
                        (defined in `rpcserver.go`) which calls the remote
                        processes' LocalVNode to fulfil the RPC.

The finger and successor tables are lists of VNodeProtocol objects. A hostname is wrapped as a RemoteVNode object to join a Chord ring, and each individual VNode on the machine is a LocalVNode running in its own goroutine, executing the stabilization, finger-fixing, and predecessor-verification routines alongside an RPC server in the background.

// VNodeProtocol implements the Chord protocol on Vnodes.
// Local VNodes can implement it via method calls.
// Remote VNodes can use RPC to transparently work like Local VNodes.
type VNodeProtocol interface {
	// FindSuccessors finds N successors of the VNode.
	FindSuccessors(int) ([]VNodeProtocol, error)

	// FindSuccessor finds the successor for a Key.
	FindSuccessor(uint64) (VNodeProtocol, error)

	// Notify notifies the VNode of its new predecessor.
	Notify(VNodeProtocol) error

	// Ping sends a request to the VNode to check that it is reachable.
	Ping() error

	// CheckPredecessor checks the aliveness of VNode's predecessor.
	CheckPredecessor() error

	// GetPredecessor returns the predecessor VNode.
	GetPredecessor() (VNodeProtocol, error)

	// IsBetweenNodes reports whether the VNode's ID lies between the two
	// given VNodes on the ring.
	IsBetweenNodes(VNodeProtocol, VNodeProtocol) bool

	// ID returns the ID of the VNode.
	ID() uint64

	// Hostname returns the hostname of the VNode.
	Hostname() string
}
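
Because both LocalVNode and RemoteVNode satisfy this interface, code performing a lookup does not need to care where a node actually lives. A small illustrative example (whereIsKey is a hypothetical helper, and Hash is the function described earlier):

// whereIsKey resolves the hostname of the node responsible for a key by
// asking any VNode, local or remote, through the common interface.
func whereIsKey(vn VNodeProtocol, key []byte) (string, error) {
	successor, err := vn.FindSuccessor(Hash(key))
	if err != nil {
		return "", err
	}
	return successor.Hostname(), nil
}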

Implementing Periodic Routines

The periodically running routines are each started in their own goroutine, with their respective time intervals, when the LocalVNode is first initialized. An example of the stabilization routine is given below, where the Stabilize function is executed at an interval chosen from [node.minStabilizeInterval, node.maxStabilizeInterval].

// Stabilize executes after certain time intervals to fix successors.
// > Each time node n runs Stabilize(), it asks its successor
// > for the successor’s predecessor p, and decides whether p
// > should be n’s successor instead. stabilize() notifies node
// > n’s successor of n’s existence, giving the successor the chance
// > to change its predecessor to n.
func (node *LocalVNode) Stabilize() error {
	logger.Printf("[%s, %d] Stabilizing VNode\n", node.Hostname(), node.ID())

	verifySuccesorNode, _ := node.successors[0].GetPredecessor()
	if verifySuccesorNode != nil && verifySuccesorNode.IsBetweenNodes(node, node.successors[0]) {
		node.successors[0] = verifySuccesorNode
		logger.Printf("[%s, %d] Updated successor: %s\n", node.Hostname(), node.ID(), verifySuccesorNode.Hostname())
	}

	if node.successors[0].ID() != node.ID() {
		err := node.successors[0].Notify(node)
		logger.Printf("[%s, %d] Notified %s of VNode.\n", node.Hostname(), node.ID(), node.successors[0].Hostname())
		return err
	}

	return nil
}

// StabilizeRoutine runs Stabilize() periodically by choosing an interval
// between minStabilizeInterval and maxStabilizeInterval.
func (node *LocalVNode) StabilizeRoutine() error {
	exit := false

	for {
		node.Stabilize()

		interval := Util.GetRandomBetween(node.minStabilizeInterval, node.maxStabilizeInterval)
		timer := time.NewTimer(time.Duration(interval) * time.Second)
		select {
		case <-timer.C:
		case <-node.stopStabilizeChan:
			exit = true
		}
		if exit {
			break
		}
	}
	return nil
}

// GetRandomBetween returns a random integer value between low and high.
func GetRandomBetween(low int, high int) int {
	return rand.Intn(high-low) + low
}

Hostnames and RPC

RPC is implemented using the net/rpc package, which internally serializes structures and transmits them over the network. Each LocalVNode keeps a TCP RPC server running so that remote nodes can contact it.

The RPC server can accept hostnames in various forms; the behaviour depends on the address passed to net.Listen (as can be seen in the implementation of InitServer below).

The user can pass in a hostname like “192.168.1.116:8000”, which will bind port 8000 on the interface with the IP 192.168.1.116 on the physical system. If the user wants a random port to be allocated by the kernel, they can supply an address like “192.168.1.116:”, which binds a kernel-assigned port for the RPC server. The resulting address is reflected in the Hostname field of the LocalVNode and is transmitted to RemoteVNodes in RPC arguments and results, so it is important to consider the hostname format. A hostname like “[::]:0” will bind a random port on all interfaces but will break the system when passed to a remote VNode via RPC, since the wildcard address is not a reachable address for other machines.

// InitServer starts Chord Protocol TCP-RPC server on Hostname:Port.
func InitServer(rpcInstance *ChordTCPRPCServer) error {
	rpc.Register(rpcInstance)
	l, e := net.Listen("tcp", rpcInstance.Hostname)
	if e != nil {
		return errors.New("failed to start Listen server")
	}

	// Reset address as acquired by Listener.
	address := l.Addr().String()
	rpcInstance.Hostname = address

	go func() {
		for {
			conn, err := l.Accept()
			if err != nil {
				continue
			}
			go rpc.ServeConn(conn)
		}
	}()

	return nil
}

HTTP Server For Key Lookup

A single HTTP server is hosted, which uses one of the globally accessible worker nodes to look up a key passed in as form data in an HTTP POST request.

The /lookup endpoint returns the location of the node where the key will be stored.

$ curl -X POST localhost:8090/lookup -d "TestKey"
127.0.0.1:8000
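
A rough sketch of what such a lookup handler could look like, assuming a worker that satisfies VNodeProtocol and the Hash function described earlier (the handler and variable names are illustrative, not the project’s actual code):

// lookupHandler returns an HTTP handler that resolves the node responsible
// for the key sent in the request body.
// (Sketch only; requires "fmt", "io" and "net/http".)
func lookupHandler(worker VNodeProtocol) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		key, err := io.ReadAll(r.Body)
		if err != nil || len(key) == 0 {
			http.Error(w, "could not read key", http.StatusBadRequest)
			return
		}
		successor, err := worker.FindSuccessor(Hash(key))
		if err != nil {
			http.Error(w, "lookup failed", http.StatusInternalServerError)
			return
		}
		fmt.Fprintln(w, successor.Hostname())
	}
}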

Some Examples

The project doesn’t have any external dependencies. I developed the project using Go 1.12 on Windows 10 1909.

  • Build the program.
  go build
  • Run a single worker thread on 127.0.0.1:8000.
  ./src -host 127.0.0.1:8000
  • Connect to a remote host.
  ./src -mode join -httpport 8091 -host 127.0.0.1:8001 -rhost 127.0.0.1:8000
  • Run 8 Local Worker Threads on Randomly Assigned Ports (0.0.0.0:0).
  ./src -workers 8

Conclusion

The Chord protocol is implemented; however, there is still polishing to be done, such as improving the configuration of the workers (possibly a YAML file to define them?). This is one of the first times I have gone from reading a paper to implementing it, and many open source implementations helped me throughout the process: armon/go-chord, cbocovic/chord and yuma-m/pychord.

References