ProtoSchool

ProtoSchool

Interactive tutorials on decentralized web protocols

Data Structures | Lesson 5 of 5

Merkle trees and directed acyclic graphs (DAG)

As we've discussed, the decentralized web depends on linked data structures. Let's explore what those look like.

Merkle trees

A Merkle tree (or simple "hash tree") is a data structure in which every node is hashed.

               +--------+
               |        |
     +---------+  root  +---------+
     |         |        |         |
     |         +----+---+         |
     |              |             |
+----v-----+  +-----v----+  +-----v----+
|          |  |          |  |          |
|  node A  |  |  node B  |  |  node C  |
|          |  |          |  |          |
+----------+  +-----+----+  +-----+----+
                    |             |
              +-----v----+  +-----v----+
              |          |  |          |
              |  node D  |  |  node E  +-------+
              |          |  |          |       |
              +----------+  +-----+----+       |
                                  |            |
                            +-----v----+  +----v-----+
                            |          |  |          |
                            |  node F  |  |  node G  |
                            |          |  |          |
                            +----------+  +----------+

In a Merkle tree, nodes point to other nodes by their content addresses (hashes). (Remember, when we run data through a cryptographic hash, we get back a "hash" or "content address" that we can think of as a link, so a Merkle tree is a collection of linked nodes.)

As previously discussed, all content addresses are unique to the data they represent. In the graph above, node E contains a reference to the hash for node F and node G. This means that the content address (hash) of node E is unique to a node containing those addresses.

Getting lost? Let's imagine this as a set of directories, or file folders. If we run directory E through our hashing algorithm while it contains subdirectories F and G, the content-derived hash we get back will include references to those two directories. If we remove directory G, it's like Grace removing that whisker from her kitten photo. Directory E doesn't have the same contents anymore, so it gets a new hash.

As the tree above is built, the final content address (hash) of the root node is unique to a tree that contains every node all the way down this tree. If the data in any node were to change even by a single byte, the hash of the changed node would change, as would the hashes of all of its parent nodes.

In case you haven't noticed, this means that as a programmer you'll always need to build these data structures backwards, from the leaf nodes on up to the root node.

Directed Acyclic Graphs (DAG)

Directed Acyclic Graphs

DAG is an acronym for Directed Acyclic Graph. It's a fancy way of describing a specific kind of Merkle tree (hash tree) where different branches in the tree can point at other branches in the tree in a single forward direction, as illustrated by the image above.