Merkle DAGs | Lesson 6 of 8

Merkle DAGs: Distributability

Merkle DAGs inherit the distributability of CIDs. Content-addressing a DAG has several interesting consequences for how it is distributed. The first, of course, is that anybody who has a DAG can act as a provider for that DAG. The second is that when we’re retrieving data encoded as a DAG, like a directory of files, we can retrieve all of a node’s children in parallel, potentially from a number of different providers. The third is that file servers are not limited to centralized data centers, which gives our data greater reach. Finally, because each node in a DAG has its own CID, the sub-DAG rooted at that node can be shared and retrieved independently of any larger DAG it is embedded in.
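The parallel, multi-provider retrieval described above can be sketched in a few lines. This is only an illustration under loud assumptions: `toy_cid` is a plain SHA-256 hex digest standing in for a real CID, nodes are serialized as JSON, and the three "providers" are hypothetical in-memory dictionaries rather than network peers.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor


def toy_cid(data: bytes) -> str:
    # Assumption: a bare SHA-256 hex digest stands in for a real CID.
    return hashlib.sha256(data).hexdigest()


def make_node(payload: str, links: list[str]) -> bytes:
    # Deterministic serialization, so identical nodes get identical identifiers.
    return json.dumps({"data": payload, "links": links}, sort_keys=True).encode()


# A small directory-like DAG: one root node linking to three leaf nodes.
leaves = [make_node(f"file-{i}", []) for i in range(3)]
leaf_cids = [toy_cid(block) for block in leaves]
root = make_node("dir", leaf_cids)
root_cid = toy_cid(root)

# Hypothetical providers, each holding only part of the DAG.
provider_a = {root_cid: root, leaf_cids[0]: leaves[0]}
provider_b = {leaf_cids[1]: leaves[1], leaf_cids[2]: leaves[2]}
provider_c = {leaf_cids[0]: leaves[0], leaf_cids[2]: leaves[2]}
providers = [provider_a, provider_b, provider_c]


def fetch(cid: str) -> bytes:
    # Try each provider in turn; accept a block only if it hashes to the CID.
    for blocks in providers:
        block = blocks.get(cid)
        if block is not None and toy_cid(block) == cid:
            return block
    raise KeyError(f"no provider has {cid}")


def fetch_children(cid: str) -> list[bytes]:
    node = json.loads(fetch(cid))
    # All child identifiers are known once the parent is read,
    # so the children can be requested concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fetch, node["links"]))


children = fetch_children(root_cid)
print([json.loads(child)["data"] for child in children])
```

Because every block is checked against the identifier that was requested, it does not matter which provider answers: a block either hashes to the expected value or is rejected.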

Case Study: Distributing Large Datasets

As an example, consider the distribution of a large, popular, scientific dataset. Today, on the location-addressed web:

  • The researcher sharing the file is responsible for maintaining the file server and its associated costs
  • The same server is likely used to respond to requests from all over the world
  • The data itself may be distributed monolithically, as a single file archive
  • It’s hard to locate alternative providers of the same data
  • Data is likely in large chunks that must be downloaded serially from a single provider
  • It’s hard for others to share datasets that build on the original data

Merkle DAGs help us alleviate all of these problems. By distributing the dataset as a content-addressed DAG:

  • Anybody who wants to can help distribute the file
  • Nodes from all over the world can participate in serving the data
  • Each part of the DAG has its own CID that can be distributed independently
  • It’s easy to find alternative providers of the same data
  • The nodes forming the DAG are small, and can be downloaded in parallel from many different providers
  • Larger datasets encompassing the original can simply link the original dataset as a child of a larger DAG

All of this works to promote scalable and redundant access to this important data.
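The point about larger datasets linking the original as a child can be made concrete with a minimal sketch. It rests on the same kind of assumptions as any toy model: `toy_cid` is a SHA-256 hex digest rather than a real CID, `store` is a hypothetical in-memory block store, and the dataset names are invented for illustration.

```python
import hashlib
import json


def toy_cid(data: bytes) -> str:
    # Assumption: a bare SHA-256 hex digest stands in for a real CID.
    return hashlib.sha256(data).hexdigest()


# Hypothetical in-memory block store, keyed by identifier.
store: dict[str, bytes] = {}


def put(payload: str, links: list[str]) -> str:
    # Serialize deterministically, store the block, and return its identifier.
    block = json.dumps({"data": payload, "links": links}, sort_keys=True).encode()
    cid = toy_cid(block)
    store[cid] = block
    return cid


def reachable(cid: str) -> set[str]:
    # Collect the identifiers of every node in the DAG rooted at `cid`.
    node = json.loads(store[cid])
    found = {cid}
    for child in node["links"]:
        found |= reachable(child)
    return found


# The original dataset: a root with two parts (names are illustrative).
part_1 = put("measurements-part-1", [])
part_2 = put("measurements-part-2", [])
original = put("original-dataset", [part_1, part_2])

# A derived dataset that links the original as one of its children.
annotations = put("annotations", [])
extended = put("extended-dataset", [original, annotations])

# Embedding did not change the original: its identifier and blocks are reused.
print(reachable(original) <= reachable(extended))
print(len(store))
```

The derived dataset adds only two new blocks (its own root and the annotations), and anyone already holding the original dataset can serve that part of the larger DAG without any changes.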

