The choices we make when structuring our data have important consequences. Consider the example of a photo library, each image depicting different people, places, and events over the course of several years. We could structure this data in many ways—we could even elect to not structure it at all!—and each choice would have significant implications for our interactions with the library.
Structure can serve as an index for data, affecting the speed with which we can locate and retrieve specific pictures in our photo library. Structure can add semantics to data as well, by enabling us to group related objects.
pics ├── cats │ ├── 2018-02-23-tabby.png │ └── 2019-12-16-black.png └── fish ├── 2017-03-05-freshwater.png ├── 2018-04-14-tropical.png └── 2020-10-02-blowfish.png
The directory listing above gives us a small, concrete example to work with. In this example, we have a root directory, called "pics", that holds our entire photo collection. Within this directory, we’ve grouped our photos into two sub-directories, "cats" and "fish", to separate photos based on their subject. Within these sub-directories, we’ve organized the photos by the date they were taken, which is reflected in their filenames. The structure here helps describe the individual files, letting us know, for example, that "2018-04-14-tropical" is in some way related to the terms "fish" and "pics", and that "pics" is a more general description of the collection that this individual file is contained within. The structure also helps us identify commonalities in our data: "2018-02-23-tabby.png" and "2019-12-16-black.png" are related in their common classification.
There is no single best way to structure one’s data; every choice comes with significant tradeoffs. The larger our dataset becomes, the more important it is to tailor its structure to the way we intend to use and access it. In our example, photos are organized according to animal type, which makes looking for pictures of specific animals very easy. However, trying to find the picture file with the earliest timestamp amongst all the files is comparatively more difficult, requiring us to look through all of the directories. If our directory consisted of thousands of photos, it would become extremely tedious. If we instead organized their hierarchy according to the date each picture was taken, the situation would be reversed.
Structure gives our data meaning and organization, while CIDs let us reference data securely, verifiably, and without coordination in a distributed network. In the following lessons, we’ll see how to build content-addressable data structures that give us the power of both!
Feeling stuck? We'd love to hear what's confusing so we can improve this lesson. Please share your questions and feedback.