ProtoSchool

Interactive tutorials on decentralized web protocols

Data Structures | Lesson 4 of 5

Cryptographic hashing and Content Identifiers (CIDs)

So far we've discussed only adorable images, for the sake of our general merriment, but content addressing can be used on all types of files and data, from JSON objects to term papers to videos. A cryptographic hash is computed over bytes, so to hash data consistently and interpret it later, we need to know what data format we're working with and use an appropriate tool.
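To see why the data format matters, here is a minimal sketch using Python's standard `hashlib` and `json` modules. The `record` object and its two serializations are made up for illustration. Both byte strings decode to the same JSON object, yet they hash differently, because the hash function only ever sees bytes.

```python
import hashlib
import json

# A made-up JSON object, used purely for illustration.
record = {"name": "Fluffy", "species": "kitten"}

# Two valid serializations of the same object: compact and pretty-printed.
compact = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
pretty = json.dumps(record, sort_keys=True, indent=2).encode("utf-8")

compact_hash = hashlib.sha256(compact).hexdigest()
pretty_hash = hashlib.sha256(pretty).hexdigest()

# Same logical content once decoded...
print(json.loads(compact) == json.loads(pretty))  # True
# ...but different bytes, so different hashes.
print(compact_hash == pretty_hash)  # False
```

Agreeing on a format (and on how to serialize it) is what lets two people hash the same content and arrive at the same identifier.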

Decoding data structures

A CID (Content Identifier) is a particular form of content addressing used on the decentralized web. It was developed for IPFS (a decentralized web protocol which we'll discuss in later tutorials), but it is useful well beyond IPFS.

A CID is a single identifier that contains both a cryptographic hash of the data and a codec, which holds information about how to interpret that data. Codecs encode and decode data in certain formats.

+-------+------------------------------+
| Codec | Multihash                    |
+-------+------------------------------+

Many formats and protocols use content addressing already. Tools like Git and protocols like Ethereum and Bitcoin are among them, but they differ in how they interpret the data and in which cryptographic function they use for hashing. CIDs allow us to create a universal identifier for any of these systems.

Every CID is an identifier that contains a codec, which tells us how to interpret the data, and a multihash, which is a self-describing hash (a hash that tells you which hash function was used to create it).

+------------------------------+
| Codec                        |
+------------------------------+
|                              |
| Multihash                    |
| +----------+---------------+ |
| |Hash Type | Hash Value    | |
| +----------+---------------+ |
|                              |
+------------------------------+
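The layout in the diagram can be sketched in a few lines of Python. This is a hand-rolled illustration under stated assumptions, not the way an IPFS library does it: the function names (`multihash_sha256`, `cid_v1_raw`, `to_base32`) are hypothetical, and the sketch assumes the CIDv1 layout, which adds a version prefix in front of the codec and a base32 text encoding on top of the two parts shown above. The numeric codes come from the multicodec table; all of them are below 0x80, so each fits in a single byte here.

```python
import base64
import hashlib

# Codes from the multicodec table (each fits in one byte).
CIDV1 = 0x01      # CID version 1
RAW_CODEC = 0x55  # "raw": the data is plain, uninterpreted bytes
SHA2_256 = 0x12   # hash type: SHA-256


def multihash_sha256(data: bytes) -> bytes:
    """Return hash type + digest length + hash value (a self-describing hash)."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256, len(digest)]) + digest


def cid_v1_raw(data: bytes) -> bytes:
    """Return version + codec + multihash as raw bytes."""
    return bytes([CIDV1, RAW_CODEC]) + multihash_sha256(data)


def to_base32(cid: bytes) -> str:
    """Encode the CID bytes as lowercase, unpadded base32 with the 'b' prefix."""
    return "b" + base64.b32encode(cid).decode("ascii").lower().rstrip("=")


cid = cid_v1_raw(b"hello kitty")
# version, codec, hash type, digest length (0x20 = 32 bytes)
print(cid[:4].hex())  # 01551220
print(to_base32(cid))
```

Anyone reading the first few bytes can tell how the rest should be interpreted: which codec to decode the data with, which hash function produced the digest, and how long that digest is.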

For lots more detail on how CIDs are constructed in IPFS, check out our Anatomy of a CID tutorial.

Linking between different data structures

CIDs allow us to build data structures that link to other data structures in completely different formats. Imagine a tree of JSON objects that link to BSON objects that also link to Git commits. (Or imagine a directory containing puppy images and kitty videos, with a subdirectory containing articles on giraffes. The possibilities are endless!) All the way down this tree we have cryptographic hashes that allow us to distribute and link the data.
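Here is a minimal sketch of that idea. It is deliberately simplified: instead of real CIDs it uses a readable `codec:hash-type:hash-value` string, and the `store` dictionary, the `put`, `put_json`, and `build_root` helpers, and the `{"link": ...}` shape are all hypothetical names invented for this example. A JSON directory links to a raw image and to a JSON article, and the article links to the same image.

```python
import hashlib
import json

store = {}  # hypothetical in-memory block store: identifier -> bytes


def put(codec: str, data: bytes) -> str:
    """Store a block and return a simplified 'codec:hash-type:hash-value' identifier."""
    ident = f"{codec}:sha2-256:{hashlib.sha256(data).hexdigest()}"
    store[ident] = data
    return ident


def put_json(obj) -> str:
    """Serialize deterministically, then store the bytes under the 'json' codec."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return put("json", data)


def build_root(puppy_bytes: bytes) -> str:
    """Build a tiny tree: directory -> (raw image, JSON article -> same image)."""
    puppy = put("raw", puppy_bytes)
    article = put_json({"title": "Giraffes", "image": {"link": puppy}})
    return put_json({"puppy.jpg": {"link": puppy}, "articles": {"link": article}})


root_a = build_root(b"puppy image bytes")
root_b = build_root(b"puppy image bytes, edited")

# Following a link tells us which codec to use for the target block.
directory = json.loads(store[root_a])
print(directory["puppy.jpg"]["link"].split(":", 1)[0])  # raw

# Editing a leaf changes every identifier above it, all the way to the root.
print(root_a != root_b)  # True
```

Because each link carries its own codec, a block in one format can point at a block in another, and the hashes along the way let anyone verify the data they receive.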

Why is it important to link between different data structures? Every day on the centralized web, we link from text to images, from logos to homepages, and from emails to PDFs. Links tie resources together, convey meaning, and make the web awesomely interactive!