ProtoSchool

Interactive tutorials on decentralized web protocols

Anatomy of a CID |
Lesson 2 of 6

Occasionally, a hashing algorithm may be proven to be insecure, meaning it no longer complies with the characteristics that we defined earlier. This has already happened with `sha1`

. With time, other algorithms may prove to be insufficient for content addressing in IPFS and other distributed information systems. For this reason, and in order to support multiple cryptographic algorithms, **we need to be able to know which algorithm was used to generate the hash** of specific content.

So how can we do this?
To support multiple hashing algorithms, we use **multihash**.

A **multihash** is a self-describing hash which itself contains metadata that describes both its length and what cryptographic algorithm generated it. Multiformats CIDs are future-proof because they use multihash to
**support multiple hashing algorithms** rather than relying on a specific one.

Multihashes follow the `TLV`

pattern (`type-length-value`

). Essentially, the "original hash" is prefixed with the `type`

of hashing algorithm applied and the `length`

of the hash.

`type`

: identifier of the**cryptographic algorithm**used to generate the hash (e.g. the identifier of`sha2-256`

would be`18`

-`0x12`

in hexadecimal) - see the multicodec table for all the identifiers`length`

: the actual**length**of the hash (using`sha2-256`

it would be`256`

bits, which equates to 32 bytes)`value`

: the actual**hash value**

In order to represent a CID as a compact string instead of plain binary (a series of `1`

s and `0`

s), we can use **base encoding**. When IPFS was first created, it used `base58btc`

encoding to create CIDs that looked like this:

`QmY7Yh4UquoXHLPFo2XbhXkhBvFoPwmQUSa92pxnxjQuPU`

Multihash formatting and `base58btc`

encoding enabled this first version of the CID, now referred to as Version 0 (`CIDv0`

), and its initial `Qm...`

characters remain easy to spot.

However, with time, doubts arose about whether this multihash format would be sufficient:

- How do we know what method was used to encode the data?
- How do we know what method was used to create the string representation of the CID? Will we always be using
`base58btc`

?

To address these concerns, an evolution to the next version of a CID was necessary. In the following lessons we'll explore what was added to the specification to lead us to the current CID version: `CIDv1`

.

Oops, you haven't selected the right answer yet!

**Feeling stuck?** We'd love to hear what's confusing so we can improve this lesson. Please share your questions and feedback.