Occasionally, a hashing algorithm may be proven to be insecure, meaning it no longer complies with the characteristics that we defined earlier. This has already happened with sha1. With time, other algorithms may prove to be insufficient for content addressing in IPFS and other distributed information systems. For this reason, and in order to support multiple cryptographic algorithms, we need to be able to know which algorithm was used to generate the hash of specific content.
So how can we do this? To support multiple hashing algorithms, we use multihash.
A multihash is a self-describing hash which itself contains metadata that describes both its length and what cryptographic algorithm generated it. Multiformats CIDs are future-proof because they use multihash to support multiple hashing algorithms rather than relying on a specific one.
Multihashes follow the TLV pattern (type-length-value). Essentially, the "original hash" is prefixed with the type of hashing algorithm applied and the length of the hash.
type: identifier of the cryptographic algorithm used to generate the hash (e.g. the identifier of sha2-256 is 0x12 in hexadecimal); see the multicodec table for all the identifiers
length: the actual length of the hash in bytes (using sha2-256 it would be 256 bits, which equates to 32 bytes)
value: the actual hash value
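The type-length-value layout above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the helper names encode_multihash and decode_multihash are hypothetical, and the sketch assumes both prefix fields fit in a single byte. The multihash specification encodes them as unsigned varints, which take one byte only for values below 0x80.

```python
import hashlib

SHA2_256_CODE = 0x12  # multicodec identifier for sha2-256


def encode_multihash(digest: bytes, code: int) -> bytes:
    """Prefix a raw digest with its type and length (hypothetical helper).

    Assumption: code and length are both below 0x80, so each varint
    occupies exactly one byte. Larger values are rejected, not encoded.
    """
    if code >= 0x80 or len(digest) >= 0x80:
        raise ValueError("this sketch only handles single-byte varints")
    return bytes([code, len(digest)]) + digest


def decode_multihash(mh: bytes) -> tuple:
    """Split a multihash back into (type, length, value)."""
    code, length = mh[0], mh[1]
    value = mh[2:]
    if len(value) != length:
        raise ValueError("declared length does not match the digest size")
    return code, length, value


digest = hashlib.sha256(b"hello world").digest()
mh = encode_multihash(digest, SHA2_256_CODE)

# The first two bytes are 0x12 (sha2-256) and 0x20 (32 bytes),
# followed by the unchanged 32-byte digest.
print(mh.hex())
print(decode_multihash(mh)[:2])
```

Because the prefix travels with the digest, a reader can tell which algorithm to rerun when verifying content without any out-of-band agreement.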
In order to represent a CID as a compact string instead of plain binary (a series of 1s and 0s), we can use base encoding. When IPFS was first created, it used base58btc encoding to create CIDs that looked like this:
Multihash formatting and base58btc encoding enabled this first version of the CID, now referred to as Version 0 (CIDv0), and its initial Qm... characters remain easy to spot.
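To see where the Qm... prefix comes from, here is a small sketch that base58btc-encodes a sha2-256 multihash. The function base58btc_encode is a hand-written illustration using the Bitcoin base58 alphabet, not a library call. The sketch also hashes raw bytes directly, which is a simplifying assumption: the CID that IPFS assigns to a file is computed over an encoded data structure, so the string printed here will not match what IPFS would report for the same text.

```python
import hashlib

# Bitcoin-style base58 alphabet (no 0, O, I, or l)
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_encode(data: bytes) -> str:
    """Encode bytes as base58btc (illustrative helper, not a library API)."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading '1'
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


digest = hashlib.sha256(b"hello world").digest()
multihash = bytes([0x12, 0x20]) + digest  # type, length, value
cid_v0_like = base58btc_encode(multihash)

print(cid_v0_like)
print(cid_v0_like[:2], len(cid_v0_like))
```

The leading bytes 0x12 and 0x20 are the same for every sha2-256 multihash, which is why every string produced this way begins with Qm and is 46 characters long.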
However, with time, doubts arose about whether this multihash format would be sufficient:
To address these concerns, an evolution to the next version of a CID was necessary. In the following lessons we'll explore what was added to the specification to lead us to the current CID version: