A hash function is an algorithm that transforms an input of arbitrary length into an output of fixed length.

An example of how this is achieved is through expansion – if we have a function operating on a block of data, we can call it multiple times, allowing us to compute the hash for a longer input. One popular construction is the Merkle–Damgård construction, which is the basis for many hash functions. An example of other solutions is the sponge construction of the SHA-3 function.

merkle-damgard

If the compression function is collision-free, then the entire algorithm will have this property.

The message is divided into smaller chunks of a fixed length. The last block, which is often shorter, is padded using a special string. One example of such a string might be 100000.....$L$, where $L$ represents the 64-bit length of the message before padding in big-endian format.

Then, using subsequent blocks of the message and the output from the previous compression (the H function in the diagram), which is the current state of the hash function, it is modified. For the first block, instead of the previous output, a constant initialization vector (marked as IV in the diagram) is used. After the last iteration, the current state is the output of the entire algorithm.

Different functions may have different target applications – a function used for indexing is probably an insufficiently secure choice for authentication purposes. Specific requirements and mandatory properties of a function are conditioned by its application, but there are several elements that generally characterize them.

What is a hash collision?

A hash collision occurs when two different inputs to the function (e.g., passwords) produce the same result. This can happen because there are infinitely many possible inputs to the function, but only a limited (though very large) number of possible outputs.

Hash function characteristics

It is irreversible (one-way)

Irreversibility (or one-wayness) means that computing the hash from a given input is straightforward, but given only the hash, it is computationally difficult to find the corresponding input.

Avalanche effect of the hash function

One of the characteristics is the large changes in the output hash even with small modifications to the input. Even slightly different messages should generate completely different hashes, with no discernible patterns. Every bit of the output should change with a probability close to 0.5. Because hashing is a process of mapping an infinite number of messages to a finite set of hashes, perfect hash functions do not exist. In practice, we use algorithms that have properties sufficiently close to those of ideal functions.

It is collision-free

This is the case for the collision-free property. A collision-free function is one for which there is no polynomial-time algorithm capable of finding a collision. This obviously does not mean that a collision does not exist; it simply means that it is computationally complex enough that we can consider the solution to be sufficiently secure.

A hash function has weak collision resistance if it is computationally difficult to find a collision for a known message. Strong collision resistance, on the other hand, means that it is computationally difficult to find any pair of messages that produce the same hash.

Previous Post Next Post