One way to store passwords provided by users is through the use of hash functions. In this case, the input provided, for example, during user registration, is processed by the chosen hash function, and its output is saved in the database. Hash functions, because they are one-way, ensure that it is not easy to retrieve the original data. They can be attacked, but this operation is not as easy as decryption, though still possible. This provides a certain level of security for developers and administrators, and does not result in direct leakage of passwords if the entire database is stolen. Passwords stored this way cannot be recalled, but the problem should be addressed by a password reset feature. Storing passwords as hashes is one of the safer ways to do so. Additionally, to further enhance security and increase the cost of attacks, additional techniques such as salt, pepper, and key stretching are employed. After such operations, the hash is effectively transformed into a PBKDF function key, which is currently the safest way to store information necessary for user authentication.

Salt

The first method for increasing the difficulty of attacks on a system storing hashes is the addition of salt. Salt is a randomly generated string of characters that is added to the user's password before the hash function is applied. It is stored in the database alongside the password itself. If an attacker gains access to the hashes, it will be significantly more difficult to guess the passwords hidden behind them due to the additional random, often complex, string. However, it should be assumed that since access to the hashes has been obtained, the salts can also be retrieved. In such a case, the attacker could add the salt to each attempted combination, which increases the difficulty of the attack. Nonetheless, even known salts provide good protection against situations where an attacker already has a list of precomputed hashes for a range of words (e.g., rainbow tables) and compares them to those from a data breach.

Pepper

Another variant of the added string is called pepper. This is a constant (which should also be complex) value added to all passwords intended for storage in the database. Its distinguishing feature is that it is stored outside the database. This significantly increases the cost of attacks, both by dictionary methods and rainbow tables. It is possible to "neutralize" it (by adding it to each tested string), but this requires additional effort and increases the overall difficulty. The pepper would need to be stolen from another, configuration-related part of the server. Just like with salt, even a known pepper increases the cost of certain hash-breaking methods.

Key Stretching

The third operation used to increase the difficulty of attacks on user passwords is key stretching. This technique extends the computations, for example, by forcing the hash function to run in a loop a specified number of times. The output from one iteration is then fed into the next.

In this case, for example, during registration and login, the loop needs to be run many times, but the user will not notice such a minor delay (e.g., a few dozen milliseconds). The time to break the hash will increase significantly (e.g., from hours to months). This is safe because it has been proven that repeated hashing with cryptographic hash functions does not increase the chances of a collision.

PBKDF and bcrypt

From the combination of the method of storing data based on a hash and its extensions, password-specific functions have been created. These are known as PBKDF (Password Based Key Derivation Function) functions. Their general operation is illustrated in the diagram below.

pbkdf

The function takes 2 or 3 parameters as input. The mandatory ones are the password and the computation cost (work factor). The second parameter controls the complexity of the computation, increasing the time required to generate the value. This makes attacks harder, but also slows down the generation and verification process. For example, in the bcrypt function, with the cost set to X, it will run 2^X iterations of the algorithm. For instance, X = 10 means 1024 iterations.

An optional parameter is salt. If it is not provided, it will be automatically generated using an internal pseudorandom number generator. This parameter is necessary during password verification.

The output of the function is a key (master key), which must be stored in the database. This key contains information about the algorithm used, the salt, and the computation cost. A sample key with its structure, based on the bcrypt function output, is shown in the diagram below.

bcrypt

The first four characters represent the version and algorithm type. The next 3 characters represent the computation cost, and the following 22 characters represent the salt used. The remaining part of the key is the hash of the password provided by the user. For bcrypt, the recommended cost is between 12 and 15.

You can determine the optimal cost yourself – you should set a specific function execution time (e.g., 100 milliseconds) and adjust the cost so that the function does not exceed this time on the available hardware. Setting a cost too high may lead to a situation where users cannot be served due to server resource exhaustion.

To verify if the provided password is correct, the function is executed using the key from the database. The type of algorithm, cost parameter, and salt are extracted from the key. A new key is then generated using the password provided by the user and the salt retrieved from the database. If the results match, the user is authenticated. It is important to note that bcrypt has a password length limit – depending on the implementation, up to 72 bytes. If a longer password is provided, it will simply be truncated to the maximum allowed length.

Other Algorithms

There are many algorithms based on similar principles. Some of the popular ones are:

bcrypt
PBKDF2
phpass
argon2
scrypt

They have different characteristics (e.g., the ability to parameterize memory requirements in scrypt and argon2) and internal methods of operation. Regardless of the chosen scheme, they are the safest form of storing secret information on the server side, and the choice and tuning of a specific algorithm depends on the specifics of the use case. It is important to match the selected solution properly and slow down the potential attacker as much as possible.

Further Reading

OWASP Password Storage Cheat Sheet

Previous Post Next Post