Bcrypt Cocktail and Hash Extraction

TL;DR - Wrapping the output of an unsalted hash function (such as MD5 or SHA-1) in bcrypt can have serious consequences: the inner hash can be "extracted" from the bcrypt wrapper. It is still better than using the weak algorithm alone. In some cases, it is even possible to find something like a "collision".

Hash Extraction aka. Password Shucking

30ml Legacy Code

Companies that have been around for a while often have to adapt to new circumstances, guidelines, or operating patterns. This is also true in the case of password storage. Years ago, businesses handled this in various ways – they stored passwords without any protection, encrypted them, or used fast hashing functions like MD5 or SHA-1. However, when the winds of change blew and the desire or necessity to adapt their infrastructure to current standards emerged, a problem arose. How can we change all the password hashes for our users across the entire database to a newer algorithm? There are three approaches that can be highlighted.

First, force everyone to reset their password. This has consequences: it is convenient neither for us nor for the users. We have to handle a sudden flood of reset operations, and we risk being suspected of a data leak. Some customers may stop using the platform.

Second, we can replace each hash in the database when its user logs in. The idea is not bad: nothing is visible from the outside, and no one needs to know. But how long will it take? Regular customers will be migrated quickly, but migrating everyone could take months, if not years. What if we have a leak in the meantime? What about users who never log in? Public opinion won’t care that 15% of users were switched to bcrypt before the leak. It also requires supporting several login mechanisms at once.

Third – what about hashing the hash? Let’s take each user’s hash and compute a bcrypt from it – e.g., bcrypt(MD5($pass)). This mechanism is also invisible from the outside. While we have to support two algorithms simultaneously, we are secure right away since we don’t have MD5 in the database. But are we really?
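As a rough illustration of the third approach, here is a minimal PHP sketch. The helper names wrapLegacyHash and verifyWrapped and the sample password are assumptions made for this example, not code from any real system.

```php
<?php
// Hypothetical helpers; the names are illustrative only.

// One-off migration step: wrap the stored MD5 hex digest in bcrypt.
function wrapLegacyHash(string $md5Hex): string
{
    return password_hash($md5Hex, PASSWORD_BCRYPT);
}

// Login check against a wrapped hash: recompute MD5, then verify with bcrypt.
function verifyWrapped(string $password, string $wrappedHash): bool
{
    return password_verify(md5($password), $wrappedHash);
}

$legacy  = md5('hunter2');          // what the old database held
$wrapped = wrapLegacyHash($legacy); // what the database holds after migration

var_dump(verifyWrapped('hunter2', $wrapped)); // correct password
var_dump(verifyWrapped('hunter3', $wrapped)); // wrong password
```

The migration needs only the old hashes, never the plaintext passwords, which is what makes it possible to run it over the whole database at once.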

90ml Data from Leaks

The problem that starts to emerge is password shucking. The term has no well-established Polish translation, so we’ll refer to it as hash extraction. To understand why it’s a problem, let’s take a look at one of the most popular services aggregating data about leaks, haveibeenpwned (HIBP). As of March 27, 2023, it holds records of 12,485,202,808 compromised accounts from 664 sites. Let’s look at some of the most recent leaks reported by HIBP: TheGradCafe, Shopper+, Eye4Fraud, LBB, and iDTech, all of which were posted on Twitter by HIBP in March 2023.

[Five screenshots of the HIBP breach announcements, taken on 2023-03-27]

The overwhelming majority of the accounts from these and many other leaks reported by HIBP were already in its database. What does this mean? The same users keep turning up in leak after leak, and one could argue that at least some of them are simply reusing the same password in multiple places.

Other leaks disclosed this year:

[Three screenshots of breach announcements. Caption: Passwords stored in MD5 or even in plain text.]

So, what exactly is hash extraction? Let’s take a (slightly modified) example from Royce Williams, a member of the hashcat team. The hypothetical situation is as follows:

  1. The attacker obtains a database from a cryptocurrency exchange, where password hashes are stored in bcrypt.
  2. The attacker attempts simple, direct attacks on bcrypt – but unfortunately, they fail because bcrypt is very slow.
  3. The attacker builds a dictionary for the bcrypt attack out of MD5 hashes from another, unrelated leak, for example from a dating site, and checks whether any of these MD5 hashes is a valid "password" for a bcrypt hash. If the attack targets a particular person, they may look specifically for that user’s hash. A single match tells the attacker that the exchange stores bcrypt(md5($pass)), and lets them draw the following conclusions:
    • First, cracking these bcrypt hashes directly, with an MD5 output as the bcrypt input, is unlikely to succeed.
    • Second, cracking the MD5 hashes from the other leak, combined with the common practice of reusing passwords across sites, gives the attacker a decent chance of success.
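The scenario above can be sketched in a few lines of PHP. This is a toy model under stated assumptions: the "exchange" hash, the "dating site" leak, the wordlist, and the function names shuck and crackMd5 are all made up for illustration. Real attacks use tools like hashcat on GPUs.

```php
<?php
// Illustrative data only: a toy "exchange" hash and a toy "dating site" leak.
$exchangeHash = password_hash(md5('letmein'), PASSWORD_BCRYPT); // bcrypt(md5($pass))

$leakedMd5 = [ // unsalted MD5 hex digests from the unrelated leak
    md5('qwerty'),
    md5('letmein'),
    md5('dragon'),
];

// Step 3: use the leaked MD5 digests as bcrypt "password" candidates.
function shuck(string $bcryptHash, array $md5Candidates): ?string
{
    foreach ($md5Candidates as $md5Hex) {
        if (password_verify($md5Hex, $bcryptHash)) {
            return $md5Hex; // the inner hash has been extracted
        }
    }
    return null;
}

// Follow-up: attack the extracted MD5 at MD5 speed with an ordinary wordlist.
function crackMd5(string $md5Hex, array $wordlist): ?string
{
    foreach ($wordlist as $word) {
        if (hash_equals($md5Hex, md5($word))) {
            return $word;
        }
    }
    return null;
}

$inner = shuck($exchangeHash, $leakedMd5);
$plain = $inner === null ? null : crackMd5($inner, ['123456', 'password', 'letmein']);
var_dump($inner === md5('letmein'), $plain);
```

The slow bcrypt check runs only once per leaked digest. The expensive password guessing then happens entirely against MD5.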

Cracking MD5 is much faster in practice than cracking bcrypt. A graphics card like the RTX 4080 Founders Edition can compute MD5 at around 98,000 MH/s, or ninety-eight billion hashes per second. Bcrypt? 131 kH/s, or one hundred thirty-one thousand hashes per second. That is a difference of roughly 750,000 times.
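To put the two rates in perspective, the following sketch estimates how long it would take to run a wordlist against each algorithm. The rates are the ones quoted above. The wordlist size of one billion candidates is an assumption chosen purely for illustration, and real bcrypt throughput also depends on the configured cost factor.

```php
<?php
$md5Rate    = 98_000_000_000; // hashes per second (98,000 MH/s, from the text)
$bcryptRate = 131_000;        // hashes per second (131 kH/s, from the text)
$candidates = 1_000_000_000;  // hypothetical wordlist size, assumed for illustration

$ratio         = $md5Rate / $bcryptRate;
$md5Seconds    = $candidates / $md5Rate;
$bcryptSeconds = $candidates / $bcryptRate;

printf("MD5 is about %d times faster\n", round($ratio));
printf("MD5:    %.3f seconds\n", $md5Seconds);
printf("bcrypt: %.1f hours\n", $bcryptSeconds / 3600);
```

Under these assumptions the same wordlist takes about a hundredth of a second against a single MD5 hash and a little over two hours against a single bcrypt hash.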

This is what password shucking is: extracting the weaker algorithm from a stronger wrapper. This kind of "wrapping" also happens in other situations, such as when someone (somewhat ineptly) wants to work around bcrypt’s maximum password length of 72 bytes or believes that it will "strengthen" weak passwords. It isn’t limited to MD5; any unsalted algorithm wrapped in bcrypt can fall prey to this. The hashcat tool also has modes for directly cracking unsalted algorithms wrapped in bcrypt:

  • 25600 - bcrypt(md5($pass))
  • 25800 - bcrypt(sha1($pass))
  • 30600 - bcrypt(sha256($pass))
  • 28400 - bcrypt(sha512($pass))

Royce stresses that this happens in practice:

This is not a theoretical attack. It is used all the time by advanced password crackers to successfully crack bcrypt hashes that would otherwise be totally out of reach for the attacker.

2 Good Pieces of Advice

So, how should we do this right?

  1. Use a pepper: a long, random value, stored separately from the database, that is combined with the output of the weaker algorithm before bcrypt is applied (e.g., bcrypt(md5($pass).$pepper)), so that other leaks give no advantage in breaking the wrapped hashes. Guard it carefully, though: if the pepper leaks, we are back to square one.
  2. Combine the ideas of wrapping the existing hashes and replacing them with "standard" bcrypt when the user logs in.
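Both pieces of advice can be combined roughly as follows. This is a sketch under assumptions, not a reference implementation: the PEPPER constant, the function names, and the $isWrapped flag (standing in for a per-user column that records which scheme a row uses) are all hypothetical.

```php
<?php
// Hypothetical pepper. It must live outside the database (for example in
// configuration or a secrets manager); it is inlined here only to keep the
// sketch self-contained.
define('PEPPER', '5f2b9c4e8a1d7036b4c9e2f1a8d07c35'); // 32 characters, assumed random

// Advice 1: bcrypt(md5($pass).$pepper). MD5 hex (32 bytes) plus this pepper
// (32 bytes) stays under bcrypt's 72-byte input limit.
function wrapWithPepper(string $md5Hex): string
{
    return password_hash($md5Hex . PEPPER, PASSWORD_BCRYPT);
}

// Advice 2: on a successful login against a wrapped hash, replace it with
// plain bcrypt of the password. Returns the hash to store, or null on failure.
function loginAndUpgrade(string $password, string $storedHash, bool $isWrapped): ?string
{
    if ($isWrapped) {
        if (!password_verify(md5($password) . PEPPER, $storedHash)) {
            return null;
        }
        return password_hash($password, PASSWORD_BCRYPT); // upgraded record
    }
    return password_verify($password, $storedHash) ? $storedHash : null;
}

$wrapped  = wrapWithPepper(md5('correct horse'));
$upgraded = loginAndUpgrade('correct horse', $wrapped, true);

// An MD5 digest taken from another leak no longer verifies without the pepper.
var_dump(password_verify(md5('correct horse'), $wrapped));
```

Keeping the combined input under 72 bytes matters here: with a longer inner digest, an appended pepper could fall partly or wholly outside the portion bcrypt actually reads.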

Shaken, Not Stirred – Bcrypt Cocktail Served

Another problem we might encounter when combining bcrypt with other functions lies in how bcrypt is implemented. Let’s look at PHP and a blog post by Anthony Ferrara (@ircmaxell) describing its bcrypt implementation. It all comes down to one sentence:

Basically, it ignores everything after the first null byte.

Is this a problem? Well, using a null byte in a password certainly isn’t a popular practice among users. Does this cause any real issue?


When we combine raw outputs of functions like MD5, SHA, or HMAC-SHA with bcrypt, it’s a big issue. In the raw output of these functions, the null byte is a perfectly ordinary value. What’s more, Anthony provides statistics showing that 1 in 256 (~0.39%) of "pre-hashes" generated using HMAC-SHA256 will have a null byte as the first byte, so bcrypt effectively sees an empty string. Let’s take another of his examples, where we generate two different "passwords" whose HMAC-SHA256 outputs both start with a null byte; either one verifies successfully against the bcrypt hash of the other:

$key = "cwuioshc8934f89ch398h34hdfhd3d3d4d343d"; // randomly chosen key, picked by hitting the keyboard with one's forehead
$hash_function = "sha256";
$i = 0;
$found = [];

// Search for two inputs whose raw HMAC-SHA256 output starts with a null byte.
while (count($found) < 2) {
    $pw = base64_encode(str_repeat((string) $i, 5));
    $hash = hash_hmac($hash_function, $pw, $key, true); // raw HMAC-SHA256 of a value derived from the loop counter, keyed with $key
    if ($hash[0] === "\0") {
        $found[] = $pw;
    }
    $i++;
}

var_dump($i, $found);
// Hash the first pre-hash with bcrypt, then verify the second one against it.
$hash = password_hash(hash_hmac("sha256", $found[0], $key, true), PASSWORD_BCRYPT);
var_dump(password_verify(hash_hmac("sha256", $found[1], $key, true), $hash));

The result is as follows:

[Screenshot: output of the script above]

Two different outputs from HMAC-SHA256 verify against the same bcrypt hash. The problem is not limited to a null byte at the first position: a null byte at any position cuts off everything after it. Nor is this specific to PHP: the crypt(3) function in C has the same "feature". It is not even specific to bcrypt, but affects various solutions in the crypt() family. Bcrypt makes a particularly useful example because it is very popular and widely available, both directly as password_hash() in PHP and in frameworks (e.g., Hash::make in Laravel, which also uses password_hash() under the hood).
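To get a feel for how often a null byte turns up anywhere in a raw digest, the sketch below computes the probability under the assumption that digest bytes are uniformly random. The helper effectiveInput is a hypothetical stand-in that only mimics C-string truncation; it is not part of PHP’s password API.

```php
<?php
// Emulates C-string handling: everything from the first null byte on is dropped.
function effectiveInput(string $raw): string
{
    $pos = strpos($raw, "\0");
    return $pos === false ? $raw : substr($raw, 0, $pos);
}

// Chance that a uniformly random digest of $bytes bytes contains a null byte.
function nullByteProbability(int $bytes): float
{
    return 1 - (255 / 256) ** $bytes;
}

printf("first byte only: %.2f%%\n", 100 / 256);
printf("anywhere in 32 bytes (SHA-256): %.2f%%\n", 100 * nullByteProbability(32));
printf("anywhere in 64 bytes (SHA-512): %.2f%%\n", 100 * nullByteProbability(64));

// Two different inputs that share everything before the null byte look identical.
var_dump(effectiveInput("ab\0cd") === effectiveInput("ab\0zz"));
```

For a 32-byte digest this works out to roughly 12%, so truncated pre-hashes are far more common than the 1-in-256 case of a leading null byte suggests.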

How not to get mixed up? The best approach is simply to use standard bcrypt. If we are forced to use such a construction, however, we should not feed bcrypt the raw output of the "pre-hash" (raw output is what the final true argument of hash_hmac requests, and it is how cryptographic functions are typically chained). Instead, encode it first, for example as a hexadecimal string or with base64: password_hash(base64_encode(hash_hmac("sha512", $password, $key, true)), PASSWORD_BCRYPT).
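The base64 variant can be wrapped in a pair of helpers, sketched below. The function names and the sample key are assumptions for illustration; in practice the key would be a secret kept outside the database.

```php
<?php
// Hypothetical helper names; $key stands in for a secret HMAC key stored
// outside the database.
function prehash(string $password, string $key): string
{
    // Raw HMAC output may contain null bytes; base64 output never does.
    return base64_encode(hash_hmac('sha512', $password, $key, true));
}

function hashPassword(string $password, string $key): string
{
    return password_hash(prehash($password, $key), PASSWORD_BCRYPT);
}

function verifyPassword(string $password, string $key, string $hash): bool
{
    return password_verify(prehash($password, $key), $hash);
}

$key  = 'example-key-for-illustration-only';
$hash = hashPassword('s3cret', $key);
var_dump(verifyPassword('s3cret', $key, $hash));
```

The base64 string is 88 characters long, which exceeds bcrypt’s 72-byte input limit, so only the first 72 characters take part. Those still encode 54 bytes of the digest.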

Source:

https://www.youtube.com/watch?v=OQD3qDYMyYQ

https://twitter.com/haveibeenpwned

https://security.stackexchange.com/questions/234794/is-bcryptstrtolowerhexmd5pass-ok-for-storing-passwords

https://superuser.com/questions/1561434/how-do-i-crack-a-double-encrypted-hash/1561612#1561612

https://www.scottbrady91.com/authentication/beware-of-password-shucking

https://gist.github.com/bigpick/cfa22947c884f7a3fc1431475e345427

https://blog.ircmaxell.com/2015/03/security-issue-combining-bcrypt-with.html

https://tenor.com/view/disperse-lesley-nielsen-explosion-gif-13010485

https://github.com/illuminate/hashing/blob/master/BcryptHasher.php
