Blowfish as a hash algorithm

I’ve been in the middle of a debate over which hash algorithm to implement. While I’m a proponent of SHA256 myself, colleagues have proposed using JBCrypt, which is based on Blowfish.

Personally this is the first time I’ve ever heard of Blowfish being used for hashing, and though I can’t find any mention of the method amongst comparative lists of hash schemes, this white paper seems to claim that it’s secure, and it’s also used to hash passwords in OpenBSD.

Any crypto-savvy dopers who can enlighten this poor plebe?

What are the arguments in favor of JBCrypt? I am personally not familiar with it, but have used SHA-2 in various lengths for many things. SHA256 is a proven standard (though I use SHA512 when I can, why not?).

None, that’s why I can’t figure out why they’re implementing it =) It’s somebody’s personal preference, and I’m trying to figure out if I should talk them out of it or not.

Hashing as in randomly distribute, or hashing as in hide the original value?

I guess if the former, you need some evidence that the output of Blowfish is evenly distributed. Take 10,000 words or a dictionary or whatever, feed them through, and see what distribution you get. The purpose of a distribution hash is to spread evenly. Does it do that with your proposed input data set?
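The distribution test described above could be sketched roughly as follows. This is only an illustration: it uses SHA-256 from Python's standard library as a stand-in (Blowfish is not in the standard library), and the made-up word list, the `bucket_counts` helper, and the choice of 16 buckets are all assumptions, not anything from the thread.

```python
import hashlib
from collections import Counter

def bucket_counts(words, num_buckets=16):
    """Hash each word with SHA-256 and count how many land in each bucket."""
    counts = Counter()
    for word in words:
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        # Treat the first 8 bytes as an integer and reduce it to a bucket index.
        bucket = int.from_bytes(digest[:8], "big") % num_buckets
        counts[bucket] += 1
    return [counts[i] for i in range(num_buckets)]

# Hypothetical input set: 10,000 made-up "words"; substitute a real dictionary.
words = [f"word{i}" for i in range(10_000)]
counts = bucket_counts(words)
expected = len(words) / len(counts)
print("min:", min(counts), "max:", max(counts), "expected per bucket:", expected)
```

If the hash spreads inputs evenly, every bucket count should sit close to the expected value; a chi-squared test would make that judgment more rigorous.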

I have zero experience with any of those algorithms, haven’t done hash since college days. (says with straight face).

It’s cryptographic hashing, which has more requirements than a standard hash function. It needs to be mathematically proven that it’s impossible to reverse, and infeasible to find two values that produce the same hash. SHA and all the major hash functions have these requirements documented by respected sources (or have their faults exposed).

Are you specifically trying to hash a password database, or do you just need a crypto quality general purpose hash algorithm? It looks like bcrypt is specifically designed for password database, with countermeasures in place to deter things like rainbow tables and brute-force attacks.

You probably know about this already, but just in case you don’t, if you are trying to hash passwords with SHA256, make sure that the passwords are salted, and make sure you crank the password through the hash algorithm a very large number of times (probably in the thousands). These extra steps are necessary due to the fact that passwords are often trivial to guess, and it is very likely that multiple users would have the same password.
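A minimal sketch of salting plus many iterations, assuming Python's standard-library PBKDF2 with HMAC-SHA256 is an acceptable way to do the "cranking". The iteration count, the 16-byte salt length, and the helper names `hash_password` and `verify_password` are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # assumed work factor ("in the thousands"); tune for your hardware

def hash_password(password, salt=None):
    """Return (salt, digest) for a password, generating a random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)  # per-user random salt, stored next to the digest
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

Because each user gets a different salt, two users with the same password end up with different stored digests.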

There’s no reason to crank it through thousands of times, unless you’re just trying to guard against a brute-force attack by making things take too long. In that case, you’ll probably just end up annoying your users by making them wait ten seconds to authenticate.

The case of multiple users having the same password is handled by using salt strings. If you want to handle the case of a raw dictionary attack by someone who has obtained the password file (and therefore the salts) then an additional, separately-stored secret string can be thrown in the mix. But that’s probably overkill for most things. A high-entropy salt string is enough to reduce the problem of cracking to a brute-force-from-null.
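One way to picture the "separately-stored secret string" is a keyed hash, where the secret is the HMAC key and the per-user salt is mixed into the message. This is a sketch under assumptions: the name `peppered_hash` is made up, and in practice the secret would be loaded from somewhere outside the password database rather than generated on the spot.

```python
import hashlib
import hmac
import os

def peppered_hash(password, salt, secret):
    """Hash salt + password with HMAC-SHA256, keyed by a secret kept outside the password file."""
    return hmac.new(secret, salt + password.encode("utf-8"), hashlib.sha256).digest()

# Hypothetical setup: the secret would normally come from separate configuration.
secret = os.urandom(32)
salt = os.urandom(16)
stored = peppered_hash("correct horse", salt, secret)
```

Someone who steals only the password file has the salts but not the secret, so they cannot recompute candidate hashes offline.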

BTW, if you are going to use multiple passes through the hash, the correct way to do it is to reintroduce the salt string at every iteration.

IOW, hash( hash( hash( salt + password ) ) ) is wrong, and can introduce weaknesses in the random distribution of the final resulting hash. (Consider that for all but the inner hash, your inputs are limited to the subset of inputs that have the same length as the hash function’s output.)

So the recommended way is to use hash( salt + hash( salt + hash( salt + password ) ) ).
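In code, the re-salted form might look like the following sketch. SHA-256 and the function name `iterated_hash` are assumptions for illustration; with `rounds=3` it computes exactly hash( salt + hash( salt + hash( salt + password ) ) ).

```python
import hashlib

def iterated_hash(password, salt, rounds=1000):
    """Apply SHA-256 `rounds` times, prepending the salt at every iteration."""
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    for _ in range(rounds - 1):
        # Reintroduce the salt each round instead of hashing the bare digest.
        digest = hashlib.sha256(salt + digest).digest()
    return digest
```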

Yes, cranking through the password multiple times is intended to frustrate brute-force attacks, and I did mention salted passwords in my response. This is standard practice on many Unix systems for their password database. I’ve never timed how long it takes to run SHA256 a thousand times, but I didn’t think it would actually take ten seconds on a modern system. Tweak the number of iterations as needed so that it takes something like half a second to a second to authenticate, and it should have minimal impact on any legitimate users.
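Timing this is easy enough to try. The sketch below measures a salted SHA-256 loop and doubles the iteration count until one hash takes at least a target time. The function names, the sample password and salt, and the doubling strategy are assumptions for illustration, and results will vary from machine to machine.

```python
import hashlib
import time

def time_rounds(rounds, password=b"example password", salt=b"0123456789abcdef"):
    """Return the seconds taken to run `rounds` salted SHA-256 iterations."""
    start = time.perf_counter()
    digest = hashlib.sha256(salt + password).digest()
    for _ in range(rounds - 1):
        digest = hashlib.sha256(salt + digest).digest()
    return time.perf_counter() - start

def calibrate(target_seconds=0.5, start_rounds=1000):
    """Double the round count until a single hash takes at least target_seconds."""
    rounds = start_rounds
    while time_rounds(rounds) < target_seconds:
        rounds *= 2
    return rounds

if __name__ == "__main__":
    print("1000 rounds took", time_rounds(1000), "seconds")
    print("rounds for roughly half a second:", calibrate(0.5))
```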

I found this peer-reviewed SHA-based algorithm for password hashing. It mentions the Blowfish-based algorithm in its introduction, but states that it is not acceptable to some companies because it doesn’t use a NIST-approved algorithm.

Thanks for the answers. Indeed, the purpose is password hashing.

I guess that’s not an argument to get my colleagues to redo their implementation, sadly, or at least it won’t convince them. Lucky me that I’m changing jobs in a week :wink:

Where are the proofs for the hash algorithms currently in use?

I still ask - what’s the hash for?

Because sometimes a hash is used to distribute records evenly in a database; i.e., we have a sequence of serial numbers, and we want them distributed randomly across the master table so that, let’s say, when we write a numeric sequence of serial number records they don’t all have to go in the same sector on disk. In a database, for example, you may fill up one page/sector and cause it to overflow, which is less efficient than distributing among a dozen different sectors, because a random lookup will pull up that page and then search sequentially through the overflow pages to find the required record. So the question is not just irreversibility but randomness of distribution.

Well, it usually involves quite some deep mathematical analysis of the hash function itself, if I remember my university days correctly.

As said, cryptographic hashing of passwords in a database.