For password hashing, why is it not necessary to keep the salt secret?

In what seems to be a pretty good site about passwords and hashing, it is stated that the salt does not need to be kept secret (see the “adding salt” section, about 10 percent of the way down). Intuitively, I would have imagined that by keeping the salt secret, it would be more difficult to crack the password. I am trying to figure out why my intuition is wrong.

More specifically, if the salt is not kept secret, then, for a given user’s password, the corresponding hash table entry will consist of the (known) salt and the hashed password. That’s gotta be easier to crack than if the salt was also unknown to the hacker, no?

That makes me think that while keeping the salt secret would make it harder to crack, it would be superfluous, i.e. once the password’s been hashed using any good salt, it’s already uncrackable for all intents and purposes.

Does this make sense or have I totally misunderstood things?

Thanks!

First, note that if the salt were secret, there would be no way to validate the password. The validator needs to be able to hash the password exactly the same way it was hashed before storing it in the database. If the validator doesn’t have the salt, it can’t do that. So normally the salt and the hash are both stored and are equally accessible to the validator and to an attacker.

Apart from that practical issue, you’re correct that hiding the salt would be superfluous. If the hash function is cryptographically secure, it’s not any easier to crack the hash if the salt is known. That is, if the attacker knows the salt S, and wants to find the password P given H(P+S), he still has to attack the hash. If it’s easier to find P knowing S, then the hash function is no good in the first place.

–Mark

Thanks for your quick and helpful response.

Just to be clear, though, when I said “secret” I meant kept separately as well as encypted. So, for example, I wondered whether when a user is trying to log in, it might be more secure if the salt used for that user was called up from a separate and presumably encrypted file.

In that case, how would the validator be able to decrypt the secret salt file? It would need to keep the decryption key for the file somewhere, so how would you prevent an attacker from finding and using that key to decrypt the file? Why not just encrypt the password database? One of the main advantages of using hashed passwords rather than just encrypting the password database is to avoid this problem. The validator doesn’t have any secrets that an attacker can discover.

–Mark

The idea of the salt, or least one of the ideas, was to make a mass attack harder.

Suppose all passwords were hashed according to the same key, i.e., there is only ONE salt (which would be equivalent, effectively, to none at all). Now suppose you, wearing your hacker hat, try to do a dictionary attack, or any other similar systematic attack.

Now suppose also that you got a copy of the entire password file (which was NOT kept secret at all in earlier Unix systems, for example).

Here’s the hard, inefficient way to go: Start with the first hashed password in the file, and try to crack it. Try “aardvark”. Nope? Try “barnabas”. Nope? Try “charlie”. Nope? Eventually, you might go on to try the second password in the file.

Here’s the efficient mass attack method: This is what the salt is meant to thwart: Try “aadrvark”. Hash it up once, then compare that to every hashed password in the file, and see if you get any hit. Then try “barnabas”. Hash it up once, then compare that to every hashed password in the file and see if you get any hit. Then try “charlie”. Hash it up once, then compare that to every hashed password in the file and see if you get any hit.

Thus, for the effort of hashing up every prospective password once, you get to attack all the passwords in the file. In any user has chosen “charlie” as his password, you will find it.

Comes now the salt: Suppose, for example, that it’s a 24-bit salt. That means every password that a user could choose can get hashed in any one of 2[sup]24[/sup] ways, depending on which salt was randomly chosen.

So your hacking algorithm could work on the first password in the file, and hash up a bunch of prospective passwords according to the salt there, looking for prospective hits. But for each test password you hashed up, you could not then compare that hash with all the other passwords in the file to look for hits there. The differing salts in every password record prevented that. You had to do all the work all over again for each record in the file.

The password hash is a ONE WAY HASH.

This means that knowing the hash will not give you the password.

The reason for a salt is so that the same software could be used on two different servers, which may share the same users ( eg gmail.com and hotmail.com) so that the hashing gives a different hash… even with the same software.
That way, the owner (or hacker) of server 1, upon getting the hashed passwords and actual passwords from server 2, can’t match them up…

It doesn’t really have anything to do with different servers or software. The purpose of the salt is so that two users with the same password, even on the same server, have different hashes. This makes it useless for an attacker to build a database matching passwords to hashes.

–Mark

The salt is used for two main purposes.

  1. defeat rainbow tables. A rainbow table is a table of common passwords and their corresponding hash. You can look up the hash value that you know to get the password. With salt you need a different table for each salt value.

  2. The same password does not result in the same hash entry in the table. That way there is no back door to passwords. Say you know one person’s password through other attacks. If you find matches in the password hash table you now know the password for another account.

The knowing the salt for every password does not change either of these things.

Ah, so not only would it be redundant, it would needlessly complicate matters. Thank you.

ETA: To the others who have responded, yes, I understand the why of the salt. It was it’s need/desirability for secrecy or not that confused me.

Even with salts, there is a slight possibility of attacking multiple passwords for the price of one, but it would be limited:

Start with the first password in the file. Then, scan the file for any other hashed passwords that just happen to have the same sale. There just might possibly be a few. If so, you could attack those as a group. Hash up “aardvark” once according to the first sale. Then compare that to all the hashed passwords in the file that have the same salt. Continue with other prospective passwords. So it’s possible you could do a dictionary or systematic attack on a few passwords at a time.

Note also that password hashing algorithms are intentionally complex, not necessarily with the intention of making them hard to crack (they are designed to be hard to crack in any case), but also to make every hashing take a significant amount of CPU time to compute. Here, a “significant” amount of time is only a fraction of a second. But fractions of a second can add up if you’re doing a lot of them

The times when a password is legitimately hashed should be “relatively” rare – it happens once when a user sets his password, and later each time the user logs in or otherwise needs to authenticate himself. These are relatively rare events. Now multiply this by the total number of users, and it still doesn’t amount to a big drag on CPU time.

Contrast with the hacker: He is hashing up one prospective password after another, non-stop, for very long periods of time (hours or days or weeks or months or years?) trying one password after another. NOW the CPU usage is going to really add up! That fraction of a second to hash each password is going to slow down any hacker trying to do this. And then, multiply that by the numbers of users in the system (that is, by the number of entries in the password file), since the salting prevents the hacker from attacking all records for the price of just one. Now the time required REALLY adds up!

You can do better than this as well. There is a data structure called a rainbow table that allows storing (more or less) lots and lots of passwords with corresponding hashes in a file that’s much smaller than a straight list of passwords and their hashes.

Computing a rainbow table is as expensive, but as you say, you only have to do it once–or rather, you let someone else do it. Anyone can download a large rainbow table with quadrillions of passwords, and only takes a few hundred gigabytes.

The salt ruins it, of course. Quadrillions is a small number compared to 2[sup]64[/sup], and there’s no reason to have a salt that small.

Yes, that I get. Makes a lot of sense.

To hijack my own thread, what does a hash function actually look like? I assume it involves modular arithmetic but what makes it take so long to compute?

Note that posts #5, 6, 7, and 8 (ETA: and 11) are all saying substantially the same thing: That, with salts, a successful hack on any password does not get you into any other account (whether on the same machine or others) that happens to use the same password.

My point was, that hacking even one password can take a long long time, and with salts, even all that time can only work towards attacking one password and not them all.

That, I never learned. Not being a hacker, I never cared. I’m sure that someone more versed in the subject will have an answer. I was just a lowly Unix sysadmin.

ETA: Oh, and as to the OP’s question: Why not keep the salt secret? First, as already discussed, it can’t be secret, since it’s needed to re-hash the same password when the user tries to log in, to compare with the original. But further, as discussed in numerous posts above, the scheme is designed so that it’s not very necessary to keep the salts secret, since even with salts being public, they still make attacks impractical. If, further, salts could be kept secret, which they can’t, I suppose it would be even so much the better.

There are lots and lots of different hash functions, but modular arithmetic is indeed a common theme.

Here’s a very simple hash (djb2):
init: h := 5381
repeat: h := (h * 33) xor data

It’s fast, and not very good, particularly for cryptography. But you can see that it has the effect of moderately scrambling the input. A good property of cryptographic hashes is that it should be impossible to come up with an input text that gives you a particular output hash. That is, if you know that h(A) = B, it should be impossible to come up with a C such that h(C) = B (any faster than brute-force, that is). The djb2 hash fails at this; most likely you could figure out how to do that by hand. Other, better hashes are actually able to accomplish this.

Mostly, you want your hashes to be fast. But sometimes, for security reasons, you may want to deliberately slow down your hash. In these cases, one technique is to repeat your hash–i.e., compute h(h(h(…h(A))…)) = B.

D’oh! Of course! I should have figured that one out.

Thanks!

And furthermore, to be sure, modern systems DO try to keep the salts (in fact, the entire hashed passwords, salts included) secure.

In older Unix systems, the entire password file was publicly readable. It includes a variety of other fields that are of general usefulness.

In all more recent Unix and Linux systems, and certainly all other systems, the passwords and their salts are kept in a separate file from all the other publicly readable fields of a user’s account. This separate file is kept more secure – it is not publicly readable, but only readable and alterable by trusted system-level components of the operating system that are supposed to have access to that.

Or so you hope, at least. Somehow, hackers sometimes manage to hack anyway. Just ask Ashley Madison.

See the Wikipedia article “Cryptographic Hash Function” for starters. Designing a good hash function is very very hard. Just repeating a simple hash as was suggested above is a terrible idea – usually cracking a hash iterated N times is NOT N times as hard as cracking the original hash. (In fact the article that the OP linked to recommends against this.) MD5 was a pretty good hash for a while but very clever people found ways to crack it. SHA-1 replaced it, but it too is showing signs of weakness now. SHA-2 (at least SHA-256) is now recommended but I would bet that eventually it too will be broken and replaced with something else.

–Mark

Hash codes like Dr. Strangelove suggests, or anything similar, have another important use: To validate that a data file, transmitted from anyplace to anyplace else, has been correctly received ( probably ). Just hash up an entire data file like that, and send the resulting hash along with the file. The recipient hashes up the received file, and if the hash matches, then the entire data is probably correct. It’s certainly possible for two very different files to hash up the same (I’ve seen it happen with a simple file-checksumming program that I wrote), or maybe even for two slightly different files to hash up the same. Shit can happen.

An anecdote:

Unix has always included a program called sum to do this. (I’m talking about the 1970’s, 1980’s era, long before modern hashing and cryptography methods were developed.) I was sysadmin for a bunch of in-house machines in a smallish hi-tech company, 1984-1987. I ran into some kind of networking problem that I spent too much time trying to diagnose: I was sending a file from one machine to another, and the source and destination checksums simply would not match. I tried it over and over, and always got the same mis-matching checksums every time!

I finally figured it out: We had a mix of Bell Labs Unix and (mostly) Berkeley Unix machines. Their users’ manuals described the sum program, but didn’t document the actual algorithm used. It turned out, the Berkeley version was simply a different checksum algorithm than the Bell Labs version. For whatever reason, the wise people (grad students, mostly) at Berkeley had decided to re-write it, and not mention it to anybody.

That’s why I wrote my own super-simple checksum program (along similar lines to what Dr. Strangelove wrote above), which worked and gave the same results on Bell Labs Unix, Berkeley Unix, DEC Systems Unix, and much later, on M$ DOS machines, Windows machines, and modern Linux machines. The algorithm is way too simple-minded to be secure for password or any other security usage, but it’s reasonably reliable for verifying file transfers, modulo 2[sup]12[/sup] (IIRC).

I didn’t suggest that (my example was just a hint of what some hashes look like). Repeating a hash requires starting with an already strong hash. It can’t make a weak hash stronger. Nevertheless, it is a valid technique known as key stretching.