Indeed, I understood that you were simply providing an example of how one can devise a relatively simple way to lengthen hash computation time. You didn’t claim it generated secure hashes.
There have been known cases where repeated hashing actually converges to some result that is totally redundant, meaningless, and useless.
I read once, somewhere, about one of the first attempts to write a random-number generator, devised (I think) by one of the really early luminaries in the Computer Science field. (We’re talking 1950’s era here, or maybe even 1940’s. Old stuff.) To devise the algorithm, he basically wrote a sort of stream-of-consciousness series of mathematical steps to apply to each random number, to produce the subsequent random number.
Intuitively, it looked random-ish. That is, from reading the sequence of steps, you could not by any means intuitively guess what sequence of numbers you would get, starting with any particular seed. Such was the state of research into random numbers at the time.
Much to everyone’s surprise, the first time they tried to generate a sequence of random numbers with this algorithm, it quickly converged (to zero, I think). The algorithm involved, in certain steps, taking only certain digits of the previous number (something akin to the old mid-squares method), and at some point, a number turned up with a bunch of zero digits in the critical place, and then the sequence was all zero from that point on. Oops.
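That story matches the classic mid-square method. Here's a toy Python sketch of it (my reconstruction, with an illustrative 4-digit version; details of the historical algorithm surely differed) showing how quickly the "random" sequence can freeze:

```python
def middle_square(seed, digits=4):
    # One step of the mid-square method: square the number and
    # keep the middle `digits` decimal digits.
    sq = str(seed * seed).zfill(2 * digits)
    mid = (len(sq) - digits) // 2
    return int(sq[mid:mid + digits])

def run(seed, steps=50):
    # Iterate until we revisit a value, i.e. enter a cycle.
    seen, x = [], seed
    for _ in range(steps):
        x = middle_square(x)
        if x in seen:          # cycle detected (often a very short one)
            return seen + [x]
        seen.append(x)
    return seen
```

For example, 2500 squares to 06250000, whose middle four digits are 2500 again: a fixed point, so the sequence degenerates on the very first step.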
Let me clarify a bit more. Let’s suppose we have three hashes: a, b, and c. And for conciseness, I’ll call the attack that I mentioned above (where if you have h(A) = B, you can find some other C with h(C) = B) a preimage attack.
a is a weak hash. It is not preimage resistant. It doesn’t matter how many times I repeat a; there’s nothing I can do to make it better. It’s junk and only useful as a toy.
b is a strong hash–it’s preimage resistant. It’s also really slow. So it actually makes a great secure hash! The preimage resistance means that attackers have to brute-force everything, and the slowness means brute-forcing takes a long time. No problems here.
c is also preimage resistant. But it’s super-fast; a million times faster than b! Although attackers have to brute-force things, they can do so quickly because c is so fast. In practice, it’s not as strong as b.
However, we can make c slower by repeating it a million times. Now, it’s just as good as b was for security purposes. You haven’t made it stronger, exactly, but you have made it provably slower. It’s still preimage resistant and repeating it doesn’t change this.
c is then the superior hash because when we don’t need this kind of security (such as in the instances that Sensgoid mentioned), it’s fast. And when we do need the slowness, we can just repeat it and be confident that we aren’t losing strength.
Yeah–probably inevitable if your hash isn’t very good.
In fact, many hashes are probably prone to this to a very minor degree. It’s likely that any given hash function will have an input that hashes to itself (probability about 1 − 1/e ≈ 63% for a random function); a “fixed point”. So if you repeat a hash enough times, eventually it will jump around until it hits the fixed point. But that would require zillions of iterations; way more than you would ever use in practice. So for good hash functions, it’s not a worry. Plus it’s fixable by adding an iteration-dependent salt.
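The iteration-dependent-salt idea can be sketched in a few lines (using SHA-256 as a stand-in; the 4-byte round counter is just one reasonable choice):

```python
import hashlib

def iterated_hash(data: bytes, rounds: int) -> bytes:
    # Mixing the round number into each iteration means the map applied
    # at round i differs from the map at round j, so a fixed point of
    # one round is not automatically a fixed point of the whole chain.
    h = data
    for i in range(rounds):
        h = hashlib.sha256(i.to_bytes(4, "big") + h).digest()
    return h
```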
No. You’re making claims without proof, some of which are demonstrably wrong.
First, repeating a hash N times does not necessarily make it N times slower for an attacker. Your own example of a simple hash refutes this. A hash that is a linear combination of the input, iterated a million times, is still a linear combination of the input and can be computed in the same amount of time as the original hash. Real hashes are more complicated of course but you would need to mathematically prove that a given hash which takes T amount of time, iterated N times can’t be computed in less than N*T. This in general would be very difficult to show, especially since it’s probably false for most hashes. For some hashes, iterating it may make it EASIER to find collisions. You can’t know whether this is true or not without very difficult analysis.
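The linear-hash point is easy to demonstrate concretely. Here's a toy affine "hash" (LCG-style constants, purely illustrative): composing it with itself N times is still an affine map, so an attacker can compute N iterations in O(log N) steps instead of N:

```python
M = 2**32
A, B = 1103515245, 12345   # illustrative constants, not a real hash

def step(x):
    # the "hash": one affine map x -> A*x + B (mod 2^32)
    return (A * x + B) % M

def iterate_fast(x, n):
    # Compose the affine map with itself n times by repeated squaring.
    # Powers of the same map commute, so the order of composition
    # doesn't matter; this runs in O(log n), not O(n).
    ra, rb = 1, 0              # identity map
    a, b = A, B
    while n:
        if n & 1:
            ra, rb = (a * ra) % M, (a * rb + b) % M
        a, b = (a * a) % M, (a * b + b) % M
        n >>= 1
    return (ra * x + rb) % M
```

A million "iterations" of this hash cost the attacker about twenty multiplications, not a million; that's exactly the collapse being described.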
Generally it’s not known whether any hashes are preimage resistant, so your assumption that we have a hash that is known to be preimage resistant is not realistic. MD5 was thought to be good, until it was shown that collisions could be found. SHA-1 was thought to be good, until similar unexpected attacks were found. Creating your own home brew hash function, even if it’s built on existing well-analyzed functions is a bad idea. You have no idea what the characteristics of your new function are, and neither does anyone else, until either an attack, or a mathematical proof of its security is found.
Are you referring to the simple file check-sum program I wrote? I used that for years and years, at several different companies I worked for, and at home. Every so often, I’d encounter two very distinct files that would produce the same check-sum.
Eventually, after using the program for many years, I discovered a case (actually, a whole class of cases) in which the program was NOT preimage resistant, as defined by you. (Is that a word you invented, or a standard usage in the biz?) That is, given any file and its check-sum, I could create indefinitely many other files that would have the same check-sum. Not a problem, really, for the intended usage, but still… Oops.
It turned out that if you picked any 8-bit character and inserted a block of 8 (or maybe 16, I forget) of those bytes in a row anywhere in the file, the resulting file would have the same check-sum as the original. Of course, you could always tell it was corrupted because the file size would be different.
But it just occurred to me (as I was contemplating writing this post) that for some files, one could construct another file of the same length with the same check-sum. I never tried this (having only just thought of it), but it should follow from the above: if you can find 8 consecutive identical bytes anywhere in the file, it should be possible to change them to 8 consecutive copies of any other byte value, and the check-sum would stay the same.
Which is why I explicitly said that this doesn’t apply to weak hashes. You’re refuting claims that I never made, and in fact already denied.
All you’re saying is that if an attacker can break a hash, then an attacker can break a hash. Hashes and encryption algorithms always exist in a state of quasi security. They are treated as secure until they are known to be insecure. Maybe SHA-2 is trivially crackable. In that case, no amount of repetition can save it. But until that day comes, we treat it as being secure. And if it’s secure, then (careful) repetition is one way of slowing down attackers.
For the record, I never recommended homebrew hash functions and never would. Always use known standard when possible. For key stretching, PBKDF2 is one such standard. Guess what! It uses hash repetition to increase the difficulty.
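For anyone who wants to see it, PBKDF2 is right there in the Python standard library; here's a minimal sketch (the iteration count is illustrative; tune it to your hardware and threat model):

```python
import hashlib, os

password = b"correct horse battery staple"
salt = os.urandom(16)          # per-user random salt
iterations = 600_000           # illustrative; pick based on your hardware

# PBKDF2 chains `iterations` rounds of HMAC-SHA-256: every guess costs
# the attacker the same work it costs you, once, at login time.
key = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
```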
It’s standard. See here.
I looked up the hash used by the sum command and see that it only produces a 16-bit output! So it’s no surprise that you would see collisions. Furthermore, it just adds up the bytes in the file, so ABC will hash to the same value as CBA.
My point really is that you weren’t deliberately trying to sabotage your own files. You didn’t have to worry about someone substituting in a carefully-crafted message designed to foil your checksum and fool you. It was just for your own peace of mind.
A common use of hashes in computer science is in a data structure called (appropriately enough) a hash table. Suppose you have a bunch of files, and a new file arrives and you want to check if you’ve seen it before. Instead of comparing the entire file against everything in your database, you just compare a small hash value. If the hash doesn’t exist yet, then you know you haven’t seen the file yet. If it does match, then you have to check manually, but you only have to check the files that matched (a small number).
A preimage attack doesn’t do much since we do the full check either way. So most hash tables use simple, fast hash functions. Worst-case is that an attacker could slow down your hash table by filling it with files that all hash to the same value, forcing you to test against everything.
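A minimal sketch of that pattern (my own toy construction; the short BLAKE2 digest stands in for whatever fast hash a real table would use):

```python
import hashlib

class FileIndex:
    """Index file contents by a short, fast hash; fall back to a full
    comparison only among files whose hashes collide."""

    def __init__(self):
        self.buckets = {}          # hash value -> list of file contents

    def seen_before(self, data: bytes) -> bool:
        h = hashlib.blake2b(data, digest_size=8).digest()
        for prior in self.buckets.get(h, []):
            if prior == data:      # full check, but only within the bucket
                return True
        self.buckets.setdefault(h, []).append(data)
        return False
```

The full comparison makes the table correct even when the hash collides; the hash just keeps the number of full comparisons small.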
I used your trivial hash as an example where iteration does not provide increased security. Hashes aren’t clearly divided into “strong” and “weak” hashes. But SHA-2, for example, is generally considered strong. Does iterating it make it more resistant to collisions, or does it make it less resistant? I don’t know, and neither do you. If someone were to use iterated SHA-2 in an application that was worth someone’s time to attack, perhaps we’d find out. But there’s nothing inherent in “strong” hashes that guarantees that iteration improves security.
Not at all. I’m refuting the idea that there’s no way to compute the result of an iterated hash other than by iterating the original hash. It’s false for your simple linear hash, and it may be false for stronger hashes.
Again, you’re just assuming that iteration makes it stronger. There’s no justification for that. Perhaps there’s a clever approach that makes it faster to compute SHA2(SHA2(x)) than it is to compute SHA2(x). Perhaps there’s even an approach that makes it easier to find collisions with the iterated version. It would take detailed analysis to know, you can’t just assume this.
That (collisions) wasn’t the problem with the Unix sum program. The problem was that Bell Labs and Berkeley used two different sum algorithms, so the same file produced different checksums when I sent it from one machine to the other. If the SYSV sum was that simple-minded, no wonder the Berkeley people felt the need to re-write it, but gee y’think they could have mentioned the fact in their manual pages? :smack:
The collisions I mentioned came up with my own checksum program, which I was motivated to write when I discovered the Bell/Berkeley divergence. My program has produced consistent results on multiple different systems on which I’ve used it, for about 30 years now. But that was the program in which I found the weirdness about inserting blocks of 8 all-alike bytes.
It used an algorithm a bit more sophisticated than just adding the bytes. It started the hash at 0, then for each byte: multiply the hash by three and add the new byte. Then, clip off any bits that overflow 12 bits, shift that overflow right 12 bits (down to the units position) and add it there. Result: a 12-bit checksum.
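In modern terms, that description translates to something like this Python sketch (my reconstruction from the description above; the real cksm.c may well differ in details like when the overflow fold happens):

```python
def cksm(data: bytes) -> int:
    # 12-bit checksum: multiply by 3, add the byte, then fold any
    # overflow above 12 bits back into the low end (end-around carry).
    h = 0
    for byte in data:
        h = h * 3 + byte
        while h > 0xFFF:
            h = (h & 0xFFF) + (h >> 12)
    return h
```

The end-around-carry fold makes the arithmetic behave roughly like working modulo 4095, which is why the multiply-by-3 mixing is cheap to implement.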
Anyone interested in seeing or using the source code? I think it’s still a suitable program for verifying any kind of file transfers, at least for reasonably casual use. It’s a fairly trivially small (and fast) program, written in archaic C with really old-style function headers like this:
main( argc, argv ) int argc; char *argv[];
but I hope all modern compilers can still grok that.
Here it is: cksm.c source code. If you compile it and use it to checksum the source code itself, you should get: Size = 2700, cksm = 1885.
Some specs: Can take any number of file names on the command line, but does NOT handle any form of wild-cards. On any Unix-like system, you can still give wild-cards, as the shell expands them first. But that won’t work on any DOS system or its derivatives. If no command-line argument is given, it reads one file from standard input.
To hijack my own thread again, let me ask:
Is password (and other) security even theoretically possible if quantum computation comes to practical fruition?
That would be possible only if a preimage attack were possible. If it were possible to “follow the bits” throughout the hash such that you could collapse the computation somehow, then you would also have a preimage attack vector.
Of course we don’t know for sure that a preimage attack is impossible. In fact, it almost certainly is possible. But until we know of one, repetition is safe.
That said, there are ways to save small amounts of computation on repeated hashing. This paper shows a way to improve performance by 10-20%. But it’s basically just saving a bit of overhead at the beginning and end of each iteration, exacerbated by the fact that it’s hashing short inputs. This kind of attack doesn’t scale to even a factor of 2.
Heisenberg proved that you can know your username or your password, but never both at once.
Wow, and I thought he only made good crystal meth.
Most likely. A working, fast quantum computer breaks public-key encryption–the “factor a large number into two primes” problem that you’ve probably heard of. But private-key encryption and hashing don’t yet have known quantum algorithms.
You’ve trapped yourself into an over-exaggeration there, which you clearly did not mean. Having two users with the same password on the same server is the rare case. Having the same password turn up on an attacker’s machine running the same software is the common case.
Iterating a hash can never make it more resistant to collisions, and will almost always make it less resistant to collisions. If H(A) = H(B), then H(H(A)) = H(H(B)), and in general H[sup]n[/sup](A) = H[sup]n[/sup](B). It is theoretically possible to construct a hash algorithm which does not become less resistant to collisions with iteration, but such an algorithm is likely to have other properties which aren’t desirable for cryptographic purposes. In fact, I can think of a number of “hashing functions” which are guaranteed to have that property, and which would be guaranteed terrible for cryptographic purposes.
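You can watch a collision survive iteration with a deliberately tiny hash (my toy construction: SHA-256 truncated to one byte, so collisions are easy to brute-force):

```python
import hashlib

def tiny(x: bytes) -> bytes:
    # toy 1-byte "hash": truncated SHA-256, for demonstration only
    return hashlib.sha256(x).digest()[:1]

# Brute-force a collision in the 256-value output space; by the
# pigeonhole principle one must appear within 257 distinct inputs.
seen, a, b, i = {}, None, None, 0
while a is None:
    m = i.to_bytes(4, "big")
    d = tiny(m)
    if d in seen:
        a, b = seen[d], m       # two distinct messages, same hash
    else:
        seen[d] = m
    i += 1

# Once tiny(a) == tiny(b), every further iteration stays equal.
x, y = a, b
for _ in range(10):
    x, y = tiny(x), tiny(y)
assert x == y
```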
That said, however, decreasing collisions isn’t the purpose of repeating the hash function, and a small increase in collisions is tolerable, if the number of collisions was sufficiently low to begin with. The purpose of repeating the hash is just to make it slower to calculate (which also might or might not work, depending on the hash used, but at least it can work).
Yeah. The only way you can avoid the property is if you can prove that the hash function is bijective (across inputs the length of the output). And one way of proving bijectivity is coming up with the inverse function. Since the whole point of a cryptographic hash is that an inverse function doesn’t exist, bijectivity is not really a desirable property.
As you say, the actual increase in collisions is going to be extremely small, and not a problem if you have a strong hash function to begin with.
Given what we know about the public’s proclivity for passwords like “Password1” and “letmein”, I think “Having two users with the same password on the same server is the rare case.” may be less true than a naïve statistical argument might suggest.