Secret Codes - aka One-Way Hash Algorithms

Read the following on a company’s website:

You provide *** two pieces of information: a “master password” – that one, single password you like – and the URL of the website requiring a password. Through the magic of one-way hash algorithms, *** calculates a message digest, also known as a digital fingerprint, which can be used as your password for the website. Although one-way hash algorithms have a number of interesting characteristics, the one capitalized by *** is that the resulting fingerprint (password) does “not reveal anything about the input that was used to generate it.” 1 In other words, if someone has one or more of your generated passwords, it is computationally infeasible for him to derive your master password or to calculate your other passwords. Computationally infeasible means even computers like this won’t help!

Can someone explain how a password can be stored in a file that is ‘computationally infeasible’ to decrypt? Or is this a bunk statement? Does this really make passwords that could not be ‘brute-forced’ regardless of the amount of time a computer is set to work on it (ie 1 day or 1 million days)?

You misunderstand a few things. The program Doesn’t create passwords that a remote cracker cannot get using brute force cracking methods. It only creates passwords that would be impossible to simply guess or to crack with a dictionary attack (using lists of commonly-used passwords or dictionary words). The point the part you quoted above is trying to make is that if a remote hacker/cracker/script kiddie does manage to intercept a few of your generated passwords, there is no algorithm which will allow him to discover your “master password” and therefore allow him to use this same program to gain access to your accounts. The weak link here, of course, is the “master password”. If I’m understanding it correctly, a given combination of master password and URL will always produce the same output, all else being equal. If your master password is easily guessed, then a cracker will have an easier time of it.

The purpose of a hash algorithm is to create a fixed-size digest or fingerprint of an arbitrary sized piece of information.

A very simple hash would just be a parity check: 1 if there are an odd number of 1 bits in the input, or 0 if there are an even number. If you put 1010001011 into this algorithm, you’d get a 1 out. If you put 0000000000000011000 in, you’d get a 0 out. Notice that the output doesn’t give you nearly enough information to reconstruct the input.

However, that’s a very weak hash algorithm, because it’s easy to come up with a collision - a different input that has the same hash value. A server usually hashes your password when you enter it for the first time, then when you try to log in again, it hashes the password you typed that time and compares it to the stored hash value. If the server used a weak hash algorithm like the one I described, it’d be easy for anyone who got access to the hash value to come up with a password they could use to log into your account. Cryptographic hash functions are designed to make it hard to find collisions.

Yes and no. If someone has access to the hash value, they can apply brute force to find a password that has the same hash value as yours. But in fact, they’ll find an infinite number of possibilities with that same hash value (if they have infinite time), and only one of them will be your master password.

It isn’t useful to regenerate your password: It is impossible (because there are an infinite number of bit patterns that will hash to a given value) and unneeded (because you password isn’t what’s stored in the database: the hash value is). What an attacker would need to do is find a bit pattern that hashes to the same value as your password. This is called generating a collision. Ideally, this should be extremely hard (which is what ‘computationally infeasible’ means).

A standard Unix system has a password file that works in this fashion: Nobody’s password is actually stored in /etc/passwd (the default password file). Instead, the value stored is the output of applying a hash function (commonly Blowfish, sometimes SHA or something else) to the password. Every time the a program needs to verify a password, it accepts a string, applies the same hash function, and checks the result against the value in /etc/passwd. If the values match, the password is accepted.

This has an upside and a downside: The upside is that nobody’s passwords are ever in a file that can be copied, the downside is that if the hash function is broken (that is, if anyone ever figures out an easy way to generate a collision) so is the system’s security.

(Not all hashing is done to protect passwords. Sometimes a hash value is generated to ‘tag’ complex data with a simple, unique integer that can be sorted and searched for more easily. All databases have a hash function of some form just for doing this.)

Collisions wouldn’t be much use in attacking this system. Presumably the attacker already has one of the hash values, so finding a way to recreate that doesn’t provide anything in this system the way it would in a Unix password system. Unlike a Unix password system where the server is doing the hash to compare the plain-text password supplied by the user to the stored value, in this case the client is doing the hash, so recreating the hash has no value.

For example, say a user of this system used their master password to create five site-specific passwords for different sites. An attacker intercepts one of those site-specific passwords and would like to use that to get the master password and recreate the other four passwords. Finding a value that hashes to the same site-specific password they already have is useless unless the collision they find happens to be the correct one (the combination of the master password and site URL).

The way to attack this system would be to use a dictionary attack to find the master password given one site-specific password and the corresponding URL. People in general use lousy passwords and people who use a shortcut system like this may be even more likely to use something short or easily guessed that would be vulnerable to a dictionary attack. This doesn’t require the attacker to “crack” the hash. A brute force attack might be computationally infeasible in theory, but in practice people rarely use good passwords and it’s likely the master password would fall to a targetted attack much more quickly than an exhaustive brute force of the key space. Of course, the point of this system is that people can use one really good hard-to-remember-or-attack password to generate all the other passwords they need, but how many people are going to take advantage of that and create a good master?

Just for the record, one of the most common hash algorithms out there, known as MD5, maps its input onto a 16-byte string, which means that there are 2[sup]128[/sup] possible outputs. Even if an attacker could generate one output per second, it would take them much longer to come up with a duplicate password than the universe would last.

But (again just FTR) MD5 is currently known to be partially broken, in that colliding pairs of messages can be produced fairly efficiently. In some cases (such as those where the document being signed will not be examined at byte level) this can be used to spoof signed messages.

But in any case, assuming that they company uses a quality hashing algorithm, the text is essentially correct. An eavesdropper that captures a transmission of the generated password will have no easy way to figure out what your master password is. In fact, for the right algorithm the eavesdropper can capture a very large number of transmissions with many different generated passwords and still not be able to figure out your master password before the heat death of the universe.

But that’s a pretty big “if”, all in all. If they have a good hasher then they should be telling you which algorithm they’re using. If they talk about a “proprietary” hashing algorithm, you should view them with skepticism. Good cryptography is very difficult – new algorithms by even experienced cryptographers often still have holes that aren’t found until years later after hard scrutiny of the cryptography community.

Also, their technology only protects against eavesdropping attacks. It doesn’t protect against man-in-the-middle attacks. Keyloggers, retroviruses, etc.

It doesn’t even really protect against eavesdropping (and it’s not supposed to). If a sniffer is monitoring my connection to Site1 and gets my password, this technology does nothing to prevent that or keep my password to Site1 safe. The sniffer can’t use that info about Site1 to get any info about my passwords at Site2, 3, and 4, but that’s no different than if I didn’t use this system and just had independent passwords. If you want to defend against eavesdropping (and man-in-the-middle attacks, if you verify certs) you use SSL.

This system is only designed as a convenient password safe so you can have one single login password (master password) that gets you into every website you visit. It’s much safer than just using the same password at every site where losing one would compromise them all, but it’s no safer than just using different passwords for different sites (as long as you choose good passwords).

I meant that it prevents an eavesdropping attack against your master password, not against your generated password.

The master password never crosses the wire, so it’s never even subject to an eavesdropping attack. Saying that the system protects against eavesdropping implies that it protects the data crossing the wire, which is not true. The password you use to access a site is as vulnerable to eavesdropping as it would be if you didn’t use the system. Saying that it protects the master password from eavesdropping is spurious.

No, that is the definition of an eavesdropping attack. The attacker sees the transmission and attempts to gain access to the original data. In this case, the attack would see the generated password and the URL of the connection, and would attempt to calculate the master password, or would attempt to determine what the generated password would be if the same master password were hashed with a different URL.

You’re contorting the definition of eavesdropping to fit your misstatement. Eavesdropping is listening in on someone else’s communication. In this context, it means using a packet sniffer to copy the traffic between the user and the website. If it succeeds, the attacker gets the data that made up the communication. If the attack fails, they don’t. In no way is the success of an eavesdropping attack predicated on the ability to take the intercepted data and infer other non-communicated data from it.

SSL protects against eavesdropping by encrypting the packet data. PGP can be used to encrypt email and protect against eavesdropping at a slightly different level than SSL. Some secure communication links attempt to prevent eavesdropping by hardening the channel (dedicated wires) rather than the content. The utility discussed in this thread does not address the eavesdropping attack in any way. If a sniffer grabs the generated password and URL which crosses the wire, the eavesdropping attack has succeeded. The attacker cannot (without great difficulty) use that data to obtain the master password, but that has nothing to do with eavesdropping. The eavesdropping attack has already succeeded at this point and the user’s password to the site is compromised.