Question about Passwords - Is XKCD right?

Northern_Piper · July 3, 2022, 9:46am

I’ve found this discussion very helpful. Thank you to all who have participated.

Question: what is the rainbow technique that some of you have mentioned?

Danger_Man · July 3, 2022, 11:47am

The attacker is assumed to already have a copy of the password database, which should contain hashes of all the passwords. They then try candidate passwords in quick succession, running each through the hashing algorithm to see whether the result matches a stored hash.

Doing this at a login form is usually not feasible.
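
A rough sketch of the offline guessing loop described above. It assumes the stolen database holds plain, unsalted SHA-256 hashes, and every name and value here is made up for illustration:

```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(password: str) -> str:
    """Hash a password with plain, unsalted SHA-256 (an assumption for this sketch)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def guess_offline(stolen_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the first one that matches the stolen hash."""
    for candidate in candidates:
        if sha256_hex(candidate) == stolen_hash:
            return candidate
    return None


# Made-up data: pretend this hash came out of a leaked database.
stolen = sha256_hex("sunshine")
print(guess_offline(stolen, ["123456", "qwerty", "sunshine"]))
```

Nothing in the loop talks to a server, which is why rate limits at the login form do not slow it down.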

HMS_Irruncible · July 3, 2022, 11:48am

Not all systems have rate limits on password guesses, or the attacker might find a way to bypass the limit.

Francis_Vaughan · July 3, 2022, 11:52am

They don’t. The problem is if some company is compromised and their user database stolen. Full of encrypted passwords. So the attack is done against the user database at the leisure of the attacker. Which is why rainbow tables are so powerful. You don’t go trying passwords, you pre-calculate a massive list of password encryptions and go looking for matches to the encryption. Adding a salt (a bit of extra random but known information) to the password encryption makes rainbow tables harder, as each valid encryption can come in a few thousand variants, but they are still potentially viable.
The trick then is to populate the rainbow table. There is a known set of very common trivial passwords (like “password”, and so on) and modern compute is powerful enough that the entire dictionary of words can be added, along with common variants of those words (such as common substitutions: 4 for A, 3 for E, and so on). It just escalates. The criticism of the XKCD method is that once it becomes known and common, it becomes worth adding to rainbow table generation, and with a limited number of common words, its intrinsic entropy is actually not that good.
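
As a toy illustration of the "common variants" step: the substitution map below is a hypothetical three-letter example, not taken from any real cracking tool, which would use far larger rule sets.

```python
from itertools import product

# Hypothetical substitution map: each letter maps to the characters it may become.
SUBSTITUTIONS = {"a": "a4", "e": "e3", "o": "o0"}


def variants(word):
    """Return every spelling of `word` with each substitutable letter kept or swapped."""
    choices = [SUBSTITUTIONS.get(ch, ch) for ch in word.lower()]
    return ["".join(combo) for combo in product(*choices)]


print(variants("gate"))
print(len(variants("password")))
```

Each substitutable letter only doubles the number of candidates, so the variants add little work for the attacker.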

The real danger of these attacks is that many people use the same password, or a simple variant, on other sites. So a compromise of some tiny web server - say an obscure forum - may lead to attackers discovering passwords to much more important sites.

DPRK · July 3, 2022, 12:15pm

Which is why it is absolutely recommended never to use a password scheme in which the server stores a list of H(p) where p is the user’s password and H is a deterministic function. If it did, then by pre-computing H(p) for all passwords in the dictionary the attacker can instantly find the password by compromising the server.

As for generating passwords, the Diceware guy seems to be now recommending no fewer than six words, so 6 × 12.9 ≈ 77.5 bits of entropy, if you use this scheme, not “4 common words”.
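
The arithmetic can be checked in a few lines. This assumes the standard 7776-word Diceware list (five dice rolls per word) and, for comparison, a 2048-word list, which is the list size that gives the comic's 11 bits per word:

```python
import math


def passphrase_bits(list_size, n_words):
    """Entropy in bits of n_words drawn independently and uniformly from a list."""
    return n_words * math.log2(list_size)


print(f"Diceware, 6 words: {passphrase_bits(7776, 6):.1f} bits")
print(f"Diceware, 4 words: {passphrase_bits(7776, 4):.1f} bits")
print(f"2048-word list, 4 words: {passphrase_bits(2048, 4):.1f} bits")
```

The figures only hold if the words are chosen at random; words picked by a human carry far less entropy.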

Chronos · July 3, 2022, 12:26pm

No, no, no. This is called “security by obscurity”, and it doesn’t work. Take Enigma, for instance: In a war, it’s pretty much guaranteed that the enemy will eventually capture one of your encryption/decryption machines. Maybe a position is captured too quickly for the scuttling procedures, maybe someone defects, maybe a plane crash-lands, whatever, but the enemy is going to get their hands on it eventually. Which is why you need something that’s secure even if your enemy knows all the details of your method. Which was the design intent for Enigma, and to be fair, given the technology of the time, it was almost good enough.

Its entropy was calculated based on the assumption that the technique was already known. Just as it should be, because you always assume that the technique is known and build security based on that assumption.

puzzlegal · July 3, 2022, 12:44pm

The issue is that it increases the difficulty for humans enormously more than it increases the difficulty for computer algorithms. Passwords are problematic in that they aren’t something the human brain is very good at.

In addition to what Chronos said, I’m not worried that my son, or you, for that matter, will hack into my accounts. In fact, if I unexpectedly drop dead, it would be good if my son succeeded in hacking into my accounts. What I’m trying to protect against is some software run by people I’ve never met hacking into my accounts.

FinsToTheLeft · July 3, 2022, 12:51pm

For your average Joe on the street, this may be the case. For state actors or Ransomware criminals, it is more likely to be a targeted attack where they are going to look at LinkedIn, Twitter, and Facebook of a high value individual or company.

Francis_Vaughan · July 3, 2022, 12:53pm

Quite so.

44 bits seems a reasonable value. Randall has a good example of how even the ten hundred most common words can be used for more than one thinks. So 40 bits of entropy is probably a low bid.

https://xkcd.com/thing-explainer/

Francis_Vaughan · July 3, 2022, 12:58pm

Exactly. Social engineering is by far and away the most successful attack vector. It is the most likely manner anyone would come into possession of a password database. The two may go hand in hand, depending upon the particular bent of the criminals involved. The dark web apparently has markets for cracked passwords. But if you are into high stakes attacks, this is small beans.

puzzlegal · July 3, 2022, 1:04pm

Lots of password attacks are on random computers, not on the account of a known high-value person.

DPRK · July 3, 2022, 1:19pm

The “dark web” sells all kinds of things, but if you want to check whether your non-randomly-generated password is on some list of common passwords, every security firm publishes such lists:

123456
123456789
12345
qwerty
password
iloveyou
111111
sunshine
princess
[...]

Francis_Vaughan · July 3, 2022, 1:28pm

Yeah. What is fun is how these lists changed over the years. Back in the day “Gandalf” was one of the most popular. Username and username backwards also rated pretty high. We used to run a password cracker against the user base using one of these lists to flush out idiot users as a background job. At one point we even ran it with the basic spell checker dictionary as well. But that was so long ago that it would take weeks to complete.

echoreply · July 3, 2022, 2:11pm

I’ve got /var/log/auth files full of counter examples

saslauthd[843]: : auth failure: [user=admin] [service=smtp] [realm=] [mech=pam] [reason=PAM auth error]
sshd[1943225]: Failed password for root from 128.199.193.246 port 50302 ssh2
sshd[1943262]: Invalid user webuser from 14.52.249.27 port 44354

which is just pasted from the last few lines of the file.

The speed at which these attempts can be made on any one server is very limited, and in many cases, after a few bad guesses the server will ban the offending IP address. The attackers have adapted by only making a few tries at a time on any single server, then coming back to that server after it’s cooled down. There are plenty of other servers to try in the meantime.

The attackers aren’t trying to guess all 2^{44} or even 2^{28} passwords, they’re just using a list of known passwords from previous hacks, which is why it is vitally important to use different passwords on different sites.

Francis_Vaughan · July 3, 2022, 2:15pm

Yeah, I see the same. I didn’t mean nobody was attacking logins. I meant nobody was attacking logins with the full combinatorial set of possible passwords. That was the context of my statement. Once attackers have a list of good passwords, attacks on logins are worthwhile.

LSLGuy · July 3, 2022, 2:41pm

This was semi-explained by better folks above, but to put it into non-technical terms, and ignoring some details along the way…

Done the sensible modern way, when you set a password, the computer at the other end does not store your password in the form of the actual letters you typed in. Instead, it computes a number from your password, and stores that number.

The number is called a “hash” of your password and the idea is that by choosing the right computation (which literally dices and slices your text like a kitchen hash), it’s trivial to start with the password text and compute the number from it, but it’s (practically) impossible to go the other way: to start with the number and compute the password text from it.

So if your password is “qwerty” the number might be 32,452,654,723 in ordinary decimal notation. And if mine is “Qwerty” the number might be 87,345. The next time you or I try to log in, we enter the password, the server computes the corresponding number, then looks up your or my user record in their data base and compares the newly computed number with the one on file. Match = let you in, no-match = you must have typo-ed or it’s not really you.
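
In code, the store-and-compare step looks roughly like this. It is a deliberately simplified sketch that mirrors the description above: the in-memory `user_db` dictionary and the use of plain, unsalted SHA-256 are assumptions for illustration, not how a real site should do it.

```python
import hashlib
import hmac

user_db = {}  # hypothetical "database": username -> stored hash


def hash_password(password):
    """Compute the stored form of a password (unsalted SHA-256, for illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def set_password(username, password):
    """Store only the hash, never the password text itself."""
    user_db[username] = hash_password(password)


def check_login(username, password):
    """Recompute the hash of the attempt and compare it with the one on file."""
    stored = user_db.get(username)
    if stored is None:
        return False
    return hmac.compare_digest(stored, hash_password(password))


set_password("you", "qwerty")
print(check_login("you", "qwerty"))  # match, let you in
print(check_login("you", "Qwerty"))  # no match
```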

That’s the background. So sitting on the server(s) behind each website and each corporate or government computer system is a database that amounts to a long list of usernames and the corresponding password hashes. Now suppose somebody steals a copy of that database from some server someplace and takes it back to their secret lair inside a dormant volcano crater. How can they get the passwords back out when the hashing process is one-way by design?

The answer is they don’t exactly. What they can do instead is create a big list of possible passwords and the corresponding hashes for each. This can be done just once and can be reused for years in hundreds of attacks against hundreds of stolen databases. This list is called a “rainbow table”.

So then they look through the stolen user database trying to find users whose password hash matches one in their rainbow table. When they get a hash match, they then know what the actual password must have been to create that hash. The answer is right there in their rainbow table. Ka-Ching! Now they have the knowledge to go to that site or internal corporate/government system and log on as that person. They know both the username (from the stolen user database) and the password (from the rainbow table).
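
A minimal sketch of that matching step, using a plain precomputed lookup table as described here. Real rainbow tables store hash chains to save space, but the effect for the attacker is the same. The usernames, passwords, and the unsalted SHA-256 hash are all assumptions for the example:

```python
import hashlib


def h(password):
    """Unsalted SHA-256, standing in for whatever hash the stolen site used."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def build_table(candidates):
    """Precompute hash -> password once; reusable against any database using the same hash."""
    return {h(p): p for p in candidates}


def crack(stolen_db, table):
    """Return username -> password for every stolen hash that appears in the table."""
    return {user: table[digest] for user, digest in stolen_db.items() if digest in table}


table = build_table(["123456", "qwerty", "password", "iloveyou"])
stolen_db = {"ann": h("qwerty"), "bob": h("x7#Lq!vR2p"), "cat": h("iloveyou")}
print(crack(stolen_db, table))
```

Only users whose passwords were in the candidate list are exposed, which is why the rest of the post turns to password complexity.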

This is where password complexity starts to matter. On an old mainframe computer, passwords may have been limited to upper-case A-Z and exactly 6 characters. By modern standards that’s a tiny number of possible passwords, roughly 300 million, so the rainbow table would be tiny too.

By allowing the passwords to be either longer or include more possible characters, the number of possible passwords gets bigger. Lots bigger. And makes rainbow tables a bit less of a slam dunk for the attacker.
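
The growth is easy to see with a quick calculation. The 95-character alphabet below is the set of printable ASCII characters, chosen here just as an example of "more possible characters":

```python
def keyspace(alphabet_size, length):
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length


print(f"A-Z, 6 chars:              {keyspace(26, 6):,}")  # 308,915,776
print(f"A-Z, 12 chars:             {keyspace(26, 12):,}")
print(f"printable ASCII, 6 chars:  {keyspace(95, 6):,}")
print(f"printable ASCII, 12 chars: {keyspace(95, 12):,}")
```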

Up around 15 to 20 characters of password length this method starts getting intractable. For now.

And yes, experts, I’m leaving out a LOT of details here.

DPRK · July 3, 2022, 3:03pm

This is why the server is not supposed to store a (deterministic) hash of the password. Instead, during password registration an “oblivious” pseudo-random function or some such trick is used. I don’t know what is the current recommended standard.
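
One widely used alternative to a bare deterministic hash (not the oblivious-PRF scheme mentioned above) is a per-user random salt combined with a deliberately slow key-derivation function. A minimal sketch with PBKDF2 from Python's standard library; the iteration count is an illustrative assumption, not a recommendation:

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value only; tune to current guidance


def make_record(password):
    """Return (salt, digest) to store for a new password."""
    salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify(password, salt, digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


salt_a, digest_a = make_record("qwerty")
salt_b, digest_b = make_record("qwerty")
print(digest_a == digest_b)  # same password, different stored values
print(verify("qwerty", salt_a, digest_a))
```

Because two users with the same password get different stored values, a single precomputed table no longer covers the whole database.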

md-2000 · July 3, 2022, 3:08pm

It’s a step in the right direction, just like “don’t use a word in the dictionary”. The point being, even in this thread there have been multiple techniques suggested for how to add randomness to your password. Each adds an order of magnitude to the hacker’s effort, while remembering one or two of these techniques is easy for the person using them.

Start combining and the complexity gets worse. “Now Is The Time…” becomes nitTim3 or N1i2T3time or 0Now9I8T7time - if the hacker does not know which technique you use, they have to try even more possibilities; if they stole a database of hashed passwords and everyone has their own method, their task is worse and the likelihood yours will be the one compromised is even lower.

If they have a large database of passwords to crack, odds are some user in there used a very insecure method. So, “weakest link” comes to mind. I saw a security report back in the day, and between blank passwords and passwords that were the userid (perhaps with one digit) they found several users. This gave them remote access to the server, and from it to the PC’s connected to the server where the ordinary user was admin, so as local PC admin they could download the credentials tokens stored from when people had logged into the PC’s. From this they got the admin hashed password, and within a short while (Hours? Days?) they had cracked a poorly crafted administrator password. Of course, many of these issues have been addressed by Microsoft and good security practices by now.

The flaw in the “good old days” - systems left RDP port access open on the firewall with the assumption that Microsoft would prevent invalid logins. To bypass the maximum attempts, hackers would disconnect and reconnect every few tries, and also cycle through source IPs. After all, you can’t lock out everyone who needs remote access. (And significantly, “administrator” could not be locked.) Today, smart routers, VPNs and second-factor authentication have made these sorts of attacks much harder. Most hacks come through email viruses.

Northern_Piper · July 3, 2022, 3:13pm

This was very helpful. Thank you.

DesertDog · July 3, 2022, 3:28pm

The words should be random, though. correcthorsebatterystaple is a pretty good password – they’re random and create a mental image. rollingdeepbluesea not so much – the image is there but the random is not.
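
Letting a computer pick the words is one way to get the "random" part right. A small sketch using Python's `secrets` module; the eight-word list is a made-up stand-in, and a real list would need thousands of entries to give useful entropy:

```python
import secrets

# Tiny hypothetical word list for illustration only.
WORDS = ["correct", "horse", "battery", "staple", "rolling", "deep", "blue", "sea"]


def random_passphrase(n_words, words=WORDS, sep=""):
    """Pick n_words independently and uniformly with a cryptographic RNG."""
    return sep.join(secrets.choice(words) for _ in range(n_words))


print(random_passphrase(4))
print(random_passphrase(4, sep=" "))
```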
