Question about Passwords - Is XKCD right?

Yes. That is what I said. It is stored along with the hashed password. So it is visible to a hacker. But that doesn’t diminish its usefulness.

At one CS department I worked for I would periodically run a password dictionary+basic brute force program against the hashed password file. (Solaris shadow password file.) Always got lots of results.

The easiest one to crack was always a computer center admin who picked simple words like “spring”. I’d go tell her what her password was and her response was always: 1. “How do you know that?” 2. After explaining the deal “Huh?” No idea what I was talking about. A network admin. Jeesh. Just didn’t understand the concept of “Don’t pick a word in a dictionary!”

Avoiding storing hashed passwords on the server requires some fancy crypto. That may not be feasible for console logins, but seems like it should be doable for platforms like Discourse.

Basically, they somehow hack and get the website’s hashed password storage file. Which I gather is something like
John@gmail.com, hash is 92*c&E74239
Mary@yahoo.com, hash is 9*#^)c87v
and so on.

Then, they try a brute force attack, basically trying all combinations and running them through the usual algorithms used to hash passwords.

So if aaaaa hashes to 3^c8d6, and aaaab hashes to &6c7123, and so on, they keep going until they get to something that hashes to the target hash.

If they find that mackdonnashoehornbutterhorse hashes to 92*c&E74239, then now they know what John’s password is, and they can log into the website using that password.
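For anyone who wants to see that loop concretely, here is a rough Python sketch. SHA-256 is only a stand-in I picked for illustration (the thread doesn’t say which algorithm the website uses), and the names `sha256_hex` and `brute_force` are made up for this example.

```python
import hashlib
from itertools import product
from string import ascii_lowercase


def sha256_hex(text: str) -> str:
    """Hash a guess the same way the (hypothetical) website hashed passwords."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def brute_force(target_hash: str, max_len: int = 4):
    """Try a, b, ..., z, aa, ab, ... until some guess hashes to target_hash."""
    for length in range(1, max_len + 1):
        for letters in product(ascii_lowercase, repeat=length):
            guess = "".join(letters)
            if sha256_hex(guess) == target_hash:
                return guess
    return None  # nothing of max_len letters or fewer matched


# Pretend this hash came out of the stolen file.
stolen_hash = sha256_hex("cab")
print(brute_force(stolen_hash))  # prints: cab
```

The attacker never reverses the hash. They only hash guesses forward and compare, which is why the number of guesses needed matters so much.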

If the website lets you input a password of 8 characters, they’re going to hit on the right combination much sooner than if it requires 12 or more.

Exactly.

Other than the small detail that hashes are simply integer numbers, not keyboard gibberish. So this instead:

Sort of. Hash values are fixed length, so you wouldn’t have a random number of digits like you show.

They are also just bits, and while they could be rendered as base-10 integers, it is much more common to see them expressed as hex values.
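A small Python sketch of that point, using MD5 only because it is a hash most people have heard of (nothing here says what any given site uses). The helper name `render` is made up.

```python
import hashlib


def render(password: str):
    """Return the same MD5 digest as a bit count, a hex string, and a base-10 integer."""
    digest = hashlib.md5(password.encode("utf-8")).digest()
    return len(digest) * 8, digest.hex(), int.from_bytes(digest, "big")


for password in ["a", "spring", "mackdonnashoehornbutterhorse"]:
    bits, as_hex, as_int = render(password)
    # The bit count and the hex length are the same for every input;
    # the base-10 rendering is the same value, just written differently.
    print(bits, len(as_hex), as_hex, as_int)
```

Whatever goes in, the digest is 128 bits, so the hex form is always 32 characters. Hex and base 10 are two ways of writing the same bits.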

No, it’s only going to be dozens of times harder. “Dozens of times” isn’t very much: You can make any method dozens of times harder just by appending a single random letter to the end. Length and randomness are king, as far as making passwords harder.

Even if everyone used the XKCD method, and every hacker knew that everyone used it, and everyone used words only from the 5000 most common words, then you could make the method 5000 times stronger just by using five words instead of four. That beats out “dozens of times”, easily.
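The arithmetic behind that, as a quick Python sketch. The 5000-word list comes from the post above; the 26-letter alphabet for the “single random letter” case is my assumption.

```python
import math

WORDLIST_SIZE = 5000  # from the post: the 5000 most common words
LETTERS = 26          # assumption: the appended random letter is lowercase a-z


def passphrase_count(words: int, wordlist_size: int = WORDLIST_SIZE) -> int:
    """Number of equally likely passphrases made of `words` random words."""
    return wordlist_size ** words


four = passphrase_count(4)
five = passphrase_count(5)

print(five // four)                  # prints: 5000
print((four * LETTERS) // four)      # prints: 26
print(round(math.log2(four), 1), "bits for four words")
print(round(math.log2(five), 1), "bits for five words")
```

One extra random letter buys a factor of 26, which is the “dozens of times” level. One extra random word buys a factor of 5000.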

Granted they’re internally a 64 or 128 or whatever bit integer. Which IT people will naturally express in hex. Agree 100% w you. But …

We’re explaining all this to IT laymen. Or at least that’s the audience I’m targeting. “861” is IMO more clear to that audience than is
0X0000000000000000’000000000000035D.

YMMV of course.

Thank you! much appreciated. “IT laymen” is actually being kind.

As others have said, pretty much. Except, instead of just starting with aaaaa and then trying aaaab, they’re going to start with some better guesses. First might be to cycle through information from the password file itself, so try guessing john111 or m4ry and so on.

Then move onto other guesses, such as a downloadable list of stolen passwords. Because passwords are frequently reused, this will both find things that are super weak, like password and also things that can be very strong, like 3wNP0Pb|, but were leaked from a plain text password list at some point in the past. This is why it’s important to not reuse passwords.

Then move on to a list made up of 500,000 dictionary words, or 2 million. Add in some rules to do l33t substitution, put numbers and special characters on, and such. At this point the cracker could throw in phrases, random combinations of words, etc. I don’t know if any of them do that.
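A minimal sketch of what “a word list plus rules” might look like in Python. The substitution table and suffix list are tiny made-up examples, not the rule set of any real cracking tool, and `leet_variants` and `candidates` are hypothetical names.

```python
from itertools import product

# Assumed, deliberately tiny rule set for illustration.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}
SUFFIXES = ["", "1", "111", "123", "!"]


def leet_variants(word: str) -> set:
    """Every way of swapping (or not swapping) each letter that has a l33t form."""
    choices = [(ch, LEET[ch]) if ch in LEET else (ch,) for ch in word.lower()]
    return {"".join(combo) for combo in product(*choices)}


def candidates(base_words):
    """Yield guesses built from base words: l33t swaps, capitalization, suffixes.

    Duplicates are possible; a real run would de-duplicate.
    """
    for word in base_words:
        for variant in sorted(leet_variants(word)):
            for suffix in SUFFIXES:
                yield variant + suffix
                yield variant.capitalize() + suffix


# Base words could come from the password file itself (user names) or a dictionary.
guesses = set(candidates(["john", "mary", "spring"]))
print("john111" in guesses, "m4ry" in guesses, "Spr1ng!" in guesses)
```

Even this toy rule set turns three base words into dozens of guesses, all of which get tried long before any aaaaa, aaaab style search.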

If the password storage system is weak, with no salt or only a small one (a salt being a random string added to the password before hashing), then rainbow tables can be used, so all of the guesses above can be pre-calculated and stored for multiple uses. It’s cheaper to store a few terabytes of hash values than to compute those values multiple times.
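To show why a decent salt spoils the pre-calculation, here is a rough Python sketch. PBKDF2 with SHA-256, a 16-byte salt, and 100,000 iterations are my choices for the example, not something stated in this thread; `hash_password` is a made-up helper.

```python
import hashlib
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted, deliberately slow hash (PBKDF2-HMAC-SHA256, assumed parameters)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Two users who both picked "spring" get different random salts...
salt_a = os.urandom(16)
salt_b = os.urandom(16)
hash_a = hash_password("spring", salt_a)
hash_b = hash_password("spring", salt_b)

# ...so their stored hashes differ, and one precomputed table cannot cover both.
print(hash_a != hash_b)
# Login still works because the salt is stored next to the hash.
print(hash_password("spring", salt_a) == hash_a)
```

The salt is not a secret. Its job is to force the attacker to redo the work for every account instead of reusing one big table.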

And all of this can happen in parallel depending on how many CPUs or GPUs the cracker has access to.

At this point, the cracker might have gotten a bunch of passwords, so just stops trying to crack them, and sells the accounts, or whatever. If the cracker is trying to break into a particular account, then brute forcing might start.

If the cracker has to use brute force, then there is a good chance the password will remain secure, unless it’s only based on 7 characters or something trivial to break. If all the cracker knows is that the password could be between 8 and 24 characters, and could contain letters (big and small), numbers, special characters, and isn’t any of the few million guesses from above, there really isn’t a point to even trying to brute force it.
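Some back-of-the-envelope numbers in Python. The 95-character alphabet is the printable ASCII set, and the guess rate of ten billion per second is purely an assumption for illustration; real rates vary enormously with the hash algorithm and hardware.

```python
ALPHABET_SIZE = 95            # printable ASCII characters
GUESSES_PER_SECOND = 1e10     # assumed rate, for illustration only
SECONDS_PER_YEAR = 365 * 24 * 3600


def keyspace(length: int, alphabet: int = ALPHABET_SIZE) -> int:
    """Number of possible passwords of exactly `length` characters."""
    return alphabet ** length


def years_to_exhaust(length: int, rate: float = GUESSES_PER_SECOND) -> float:
    """Years needed to try every password of this length at the assumed rate."""
    return keyspace(length) / rate / SECONDS_PER_YEAR


for length in (7, 8, 12, 24):
    print(length, "characters:", f"{years_to_exhaust(length):.3g}", "years")
```

Each added character multiplies the work by 95, so the jump from 7 or 8 characters to 12 or more is the difference between “finishes soon” and “don’t bother”.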

Well, if that’s what you know, then you might still try brute force, just in case “8 to 24 characters” equals 8. Which it might. But if you’re trying that and the user actually used 23 characters, then yeah, I hope the cracker has a backup plan that doesn’t require breaking the password.

There is a standard algorithm built into assorted operating systems - “how to hash a password”.

If you’ve somehow gotten the database of hashed passwords, then you have, in your spare time while busy breaking into systems, used the standard algorithm for Windows or Linux to hash all the million or two plaintext guesses you have listed in your “most likely passwords” list. This produces a “rainbow table” of a million or more “text vs. hash result” pairs.
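A toy version of that table in Python. Unsalted SHA-256 stands in for “the standard algorithm” (an assumption for the example, not what Windows or Linux actually use), and the word list and user names are made up. Strictly speaking this is a plain lookup table; real rainbow tables use chains of hashes to trade some computation for much less storage, but the idea of matching precomputed results is the same.

```python
import hashlib


def unsalted_hash(password: str) -> str:
    """Stand-in for the OS's standard password hash (assumed: unsalted SHA-256)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Built once, in the attacker's spare time, from a "most likely passwords" list.
LIKELY_PASSWORDS = ["password", "12345678", "spring", "letmein"]
lookup = {unsalted_hash(p): p for p in LIKELY_PASSWORDS}

# Pretend this is the stolen database: user name -> stored hash.
stolen = {
    "john": unsalted_hash("spring"),
    "mary": unsalted_hash("3wNP0Pb|"),
}

# Cracking is now just a dictionary lookup per stolen hash.
cracked = {user: lookup[h] for user, h in stolen.items() if h in lookup}
print(cracked)  # prints: {'john': 'spring'}
```

Mary’s password survives only because it isn’t on the list. If it had ever leaked and been added to the list, it would fall just as fast.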

As I mentioned, one trick I saw was pretty simple. The security company figured out one password to log on to the Windows server. (Believe it or not, until passwords were enforced a few years ago, some system passwords were blank, and some users had their userid as their password, or “password”, or 12345678, etc.) Once logged into the server remotely, they could then view the disk contents of assorted networked PCs, since for various reasons “Everyone” was often administrator on the local PC. That simplified things when people used a PC other than their own, and allowed installing new software and print drivers, etc.

When you logon to a Windows PC, it stores a hashed token on the PC - this means that you can logon again using your Windows domain ID without the PC having to check the password with the server. (I.e. during network down, or if travelling with a laptop). This token is good for 30 days.

They downloaded this list of tokens, and compared the hashed values against the table of hashes. Quite often in tech support situations, the domain administrator will have logged onto the PC to provide support. So now, they have a copy of the domain admin hashed password - and so maybe the keys to the kingdom.

There are a number of measures to prevent this.

  • obviously, enforce non-trivial passwords. This is almost automatic in Windows domains now.
  • Disable user enumeration - don’t allow hackers to list off all user names by providing the number associated.
  • Don’t allow regular users to logon to the servers.
  • For remote desktop access, require validated VPN. Require 2-factor authentication.
  • have a separate userid for tech support on PC’s that has no permissions on the server itself.
  • Use a different user than “Administrator” as the domain admin.

There’s a number of other obvious controls -
If it’s a technical userid (e.g. one that runs the doorway card entry system), don’t allow it to log on interactively as a regular user. Enforce limits on access rights. Disable userIDs when the person leaves. (None of this “Fred will logon as Martha occasionally to check her email for the next few months”. Have a policy to handle ex-users.)

Here’s a less obvious one: disable network ports not in use. Use router policies to disallow internet access unless a device is specifically allowed. A computer today can be the size of a USB thumb drive. If it has an ethernet jack and runs on PoE like many business phones, a visitor could plug one of these into an open network plug (or, if WiFi is available, use a simple power plug) and now have their own remote-accessible device on the network, using the network’s internet connection.

I’m confused. I remember seeing password hashes in the /pwd directory back in the 90s, and they did look just like ASCII gibberish. (I remember password hashes being public back then. Or perhaps our university admin goofed. The hashes were MD5 maybe?) Sure, it’s all numbers in the end, but that’s how it was stored in the file. Or are we talking about different things?

If these were stored in a file, i.e. in text format, I can see using ASCII “gibberish” back in the old days when storage was at more of a premium.

If I render a number in hex, each character (0-F) represents 4 bits of information, but I’m using 8 bits to save it (assuming ASCII, not Unicode). So half the space is wasted. If I use every bit, it will look like gibberish when rendered in ASCII.
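The space trade-off can be checked in a few lines of Python. MD5 is used only as a convenient 128-bit example, and base64 is included as one common middle ground between raw bytes and hex.

```python
import base64
import hashlib

digest = hashlib.md5(b"spring").digest()

raw_len = len(digest)                    # raw bytes: every bit used
hex_len = len(digest.hex())              # hex text: 4 bits of hash per 8-bit character
b64_len = len(base64.b64encode(digest))  # base64 text: 6 bits of hash per character

print(raw_len, hex_len, b64_len)  # prints: 16 32 24
```

Hex doubles the storage. A 64-character printable alphabet costs about a third extra and still survives being stored in a text file, which raw bytes generally do not.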

Back in the early days of Unix systems the file /etc/passwd contained the user account information. The password field did indeed contain the hashed password in plain view. The passwd file needs to be readable by all users for all sorts of mundane tasks. It didn’t take long before this mistake was fixed by creating the /etc/shadow file to contain the password hash, and replacing the password field in /etc/passwd with a “*”. The shadow file is protected and only accessible by root. I first saw this with Sun systems in the early 80’s. I don’t know if it was a Sun idea, but it was ubiquitous pretty quickly.
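For reference, a passwd line is seven colon-separated fields, with the password in the second one. Here is a small Python sketch of reading one. The sample lines, user names, and the hash string are all made up, and the placeholder check assumes “x” or “*” marks a shadowed entry.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    name: str
    password: str   # hash in the old days, a placeholder once shadowed
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    """Split one /etc/passwd-style line; assumes exactly seven fields."""
    name, password, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    return PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell)


def is_shadowed(entry: PasswdEntry) -> bool:
    """True if the password field holds a placeholder instead of a hash (assumed markers)."""
    return entry.password in ("x", "*")


# Made-up sample lines: one old-style with a visible hash, one shadowed.
old_style = parse_passwd_line("mary:ab01FAX.bQRSU:1002:1002:Mary Example:/home/mary:/bin/sh")
shadowed = parse_passwd_line("john:*:1001:1001:John Example:/home/john:/bin/sh")
print(is_shadowed(old_style), is_shadowed(shadowed))  # prints: False True
```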

The early password hash was always a subset of printables, not hex. There were a few characters that didn’t make it into the hash (I think) but I don’t remember which they were. Two of the characters were the salt rather than the hash proper.
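As far as I know, that traditional format was a 13-character field drawn from the 64 characters `.`, `/`, `0-9`, `A-Z`, and `a-z`, with the first two characters being the salt. A Python sketch of pulling the pieces apart follows; the sample field is invented and `split_crypt_field` is a hypothetical helper, not a library function.

```python
import string

# The 64-character alphabet used by the traditional crypt output.
CRYPT_ALPHABET = set("./" + string.digits + string.ascii_uppercase + string.ascii_lowercase)


def split_crypt_field(field: str):
    """Split a traditional 13-character password field into (salt, hash proper)."""
    if len(field) != 13 or not set(field) <= CRYPT_ALPHABET:
        raise ValueError("not a traditional 13-character crypt field")
    return field[:2], field[2:]


salt, hashed = split_crypt_field("ab01FAX.bQRSU")  # made-up example value
print(salt, hashed)  # prints: ab 01FAX.bQRSU
```

Because each character carries 6 bits, this is the “ASCII gibberish” look people remember, rather than hex.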

I suspect we’re talking at different levels of abstraction.

I know bupkis about *nix except how to spell it. So how password hash values 30 years ago were calculated, encoded for disk storage, or decoded for console display is a total mystery to me.

The then-current but now-deprecated MD5 standard hash function produces a 128 bit integer as the output. That much is certain.

After that it’s all implementation details. Maybe they base-64 encoded the values. Maybe they had some other “homebrew” text encoding they used. Which may well be different on various flavors of *nix not to mention versions. Not something I happen to know. I’m not intending to dispute your recollection of your experience.

Yes – all that and the explanation for why it was visible is sounding very familiar to me. That’s also when I learned of rainbow tables and getting these hashed passwords to run against the table to find matches (I never went so far as to try that.) And I do remember at some point the password field being obfuscated. There were a lot of odd things you could do back in those early or middle-aged days. I remember you could “finger” someone, figure out what terminal or whatever they were on, and output messages (and even ASCII animations) over to their screen. So you could do something like: banner “THE ILLUMINATE ARE WATCHING” >! /directory/tty34 (or whatever the structure was – this is getting close to 30 years ago) and the message would appear on their screen, in the middle of whatever they were doing. Similarly you could cat (I think it was) ASCII animations that would clear their screen and show, say, a flying saucer going across their screen or whatnot. Security structure was pretty loose back then.

/dev/pty34 would have been the pseudo-terminal device number 34.
Writing anything to this would appear directly on their screen. Access like that was closed down pretty quickly.
The finger command was a great source of security holes.

These were the days when you knew everyone on the system personally and there was clear trust. Actually abusing that trust was just not on. How things change.

This shows you an example of an unshadowed password:

https://www.cyberciti.biz/faq/understanding-etcshadow-file/

The second field is the hashed password. And, yes, as stated above, they did not remain unshadowed for long. Pretty sure by 1994 they had obfuscated it.

Thanks. This is bringing back fun memories of exploring UNIX and computers for someone whose only computer at that time had been a C64.